Re: How to do this odd kind of regex match?

"Simon Cozens" <simon.cozens@computing-services.oxford.ac.uk>
24 Jul 2002 01:48:32 -0400

          From comp.compilers

Related articles
How to do this odd kind of regex match? dot@dotat.at (Tony Finch) (2002-07-15)
Re: How to do this odd kind of regex match? michaelparker@earthlink.net (Michael Parker) (2002-07-21)
Re: How to do this odd kind of regex match? joachim_d@gmx.de (Joachim Durchholz) (2002-07-21)
Re: How to do this odd kind of regex match? Martin.Ward@durham.ac.uk (Martin Ward) (2002-07-21)
Re: How to do this odd kind of regex match? simon.cozens@computing-services.oxford.ac.uk (Simon Cozens) (2002-07-24)
From: "Simon Cozens" <simon.cozens@computing-services.oxford.ac.uk>
Newsgroups: comp.compilers
Date: 24 Jul 2002 01:48:32 -0400
Organization: Bethnal Green is PEOPLE!
References: 02-07-078
Keywords: lex
Posted-Date: 24 Jul 2002 01:48:32 EDT

"Martin Ward" <Martin.Ward@durham.ac.uk> writes:
> more benefit from preprocessing the text. Build a hash table
> with the locations of all the 2, 3, 4 (or more) letter sequences.
> Then, to match against a regexp containing the word "porn"
> (say), you look up "porn" in the table and get the list of character
> offsets of locations of that 4 character string in the text.
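
To make the quoted idea concrete, here is a rough Python sketch of such a table. The function name build_ngram_index and the sample text are made up for illustration; nothing here comes from an actual filter or RE engine.

```python
from collections import defaultdict

def build_ngram_index(text, sizes=(2, 3, 4)):
    """Map every 2-, 3- and 4-character sequence to the offsets where it starts."""
    index = defaultdict(list)
    for n in sizes:
        for i in range(len(text) - n + 1):
            index[text[i:i + n]].append(i)
    return index

# Hypothetical sample text; real input would be the message being scanned.
index = build_ngram_index("no porn here, porn there")

# Looking up a literal from the regexp gives the candidate offsets directly.
offsets = index.get("porn", [])
```

The table costs memory proportional to the text for each sequence length, so it presumably only pays off when many regexps are run against the same text.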


This is essentially what the Perl RE engine does: it picks fixed
substrings out of an RE and anchors the match by doing an FBM (Fast
Boyer-Moore) search for them in the text. For instance, given
/\w{3,5}foo/ and "xxxx abcdefoo", Perl does this:


floating `foo' at 3..5 (checking floating) stclass `ALNUM' minlen 6
Guessing start of match, REx `\w{3,5}foo' against `xxxx abcdefoo'...
Found floating substr `foo' at offset 10...
Starting position does not contradict /^/m...
Does not contradict STCLASS...
Guessed: match at offset 5


and starts hunting at the "a" (offset 5).
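
The same trick can be sketched in a few lines of Python. This is only an illustration of the idea in the trace above, not Perl's actual code: the function name and its parameters are invented, and a plain str.find stands in for the FBM search.

```python
import re

def search_with_floating_literal(pattern, literal, min_before, max_before, text):
    """Use a literal known to float min_before..max_before characters into
    any match to pick candidate start offsets, then verify each with the RE."""
    rx = re.compile(pattern)
    tried = set()
    pos = text.find(literal)          # stand-in for the FBM substring search
    while pos != -1:
        lo = max(0, pos - max_before)
        hi = pos - min_before
        for start in range(lo, hi + 1):
            if start in tried:
                continue
            tried.add(start)
            m = rx.match(text, start)  # anchored attempt at this offset only
            if m:
                return m
        pos = text.find(literal, pos + 1)
    return None

# Same pattern and text as the trace: `foo' floats 3..5 characters in.
m = search_with_floating_literal(r"\w{3,5}foo", "foo", 3, 5, "xxxx abcdefoo")
```

Here the literal is found at offset 10, so only offsets 5 to 7 are candidates, and the first attempt at offset 5 already succeeds. The real engine also checks the start class and other constraints before committing to a guess.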


But I don't think that's what Tony's question was; as I understand
it, he was asking "how do I store (potentially multiple levels of)
bracket-captured text when running multiple regexes in parallel".
I'm also working on a fast RE engine, and bracketed text is where
I'm coming unstuck as well, so if anyone's got a decent answer,
I'd love to hear it...
--
The debate rages on: Is Perl Bactrian or Dromedary?

