|Fast NFA engine anyone? email@example.com (Remo D.) (2006-04-21)|
|Re: Fast NFA engine anyone? firstname.lastname@example.org (Russ Cox) (2006-04-22)|
|Re: Fast NFA engine anyone? email@example.com (Remo D.) (2006-04-22)|
|Re: Fast NFA engine anyone? cfc@shell01.TheWorld.com (Chris F Clark) (2006-04-23)|
|Re: Fast NFA engine anyone? firstname.lastname@example.org (2006-04-23)|
|Re: Fast NFA engine anyone? Danny.Dube@ift.ulaval.ca (2006-05-11)|
|Re: Fast NFA engine anyone? email@example.com (Remo D.) (2006-05-12)|
|Date:||11 May 2006 01:53:22 -0400|
"Remo D." <firstname.lastname@example.org> writes:
> Well, consider that I'm interested in subpatterns and back references as
> well (sorry, I didn't mentioned that in my post).
> For example things like: ([_^]*)[A-G]([',]*) where I want to get the,
> possibly empty, sequences I matched at the beginning and at the end. Or
> things like: (["'])[A-Z]\1 where I want uppercase strings enclosed in
> single or double quotes.
Maybe the technique described in:
Dubé D., Feeley M. (2000), "Efficiently building a parse tree from
a regular expression", Acta Informatica, volume 37, no. 2, pages
will interest you. However, it takes care of subpatterns only, not of
In this technique, the idea of extracting subpatterns is taken to the
extreme. Essentially, all sub-expressions of a regular expression are
considered tagged for subpattern extraction. The subpatterns are not
extracted as substrings (nor coordinates inside) of the matching
lexeme but as unconventional parse trees. A parse tree describes how
a word is generated by a regular expression, much the same way as a
conventional parse tree describes how a word is generated by a
The shape of a parse tree for a word `w' matching a regular expression
`e' is mostly dictated by `e'. For instance:
- if `e = r1|r2|...|rN', then the tree looks like `#I:t', where `I'
indicates a choice among the `N' alternatives and `t', the way `w'
is generated by `rI';
- if `e = r1r2...rN', then the tree is a list of trees, `[t1,...,tN]',
where each `tI' indicates how `wI' is generated by `r', such that
`w1...wN = w';
- if `e = r*', then the tree is a list of trees, `[t1,...,tN]' where
each `tI' indicates how `wI' is generated by `r', such that `w1...wN
Each kind of regular expression leads to specific shapes of parse
trees. The technique works with both NFA and DFA and exhibits linear
complexity in the length of lexemes (`e' affects only the hidden
constant). It is only theoretical work for now but we are working on
the integration into a lexical analyzer generator.
Return to the
Search the comp.compilers archives again.