Related articles |
---|
Perl regular expressions -> DFA? bonzini@gnu.org (Paolo Bonzini) (2001-12-20) |
From: | "Paolo Bonzini" <bonzini@gnu.org> |
Newsgroups: | comp.compilers |
Date: | 20 Dec 2001 10:11:18 -0500 |
Organization: | Mailgate.ORG Server - http://www.Mailgate.ORG |
Keywords: | lex, DFA, question |
Posted-Date: | 20 Dec 2001 10:11:18 EST |
I am quite sure that, barring back references, Perl regular
expressions can be compiled to a deterministic automaton.
Surely, forward assertions can be compiled to a DFA (in the worst
case, if the outer DFA has m states and the assertion DFA has n, the
resulting DFA will have mn), and backward assertions also can since
they are limited to fixed-length strings (so if the DFA has m states
and the assertion is n characters long, we have mn states in the
resulting DFA). Conditionals are equivalent to assertions:
(?(condition)yes|no)
is equivalent to
((condition yes|(?!condition)no)
The delicate part is how Perl handles greediness. Backtracking
operators come in both stingy and greedy versions. While DFAs come in
stingy and greedy versions (just stop at the first final state or go
on until you cannot follow a transition), I don't think you can
intermix them. This would not be a problem if you are only looking
for a match (a la grep), but maybe it can be solved using lookahead
like Flex does (Flex handles parenthesized groups this way IIRC).
Finally there are no-backtracking groups, that are matched just once and
cannot backtrack once they are matched. For example
(?>[a-z]*)a
will never match because the no-backtracking group has eaten the last a.
I think this can be handled by modifying how the epsilon-closure is
computed, but again I am not sure.
What do you think?
Paolo Bonzini
Return to the
comp.compilers page.
Search the
comp.compilers archives again.