Perl regular expressions -> DFA?

"Paolo Bonzini" <bonzini@gnu.org>
20 Dec 2001 10:11:18 -0500

          From comp.compilers

Related articles
Perl regular expressions -> DFA? bonzini@gnu.org (Paolo Bonzini) (2001-12-20)
| List of all articles for this month |
From: "Paolo Bonzini" <bonzini@gnu.org>
Newsgroups: comp.compilers
Date: 20 Dec 2001 10:11:18 -0500
Organization: Mailgate.ORG Server - http://www.Mailgate.ORG
Keywords: lex, DFA, question
Posted-Date: 20 Dec 2001 10:11:18 EST

I am quite sure that, barring back references, Perl regular
expressions can be compiled to a deterministic automaton.


Surely, forward assertions can be compiled to a DFA (in the worst
case, if the outer DFA has m states and the assertion DFA has n, the
resulting DFA will have mn), and backward assertions also can since
they are limited to fixed-length strings (so if the DFA has m states
and the assertion is n characters long, we have mn states in the
resulting DFA). Conditionals are equivalent to assertions:


      (?(condition)yes|no)


is equivalent to


      ((condition yes|(?!condition)no)


The delicate part is how Perl handles greediness. Backtracking
operators come in both stingy and greedy versions. While DFAs come in
stingy and greedy versions (just stop at the first final state or go
on until you cannot follow a transition), I don't think you can
intermix them. This would not be a problem if you are only looking
for a match (a la grep), but maybe it can be solved using lookahead
like Flex does (Flex handles parenthesized groups this way IIRC).


Finally there are no-backtracking groups, that are matched just once and
cannot backtrack once they are matched. For example


      (?>[a-z]*)a


will never match because the no-backtracking group has eaten the last a.
I think this can be handled by modifying how the epsilon-closure is
computed, but again I am not sure.


What do you think?


Paolo Bonzini


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.