Related articles |
---|
Regular expressions speedup cleos@nb.sympatico-dot-ca.remove (Cleo Saulnier) (2005-08-05) |
Re: Regular expressions speedup haberg@math.su.se (2005-08-07) |
Re: Regular expressions speedup cleos@nb.sympatico.ca (Cleo Saulnier) (2005-08-07) |
Re: Regular expressions speedup haberg@math.su.se (2005-08-10) |
Re: Regular expressions speedup bonzini@gnu.org (Paolo Bonzini) (2005-08-10) |
Re: Regular expressions speedup dot@dotat.at (Tony Finch) (2005-08-10) |
Re: Regular expressions speedup kszabo@bcml120x.ca.nortel.com (2005-08-10) |
Re: Regular expressions speedup jburgy@gmail.com (2005-08-10) |
Re: Regular expressions speedup torbenm@diku.dk (2005-08-13) |
Re: Regular expressions speedup cleos@nb.sympatico-dot-ca.remove (Cleo Saulnier) (2005-08-13) |
From: | jburgy@gmail.com |
Newsgroups: | comp.compilers |
Date: | 10 Aug 2005 11:55:04 -0400 |
Organization: | http://groups.google.com |
References: | 05-08-02305-08-029 |
Keywords: | lex, DFA |
Posted-Date: | 10 Aug 2005 11:55:04 EDT |
Cleo Saulnier wrote:
> Well, I've looked at DFA and it's great if you don't have
> backreferences. But I think I figured out how to resolve this
> problem. However, now I have a new problem with the backreferences.
>
> (a|b)*(bab)+?
>
> input string: aaababbabbab
>
> Assume that I need the two backreferences.
>
> The 1st backreference will match: b (full match is: aaababbab)
> and the 2nd one will match: bab (at end of string)
>
> I'm currently at a loss how to make sure the above is matched and not
> have it stop prematurely because of the lazy matching at say aaabab
Hi there,
I may not be reading your post correctly but it sounds like you're
talking about submatches, not backreferences. A backreference is when
you use a submatch (or capturing parenthesis) as an element in your
regexp. The canonical example is searching for repeated words, e.g.:
\<([A-Za-z]+) +\1\>
(assuming \< \> represent word boundaries. If that's your problem,
please ignore me. If you're trying to resole the inherent ambiguity
that arises when matching subpatterns, I suggest you read the only
serious treatment I found of it: Ville Laurikari's master thesis
"Efficient submatch addressing for regular expressions"
ville.laurikari.net/regex-submatch.pdf
Ville went on to release an excellent regexp matching library called
TRE
laurikari.net/tre/
Maybe you want to collaborate with him rather than reinvent your own
wheel. Or maybe you're doing it as a learning experience and I wish you
good luck :)
Return to the
comp.compilers page.
Search the
comp.compilers archives again.