Re: Regular expressions speedup

jburgy@gmail.com
10 Aug 2005 11:55:04 -0400

          From comp.compilers

Related articles
Regular expressions speedup cleos@nb.sympatico-dot-ca.remove (Cleo Saulnier) (2005-08-05)
Re: Regular expressions speedup haberg@math.su.se (2005-08-07)
Re: Regular expressions speedup cleos@nb.sympatico.ca (Cleo Saulnier) (2005-08-07)
Re: Regular expressions speedup haberg@math.su.se (2005-08-10)
Re: Regular expressions speedup bonzini@gnu.org (Paolo Bonzini) (2005-08-10)
Re: Regular expressions speedup dot@dotat.at (Tony Finch) (2005-08-10)
Re: Regular expressions speedup kszabo@bcml120x.ca.nortel.com (2005-08-10)
Re: Regular expressions speedup jburgy@gmail.com (2005-08-10)
Re: Regular expressions speedup torbenm@diku.dk (2005-08-13)
Re: Regular expressions speedup cleos@nb.sympatico-dot-ca.remove (Cleo Saulnier) (2005-08-13)
| List of all articles for this month |
From: jburgy@gmail.com
Newsgroups: comp.compilers
Date: 10 Aug 2005 11:55:04 -0400
Organization: http://groups.google.com
References: 05-08-02305-08-029
Keywords: lex, DFA
Posted-Date: 10 Aug 2005 11:55:04 EDT

Cleo Saulnier wrote:
> Well, I've looked at DFA and it's great if you don't have
> backreferences. But I think I figured out how to resolve this
> problem. However, now I have a new problem with the backreferences.
>
> (a|b)*(bab)+?
>
> input string: aaababbabbab
>
> Assume that I need the two backreferences.
>
> The 1st backreference will match: b (full match is: aaababbab)
> and the 2nd one will match: bab (at end of string)
>
> I'm currently at a loss how to make sure the above is matched and not
> have it stop prematurely because of the lazy matching at say aaabab


Hi there,


I may not be reading your post correctly but it sounds like you're
talking about submatches, not backreferences. A backreference is when
you use a submatch (or capturing parenthesis) as an element in your
regexp. The canonical example is searching for repeated words, e.g.:


        \<([A-Za-z]+) +\1\>


(assuming \< \> represent word boundaries. If that's your problem,
please ignore me. If you're trying to resole the inherent ambiguity
that arises when matching subpatterns, I suggest you read the only
serious treatment I found of it: Ville Laurikari's master thesis
"Efficient submatch addressing for regular expressions"


        ville.laurikari.net/regex-submatch.pdf


Ville went on to release an excellent regexp matching library called
TRE


      laurikari.net/tre/


Maybe you want to collaborate with him rather than reinvent your own
wheel. Or maybe you're doing it as a learning experience and I wish you
good luck :)


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.