Re: Regular expressions speedup

jburgy@gmail.com
10 Aug 2005 11:55:04 -0400

From comp.compilers

Related articles
Regular expressions speedup cleos@nb.sympatico-dot-ca.remove (Cleo Saulnier) (2005-08-05)
Re: Regular expressions speedup haberg@math.su.se (2005-08-07)
Re: Regular expressions speedup cleos@nb.sympatico.ca (Cleo Saulnier) (2005-08-07)
Re: Regular expressions speedup haberg@math.su.se (2005-08-10)
Re: Regular expressions speedup bonzini@gnu.org (Paolo Bonzini) (2005-08-10)
Re: Regular expressions speedup dot@dotat.at (Tony Finch) (2005-08-10)
Re: Regular expressions speedup kszabo@bcml120x.ca.nortel.com (2005-08-10)
*Re: Regular expressions speedup jburgy@gmail.com* (2005-08-10)**
Re: Regular expressions speedup torbenm@diku.dk (2005-08-13)
Re: Regular expressions speedup cleos@nb.sympatico-dot-ca.remove (Cleo Saulnier) (2005-08-13)

| List of all articles for this month |

From:	jburgy@gmail.com
Newsgroups:	comp.compilers
Date:	10 Aug 2005 11:55:04 -0400
Organization:	http://groups.google.com
References:	05-08-023 05-08-029
Keywords:	lex, DFA
Posted-Date:	10 Aug 2005 11:55:04 EDT

Cleo Saulnier wrote:
> Well, I've looked at DFA and it's great if you don't have
> backreferences. But I think I figured out how to resolve this
> problem. However, now I have a new problem with the backreferences.
>
> (a|b)*(bab)+?
>
> input string: aaababbabbab
>
> Assume that I need the two backreferences.
>
> The 1st backreference will match: b (full match is: aaababbab)
> and the 2nd one will match: bab (at end of string)
>
> I'm currently at a loss how to make sure the above is matched and not
> have it stop prematurely because of the lazy matching at say aaabab

Hi there,

I may not be reading your post correctly but it sounds like you're
talking about submatches, not backreferences. A backreference is when
you use a submatch (or capturing parenthesis) as an element in your
regexp. The canonical example is searching for repeated words, e.g.:

        \<([A-Za-z]+) +\1\>

(assuming \< \> represent word boundaries. If that's your problem,
please ignore me. If you're trying to resole the inherent ambiguity
that arises when matching subpatterns, I suggest you read the only
serious treatment I found of it: Ville Laurikari's master thesis
"Efficient submatch addressing for regular expressions"

        ville.laurikari.net/regex-submatch.pdf

Ville went on to release an excellent regexp matching library called
TRE

      laurikari.net/tre/

Maybe you want to collaborate with him rather than reinvent your own
wheel. Or maybe you're doing it as a learning experience and I wish you
good luck :)

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Regular expressions speedup

jburgy@gmail.com10 Aug 2005 11:55:04 -0400

jburgy@gmail.com
10 Aug 2005 11:55:04 -0400