Re: Regular expressions speedup

Cleo Saulnier <cleos@nb.sympatico.ca>
7 Aug 2005 16:13:30 -0400

          From comp.compilers

Related articles
Regular expressions speedup cleos@nb.sympatico-dot-ca.remove (Cleo Saulnier) (2005-08-05)
Re: Regular expressions speedup haberg@math.su.se (2005-08-07)
Re: Regular expressions speedup cleos@nb.sympatico.ca (Cleo Saulnier) (2005-08-07)
Re: Regular expressions speedup haberg@math.su.se (2005-08-10)
Re: Regular expressions speedup bonzini@gnu.org (Paolo Bonzini) (2005-08-10)
Re: Regular expressions speedup dot@dotat.at (Tony Finch) (2005-08-10)
Re: Regular expressions speedup kszabo@bcml120x.ca.nortel.com (2005-08-10)
Re: Regular expressions speedup jburgy@gmail.com (2005-08-10)
Re: Regular expressions speedup torbenm@diku.dk (2005-08-13)
[1 later articles]
| List of all articles for this month |
From: Cleo Saulnier <cleos@nb.sympatico.ca>
Newsgroups: comp.compilers
Date: 7 Aug 2005 16:13:30 -0400
Organization: Aliant Internet
References: 05-08-023
Keywords: lex, performance
Posted-Date: 07 Aug 2005 16:13:30 EDT

Cleo Saulnier wrote:
> I wrote my own Regular expressions parser CSRegEx for C++ (all OS) which
> is now on sourceforge as public domain. You can access backreferences
> and it also supports UNICODE. I wrote it for use in my LR(1) parser and
> will release that too. The regular expressions parser converts every
> pattern into a binary format (still used as a string). The matching
> algorithm is non-recursive and backtracking. Are there any tips on how
> to speed up the matching process. I was thinking for RE's that aren't
> anchored to the start, that getting the *FIRST* set of chars (as in LL
> and LR parsers) would perhaps simplify the initial scanning process.
> Are there any other obvious things that can be done on a backtracking
> engine?
>
> http://csregex.sourceforge.net/
> http://sourceforge.net/projects/csregex
> [If you really care about speed, why not turn the NFA into a DFA so
> you don't have to do multiple states and backtracking? -John]


Well, I've looked at DFA and it's great if you don't have
backreferences. But I think I figured out how to resolve this
problem. However, now I have a new problem with the backreferences.


(a|b)*(bab)+?


input string: aaababbabbab


Assume that I need the two backreferences.


The 1st backreference will match: b (full match is: aaababbab)
and the 2nd one will match: bab (at end of string)


I'm currently at a loss how to make sure the above is matched and not
have it stop prematurely because of the lazy matching at say aaabab


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.