Re: Regular expressions speedup

Cleo Saulnier <>
7 Aug 2005 16:13:30 -0400

          From comp.compilers

Related articles
Regular expressions speedup cleos@nb.sympatico-dot-ca.remove (Cleo Saulnier) (2005-08-05)
Re: Regular expressions speedup (2005-08-07)
Re: Regular expressions speedup (Cleo Saulnier) (2005-08-07)
Re: Regular expressions speedup (2005-08-10)
Re: Regular expressions speedup (Paolo Bonzini) (2005-08-10)
Re: Regular expressions speedup (Tony Finch) (2005-08-10)
Re: Regular expressions speedup (2005-08-10)
Re: Regular expressions speedup (2005-08-10)
Re: Regular expressions speedup (2005-08-13)
[1 later articles]
| List of all articles for this month |

From: Cleo Saulnier <>
Newsgroups: comp.compilers
Date: 7 Aug 2005 16:13:30 -0400
Organization: Aliant Internet
References: 05-08-023
Keywords: lex, performance
Posted-Date: 07 Aug 2005 16:13:30 EDT

Cleo Saulnier wrote:
> I wrote my own Regular expressions parser CSRegEx for C++ (all OS) which
> is now on sourceforge as public domain. You can access backreferences
> and it also supports UNICODE. I wrote it for use in my LR(1) parser and
> will release that too. The regular expressions parser converts every
> pattern into a binary format (still used as a string). The matching
> algorithm is non-recursive and backtracking. Are there any tips on how
> to speed up the matching process. I was thinking for RE's that aren't
> anchored to the start, that getting the *FIRST* set of chars (as in LL
> and LR parsers) would perhaps simplify the initial scanning process.
> Are there any other obvious things that can be done on a backtracking
> engine?
> [If you really care about speed, why not turn the NFA into a DFA so
> you don't have to do multiple states and backtracking? -John]

Well, I've looked at DFA and it's great if you don't have
backreferences. But I think I figured out how to resolve this
problem. However, now I have a new problem with the backreferences.


input string: aaababbabbab

Assume that I need the two backreferences.

The 1st backreference will match: b (full match is: aaababbab)
and the 2nd one will match: bab (at end of string)

I'm currently at a loss how to make sure the above is matched and not
have it stop prematurely because of the lazy matching at say aaabab

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.