Related articles |
---|
Regular expressions speedup cleos@nb.sympatico-dot-ca.remove (Cleo Saulnier) (2005-08-05) |
Re: Regular expressions speedup haberg@math.su.se (2005-08-07) |
Re: Regular expressions speedup cleos@nb.sympatico.ca (Cleo Saulnier) (2005-08-07) |
Re: Regular expressions speedup haberg@math.su.se (2005-08-10) |
Re: Regular expressions speedup bonzini@gnu.org (Paolo Bonzini) (2005-08-10) |
Re: Regular expressions speedup dot@dotat.at (Tony Finch) (2005-08-10) |
Re: Regular expressions speedup kszabo@bcml120x.ca.nortel.com (2005-08-10) |
Re: Regular expressions speedup jburgy@gmail.com (2005-08-10) |
Re: Regular expressions speedup torbenm@diku.dk (2005-08-13) |
[1 later articles] |
From: | Cleo Saulnier <cleos@nb.sympatico.ca> |
Newsgroups: | comp.compilers |
Date: | 7 Aug 2005 16:13:30 -0400 |
Organization: | Aliant Internet |
References: | 05-08-023 |
Keywords: | lex, performance |
Posted-Date: | 07 Aug 2005 16:13:30 EDT |
Cleo Saulnier wrote:
> I wrote my own Regular expressions parser CSRegEx for C++ (all OS) which
> is now on sourceforge as public domain. You can access backreferences
> and it also supports UNICODE. I wrote it for use in my LR(1) parser and
> will release that too. The regular expressions parser converts every
> pattern into a binary format (still used as a string). The matching
> algorithm is non-recursive and backtracking. Are there any tips on how
> to speed up the matching process. I was thinking for RE's that aren't
> anchored to the start, that getting the *FIRST* set of chars (as in LL
> and LR parsers) would perhaps simplify the initial scanning process.
> Are there any other obvious things that can be done on a backtracking
> engine?
>
> http://csregex.sourceforge.net/
> http://sourceforge.net/projects/csregex
> [If you really care about speed, why not turn the NFA into a DFA so
> you don't have to do multiple states and backtracking? -John]
Well, I've looked at DFA and it's great if you don't have
backreferences. But I think I figured out how to resolve this
problem. However, now I have a new problem with the backreferences.
(a|b)*(bab)+?
input string: aaababbabbab
Assume that I need the two backreferences.
The 1st backreference will match: b (full match is: aaababbab)
and the 2nd one will match: bab (at end of string)
I'm currently at a loss how to make sure the above is matched and not
have it stop prematurely because of the lazy matching at say aaabab
Return to the
comp.compilers page.
Search the
comp.compilers archives again.