Re: Regexp optimization?

Related articles
Regexp optimization? stefan@waldherr.org (Stefan Waldherr) (1999-01-02)
Re: Regexp optimization? zalman@netcom.com (1999-01-06)

From: zalman@netcom.com (Zalman Stern)
Newsgroups: comp.compilers
Date: 6 Jan 1999 04:05:54 -0500
Organization: ICGNetcom
References: 99-01-004
Keywords: lex, comment

Stefan Waldherr (stefan@waldherr.org) wrote:
: I have a problem with a huge file that contains (line per line) regular
: expressions. Thing is, that many of the regexps may be redundant and hence
: matching is pretty slow.


[Our moderator suggests flex.]


Or, for that matter, glue the regexps together with an or (|) and feed
the result to GNU grep. I'm not familiar with the algorithm flex uses,
but GNU grep offers Boyer-Moore-Gosper searching combined with
DFA-based matching. It will be fast but memory intensive.
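
To illustrate just the "glue with |" step, here's a rough Python
sketch. Python's re module is a backtracking matcher, so this shows
only how the patterns get combined, not how grep actually matches
them; the pattern list is made up for the example.

import re

def combine_patterns(lines):
    # Wrap each pattern in a non-capturing group so alternation
    # precedence is preserved when the patterns are joined with "|".
    parts = ["(?:%s)" % p.strip() for p in lines if p.strip()]
    return re.compile("|".join(parts))

patterns = ["foo.*bar", "ba+z", "qux"]
combined = combine_patterns(patterns)
for line in ["foo and bar", "baaz", "nothing here"]:
    print(bool(combined.search(line)), line)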


Another possibility is to use an NFA construction technique combined
with a lazy DFA builder. I believe the GNU rx package does this.
(I haven't looked at the details.) There are probably other
optimization techniques, since the match can be fuzzy. (I.e., you
probably don't have to stick to the exact specification given by the
regexps.)
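
To make the lazy-DFA idea concrete, here's a toy Python sketch: DFA
states are sets of NFA states, and each DFA transition is computed and
cached only the first time it is needed. The NFA encoding and the
hand-built example NFA are illustrative assumptions, not how GNU rx
actually represents anything.

class NFA:
    def __init__(self, start, accept, moves, eps):
        self.start = start      # start state (int)
        self.accept = accept    # set of accepting states
        self.moves = moves      # dict: (state, char) -> set of states
        self.eps = eps          # dict: state -> set of epsilon successors

    def eps_closure(self, states):
        stack, closure = list(states), set(states)
        while stack:
            s = stack.pop()
            for t in self.eps.get(s, ()):
                if t not in closure:
                    closure.add(t)
                    stack.append(t)
        return frozenset(closure)

class LazyDFA:
    def __init__(self, nfa):
        self.nfa = nfa
        self.start = nfa.eps_closure({nfa.start})
        self.trans = {}         # cache: (DFA state, char) -> DFA state

    def step(self, dstate, ch):
        key = (dstate, ch)
        if key not in self.trans:   # build this transition on first use
            nxt = set()
            for s in dstate:
                nxt |= self.nfa.moves.get((s, ch), set())
            self.trans[key] = self.nfa.eps_closure(nxt)
        return self.trans[key]

    def matches(self, text):
        d = self.start
        for ch in text:
            d = self.step(d, ch)
        return bool(d & self.nfa.accept)

# Hand-built NFA for (ab|a), purely for illustration.
nfa = NFA(start=0, accept={3},
          moves={(1, 'a'): {2}, (2, 'b'): {3}, (4, 'a'): {3}},
          eps={0: {1, 4}})
dfa = LazyDFA(nfa)
print(dfa.matches("ab"), dfa.matches("a"), dfa.matches("b"))  # True True False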


-Z-
[Flex builds a fixed DFA and interprets it. I haven't done any
experiments but I believe that if you look at runtimes and disregard
all the time it takes to run lex and compile the generated C code,
it's faster than egrep. -John]
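[For illustration, a fixed, table-driven DFA interpreter of the kind
described above looks roughly like the Python sketch below. The toy
table recognizes (ab)* and is only an assumption for the example;
flex's real tables are compressed and generated from the lexer spec.

ACCEPT = {0}
TABLE = {
    (0, 'a'): 1,    # precomputed transition table: (state, char) -> state
    (1, 'b'): 0,
}

def run_dfa(text):
    state = 0
    for ch in text:
        state = TABLE.get((state, ch))
        if state is None:       # no transition defined: reject immediately
            return False
    return state in ACCEPT

for s in ["", "ab", "abab", "aba"]:
    print(repr(s), run_dfa(s))
]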

