Fast NFA engine anyone?

"Remo D." <rd3ntat0@hotmail.com>
21 Apr 2006 23:45:16 -0400

          From comp.compilers

Related articles
Fast NFA engine anyone? rd3ntat0@hotmail.com (Remo D.) (2006-04-21)
Re: Fast NFA engine anyone? rsc@swtch.com (Russ Cox) (2006-04-22)
Re: Fast NFA engine anyone? rd3ntat0@hotmail.com (Remo D.) (2006-04-22)
Re: Fast NFA engine anyone? cfc@shell01.TheWorld.com (Chris F Clark) (2006-04-23)
Re: Fast NFA engine anyone? reeuwijk@few.vu.nl (2006-04-23)
Re: Fast NFA engine anyone? Danny.Dube@ift.ulaval.ca (2006-05-11)
Re: Fast NFA engine anyone? rd3ntat0@hotmail.com (Remo D.) (2006-05-12)
[1 later articles]
| List of all articles for this month |

From: "Remo D." <rd3ntat0@hotmail.com>
Newsgroups: comp.compilers
Date: 21 Apr 2006 23:45:16 -0400
Organization: MC-link SpA InterNetNews site
Keywords: lex, question
Posted-Date: 21 Apr 2006 23:45:16 EDT

Hi there!
      I'm writing a program to match a list of regexp against a file to
check which one matches and, in case, extract substring from the
matching text for further processing. All regexp are anchored (i.e. I
only need to find matches at the beginning of the line).


      The first implementation I wrote uses a NFA representation for the
regexps. Now I'm looking for something faster.


      I have the impression that going with a DFA representation is not a
good idea. I feel that the time I will have to spend extracting the
substring from the matching text will wipe out any benefit I will get (I
might be wrong, of course).


      Being all the regexps anchored, there should be a easier way!


      Consider that I would even sacrifice some of the expressiveness of
regexp. I already dropped the operator | without any impact on the final
application.


      I hope you can help me to answer the following questions.


      Is there an optimized implementation of a NFA engine (or a paper
describing how to do it)?


      Is there any work done on comparing NFA and DFA when regexp are
anchored (if it makes a difference as I suspect)?


      There could be a better approach (e.g. SNOBOL-like pattern matching
operators)?


      Any help would be gratefully appreciated!


      Remo D.
[I think you're wrong about DFAs. If your patterns are really all anchored,
the match is the string from the beginning of the line to the point where
the match succeeded. -John]



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.