Re: Need regexp source

Joe English <jenglish@flightlab.com>
9 Jan 2000 22:51:33 -0500

          From comp.compilers

Related articles
Need regexp source balachandr@yahoo.com (2000-01-02)
Re: Need regexp source arnold@mathcs.emory.edu (2000-01-06)
Re: Need regexp source jos@and.nl (Jos A. Horsmeier) (2000-01-06)
Re: Need regexp source george@castro.dbnet.ece.ntua.gr (2000-01-09)
Re: Need regexp source roumazeilles.NO.SPAM@NO.SPAM.magic.fr (Yves Roumazeilles) (2000-01-09)
Re: Need regexp source mottl@miss.wu-wien.ac.at (Markus Mottl) (2000-01-09)
Re: Need regexp source jenglish@flightlab.com (Joe English) (2000-01-09)
Re: Need regexp source ralph@inputplus.demon.co.uk (2000-01-15)
Re: Need regexp source thp@roam-thp2.cs.ucr.edu (Tom Payne) (2000-01-15)
Re: Need regexp source world!cfc@uunet.uu.net (Chris F Clark) (2000-01-19)
| List of all articles for this month |

From: Joe English <jenglish@flightlab.com>
Newsgroups: comp.compilers
Date: 9 Jan 2000 22:51:33 -0500
Organization: Society for the Abolition of Assignment Statements
References: 00-01-006
Keywords: DFA, lex

<balachandr@yahoo.com> wrote:
>I am looking for a C source code for regexp which uses a iterative
>routine for pattern matching instead of a recursive routine. [...]


and the moderator added:


>[There's always lex, I suppose. Is there a reasonable way to do interative
>regex matching without all of the work of building a DFA first? -John]


Why yes! As a matter of fact, a truly nifty regular expression
matching algorithm can be found on the comp.compilers file archive:


        <URL: ftp://iecc.com/pub/file/regex.tar.gz >


See in particular Section 3 ("The Algorithm") of the file dfa.doc
("Converting Regular Expressions to Finite Automata"). Although the
article is -- as the title implies -- principally concerned with
constructing finite automata, the algorithm can easily be adapted --
simplified in fact -- to work as a regex matcher.


The basic idea is to think of a regular expression as the label of a
state in an (infinite) DFA. It's easy to determine if the RE
represents an accepting state based on its syntactic structure, and
the transition function 'delta :: (RE,Symbol) -> RE' is likewise easy
to compute. Clever use of memoization leads to a very efficient
matching algorithm (O(N) amortized time, where 'N' is the length of
the input). No backtracking or preprocessing required, other than the
time it takes to parse the RE.


--Joe English


    jenglish@flightlab.com


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.