Re: Need regexp source

Joe English <jenglish@flightlab.com>
9 Jan 2000 22:51:33 -0500

From comp.compilers

Related articles
Need regexp source balachandr@yahoo.com (2000-01-02)
Re: Need regexp source arnold@mathcs.emory.edu (2000-01-06)
Re: Need regexp source jos@and.nl (Jos A. Horsmeier) (2000-01-06)
Re: Need regexp source george@castro.dbnet.ece.ntua.gr (2000-01-09)
Re: Need regexp source roumazeilles.NO.SPAM@NO.SPAM.magic.fr (Yves Roumazeilles) (2000-01-09)
Re: Need regexp source mottl@miss.wu-wien.ac.at (Markus Mottl) (2000-01-09)
*Re: Need regexp source jenglish@flightlab.com (Joe English)* (2000-01-09)**
Re: Need regexp source ralph@inputplus.demon.co.uk (2000-01-15)
Re: Need regexp source thp@roam-thp2.cs.ucr.edu (Tom Payne) (2000-01-15)
Re: Need regexp source world!cfc@uunet.uu.net (Chris F Clark) (2000-01-19)

| List of all articles for this month |

From:	Joe English <jenglish@flightlab.com>
Newsgroups:	comp.compilers
Date:	9 Jan 2000 22:51:33 -0500
Organization:	Society for the Abolition of Assignment Statements
References:	00-01-006
Keywords:	DFA, lex

<balachandr@yahoo.com> wrote:
>I am looking for a C source code for regexp which uses a iterative
>routine for pattern matching instead of a recursive routine. [...]

and the moderator added:

>[There's always lex, I suppose. Is there a reasonable way to do interative
>regex matching without all of the work of building a DFA first? -John]

Why yes! As a matter of fact, a truly nifty regular expression
matching algorithm can be found on the comp.compilers file archive:

<URL: ftp://iecc.com/pub/file/regex.tar.gz >

See in particular Section 3 ("The Algorithm") of the file dfa.doc
("Converting Regular Expressions to Finite Automata"). Although the
article is -- as the title implies -- principally concerned with
constructing finite automata, the algorithm can easily be adapted --
simplified in fact -- to work as a regex matcher.

The basic idea is to think of a regular expression as the label of a
state in an (infinite) DFA. It's easy to determine if the RE
represents an accepting state based on its syntactic structure, and
the transition function 'delta :: (RE,Symbol) -> RE' is likewise easy
to compute. Clever use of memoization leads to a very efficient
matching algorithm (O(N) amortized time, where 'N' is the length of
the input). No backtracking or preprocessing required, other than the
time it takes to parse the RE.

--Joe English

jenglish@flightlab.com

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Need regexp source

Joe English <jenglish@flightlab.com>9 Jan 2000 22:51:33 -0500

Joe English <jenglish@flightlab.com>
9 Jan 2000 22:51:33 -0500