Related articles |
---|
Need regexp source balachandr@yahoo.com (2000-01-02) |
Re: Need regexp source arnold@mathcs.emory.edu (2000-01-06) |
Re: Need regexp source jos@and.nl (Jos A. Horsmeier) (2000-01-06) |
Re: Need regexp source george@castro.dbnet.ece.ntua.gr (2000-01-09) |
Re: Need regexp source roumazeilles.NO.SPAM@NO.SPAM.magic.fr (Yves Roumazeilles) (2000-01-09) |
Re: Need regexp source mottl@miss.wu-wien.ac.at (Markus Mottl) (2000-01-09) |
Re: Need regexp source jenglish@flightlab.com (Joe English) (2000-01-09) |
Re: Need regexp source ralph@inputplus.demon.co.uk (2000-01-15) |
Re: Need regexp source thp@roam-thp2.cs.ucr.edu (Tom Payne) (2000-01-15) |
Re: Need regexp source world!cfc@uunet.uu.net (Chris F Clark) (2000-01-19) |
From: | Joe English <jenglish@flightlab.com> |
Newsgroups: | comp.compilers |
Date: | 9 Jan 2000 22:51:33 -0500 |
Organization: | Society for the Abolition of Assignment Statements |
References: | 00-01-006 |
Keywords: | DFA, lex |
<balachandr@yahoo.com> wrote:
>I am looking for a C source code for regexp which uses a iterative
>routine for pattern matching instead of a recursive routine. [...]
and the moderator added:
>[There's always lex, I suppose. Is there a reasonable way to do interative
>regex matching without all of the work of building a DFA first? -John]
Why yes! As a matter of fact, a truly nifty regular expression
matching algorithm can be found on the comp.compilers file archive:
<URL: ftp://iecc.com/pub/file/regex.tar.gz >
See in particular Section 3 ("The Algorithm") of the file dfa.doc
("Converting Regular Expressions to Finite Automata"). Although the
article is -- as the title implies -- principally concerned with
constructing finite automata, the algorithm can easily be adapted --
simplified in fact -- to work as a regex matcher.
The basic idea is to think of a regular expression as the label of a
state in an (infinite) DFA. It's easy to determine if the RE
represents an accepting state based on its syntactic structure, and
the transition function 'delta :: (RE,Symbol) -> RE' is likewise easy
to compute. Clever use of memoization leads to a very efficient
matching algorithm (O(N) amortized time, where 'N' is the length of
the input). No backtracking or preprocessing required, other than the
time it takes to parse the RE.
--Joe English
jenglish@flightlab.com
Return to the
comp.compilers page.
Search the
comp.compilers archives again.