Related articles |
---|
Regular expression search algorithm harvinder.singh@patni.com (Harry) (2002-07-15) |
Re: Regular expression search algorithm joachim_d@gmx.de (Joachim Durchholz) (2002-07-21) |
Re: Regular expression search algorithm ralph@inputplus.co.uk (Ralph Corderoy) (2002-07-21) |
Re: Regular expression search algorithm peter.tillier@btinternet.com (Peter S Tillier) (2002-07-24) |
Re: Regular expression search algorithm arnold@skeeve.com (Aharon Robbins) (2002-09-03) |
regular expression search algorithm forsyth@minster.york.ac.uk (1993-03-18) |
Re: regular expression search algorithm pardo@cs.washington.edu (1993-03-19) |
From: | "Joachim Durchholz" <joachim_d@gmx.de> |
Newsgroups: | comp.compilers |
Date: | 21 Jul 2002 01:52:37 -0400 |
Organization: | Compilers Central |
References: | 02-07-036 |
Keywords: | lex |
Posted-Date: | 21 Jul 2002 01:52:37 EDT |
Harry wrote:
>
> 1. I have read the boyer moore algo in the intro to algo by
> corman, rivest, leiserson.... however they taked about only
> the simple patterns ...
> How to acomodate the regular expression in the pattern ?
Regex searches work very differently than Boyer-Moore. The effect that
makes the Boyer-Moore algorithm fast is implicitly part of the regex
algorithms though.
Regex searching is usually done by compiling the regex into a
deterministic finite automaton, then feeding the input to the regex.
(See Aho/Hopcroft/Ullman: Compiler Construction for a hands-on
introduction into DFAs. The book is also known as the "Dragon book"
since is sports a dragon and a knight on the front title.)
> Is there any varation of the algorithm for the same ?...
There are optimizations for various common cases (searching for any of a
number of words and such). Under Unix, they have been compiled into the
*grep commands (egrep, ngrep (sp?) etc.).
In practice, you need these only if speed is of utmost importance; a
good regexp library tends to be fast enough for most practical applications.
> and is there anyplace I can get the code.
Take a look at Henry Spencer's regex library. It's tested, fast, and
flexible. (It's part of the bowels of the run-time libraries that come
with my employer's programming languages, and we never had any real
problems with it.)
The only reasons why one might want to look further might be a different
regexp syntax. That would probably not be a good idea though - there are
already too many variations of the regexp syntax around.
Just my 5c.
Joachim
Return to the
comp.compilers page.
Search the
comp.compilers archives again.