Re: Regular expression search algorithm

"Joachim Durchholz" <joachim_d@gmx.de>
21 Jul 2002 01:52:37 -0400

          From comp.compilers

Related articles
Regular expression search algorithm harvinder.singh@patni.com (Harry) (2002-07-15)
Re: Regular expression search algorithm joachim_d@gmx.de (Joachim Durchholz) (2002-07-21)
Re: Regular expression search algorithm ralph@inputplus.co.uk (Ralph Corderoy) (2002-07-21)
Re: Regular expression search algorithm peter.tillier@btinternet.com (Peter S Tillier) (2002-07-24)
Re: Regular expression search algorithm arnold@skeeve.com (Aharon Robbins) (2002-09-03)
regular expression search algorithm forsyth@minster.york.ac.uk (1993-03-18)
Re: regular expression search algorithm pardo@cs.washington.edu (1993-03-19)
| List of all articles for this month |
From: "Joachim Durchholz" <joachim_d@gmx.de>
Newsgroups: comp.compilers
Date: 21 Jul 2002 01:52:37 -0400
Organization: Compilers Central
References: 02-07-036
Keywords: lex
Posted-Date: 21 Jul 2002 01:52:37 EDT

Harry wrote:
>
> 1. I have read the boyer moore algo in the intro to algo by
  > corman, rivest, leiserson.... however they taked about only
  > the simple patterns ...
> How to acomodate the regular expression in the pattern ?


Regex searches work very differently than Boyer-Moore. The effect that
makes the Boyer-Moore algorithm fast is implicitly part of the regex
algorithms though.


Regex searching is usually done by compiling the regex into a
deterministic finite automaton, then feeding the input to the regex.
(See Aho/Hopcroft/Ullman: Compiler Construction for a hands-on
introduction into DFAs. The book is also known as the "Dragon book"
since is sports a dragon and a knight on the front title.)


> Is there any varation of the algorithm for the same ?...


There are optimizations for various common cases (searching for any of a
number of words and such). Under Unix, they have been compiled into the
*grep commands (egrep, ngrep (sp?) etc.).
In practice, you need these only if speed is of utmost importance; a
good regexp library tends to be fast enough for most practical applications.


  > and is there anyplace I can get the code.


Take a look at Henry Spencer's regex library. It's tested, fast, and
flexible. (It's part of the bowels of the run-time libraries that come
with my employer's programming languages, and we never had any real
problems with it.)


The only reasons why one might want to look further might be a different
regexp syntax. That would probably not be a good idea though - there are
already too many variations of the regexp syntax around.


Just my 5c.
Joachim


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.