Re: Regular expression search algorithm

"Joachim Durchholz" <>
21 Jul 2002 01:52:37 -0400

          From comp.compilers

Related articles
Regular expression search algorithm (Harry) (2002-07-15)
Re: Regular expression search algorithm (Joachim Durchholz) (2002-07-21)
Re: Regular expression search algorithm (Ralph Corderoy) (2002-07-21)
Re: Regular expression search algorithm (Peter S Tillier) (2002-07-24)
Re: Regular expression search algorithm (Aharon Robbins) (2002-09-03)
regular expression search algorithm (1993-03-18)
Re: regular expression search algorithm (1993-03-19)
| List of all articles for this month |

From: "Joachim Durchholz" <>
Newsgroups: comp.compilers
Date: 21 Jul 2002 01:52:37 -0400
Organization: Compilers Central
References: 02-07-036
Keywords: lex
Posted-Date: 21 Jul 2002 01:52:37 EDT

Harry wrote:
> 1. I have read the boyer moore algo in the intro to algo by
  > corman, rivest, leiserson.... however they taked about only
  > the simple patterns ...
> How to acomodate the regular expression in the pattern ?

Regex searches work very differently than Boyer-Moore. The effect that
makes the Boyer-Moore algorithm fast is implicitly part of the regex
algorithms though.

Regex searching is usually done by compiling the regex into a
deterministic finite automaton, then feeding the input to the regex.
(See Aho/Hopcroft/Ullman: Compiler Construction for a hands-on
introduction into DFAs. The book is also known as the "Dragon book"
since is sports a dragon and a knight on the front title.)

> Is there any varation of the algorithm for the same ?...

There are optimizations for various common cases (searching for any of a
number of words and such). Under Unix, they have been compiled into the
*grep commands (egrep, ngrep (sp?) etc.).
In practice, you need these only if speed is of utmost importance; a
good regexp library tends to be fast enough for most practical applications.

  > and is there anyplace I can get the code.

Take a look at Henry Spencer's regex library. It's tested, fast, and
flexible. (It's part of the bowels of the run-time libraries that come
with my employer's programming languages, and we never had any real
problems with it.)

The only reasons why one might want to look further might be a different
regexp syntax. That would probably not be a good idea though - there are
already too many variations of the regexp syntax around.

Just my 5c.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.