Re: Is there a paper which merges the Aho-Corasik and Boyer-Moore algorithms?

"Paolo Bonzini" <bonzini@gnu.org>
21 Sep 2006 10:33:32 -0400

          From comp.compilers

Related articles
Is there a paper which merges the Aho-Corasik and Boyer-Moore algorith cfc@shell01.TheWorld.com (Chris F Clark) (2006-09-16)
Re: Is there a paper which merges the Aho-Corasik and Boyer-Moore algo bonzini@gnu.org (Paolo Bonzini) (2006-09-18)
Re: Is there a paper which merges the Aho-Corasik and Boyer-Moore algo gah@ugcs.caltech.edu (glen herrmannsfeldt) (2006-09-18)
Re: Is there a paper which merges the Aho-Corasik and Boyer-Moore algo henry@spsystems.net (2006-09-21)
Re: Is there a paper which merges the Aho-Corasik and Boyer-Moore algo bonzini@gnu.org (Paolo Bonzini) (2006-09-21)
Re: Is there a paper which merges the Aho-Corasik and Boyer-Moore algo cfc@shell01.TheWorld.com (Chris F Clark) (2006-09-21)
Re: Is there a paper which merges the Aho-Corasik and Boyer-Moore algo loekcleophas@gmail.com (loekcleophas@gmail.com) (2006-09-30)
| List of all articles for this month |
From: "Paolo Bonzini" <bonzini@gnu.org>
Newsgroups: comp.compilers
Date: 21 Sep 2006 10:33:32 -0400
Organization: Compilers Central
References: 06-09-08906-09-099 06-09-104
Keywords: lex, theory
Posted-Date: 21 Sep 2006 10:33:32 EDT

> > I think GNU fgrep is doing something very similar.
>
> As well as I remember, the GNU grep use BM on the longest
> fixed string in the query, and then apply the regexp once that
> is found. For the large number of grep patterns that are fixed
> strings, this is especially useful. I haven't looked at the -F
> option, though.


GNU grep has three matchers and uses them in this order:
- kwset.c (used to find the longest fixed string, or alone for -F)
- DFA
- NFA (used to confirm a match, when backreferences are used)


kwset.c implements both Boyer-Moore and Commentz-Walter string
searches. The former is used when there is a single search, the latter
if there are many of them. In practice, when you use "-e" it will use
Commentz-Walter:


    grep -e foo.*x -e bar.*yz


will use Commentz-Walter to look for a line having "foo" or "bar" (the
longest fixed string in the two regex foo.*x and bar.*yz), then confirm
the match with a DFA. Instead,


    fgrep foo
    fgrep -e foo -e bar


will only need a Boyer-Moore or Commentz-Walter search, respectively.


Paolo


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.