Re: Is there a paper which merges the Aho-Corasik and Boyer-Moore algorithms?

henry@spsystems.net (Henry Spencer)
21 Sep 2006 10:32:09 -0400

          From comp.compilers

Related articles
Is there a paper which merges the Aho-Corasik and Boyer-Moore algorith cfc@shell01.TheWorld.com (Chris F Clark) (2006-09-16)
Re: Is there a paper which merges the Aho-Corasik and Boyer-Moore algo bonzini@gnu.org (Paolo Bonzini) (2006-09-18)
Re: Is there a paper which merges the Aho-Corasik and Boyer-Moore algo gah@ugcs.caltech.edu (glen herrmannsfeldt) (2006-09-18)
Re: Is there a paper which merges the Aho-Corasik and Boyer-Moore algo henry@spsystems.net (2006-09-21)
Re: Is there a paper which merges the Aho-Corasik and Boyer-Moore algo bonzini@gnu.org (Paolo Bonzini) (2006-09-21)
Re: Is there a paper which merges the Aho-Corasik and Boyer-Moore algo cfc@shell01.TheWorld.com (Chris F Clark) (2006-09-21)
Re: Is there a paper which merges the Aho-Corasik and Boyer-Moore algo loekcleophas@gmail.com (loekcleophas@gmail.com) (2006-09-30)
| List of all articles for this month |
From: henry@spsystems.net (Henry Spencer)
Newsgroups: comp.compilers,comp.theory
Date: 21 Sep 2006 10:32:09 -0400
Organization: SP Systems, Toronto, Canada
References: 06-09-089
Keywords: theory
Posted-Date: 21 Sep 2006 10:32:09 EDT



Chris F Clark <cfc@shell01.TheWorld.com> wrote:
>...thus the classic Aho-Corasik (AC) and Boyer-Moore (BM) algorithms.
>In addition, I've known for a "long time" that the trick of working
>backwards ala BM could be applied to a AC style matcher..
>Still, it seems to me to be an obvious thing to do, so I would be
>surprised if someone else hasn't already solved this problem...


It's the Commentz-Walter algorithm, published (although rather obscurely)
in 1979. CW is to AC as BM is to Knuth-Morris-Pratt, roughly speaking, a
backwards-first optimization. Or alternatively :-), CW is to BM as AC is
to KMP, a multi-string generalization.


Commentz-Walter, B., "A string matching algorithm fast on the average",
in Maurer ed., Proc 6th Internat. Coll. on Automata, Languages, and
Programming, Springer-Verlag 1979, pp118-132. The inaccessibility of this
paper (and the internal IBM Germany tech report it was based on) are a
large part of why CW is little known.


I'm tempted to say that a more accessible description is findable in
Crochemore&Rytter, "Text Algorithms", Oxford 1994, but I won't :-),
because that book is a nightmare to read -- in particular, it is riddled
with really nasty typos that make it very difficult to follow the more
subtle algorithms unless you already know what's going on.


Bruce Watson's 1995 Eindhoven PhD thesis, "Taxonomies and Toolkits of
Regular Language Algorithms", is a much more readable and far more
definitive reference -- including extensions of BM to full regular
expressions -- but as far as I know, it suffers from the same problem of
relative inaccessibility.


There is reportedly a good description of CW in Aho's chapter in
van Leeuwen ed., "Handbook of Theoretical Computer Science", vol. A,
but I have not checked this.
--
spsystems.net is temporarily off the air; | Henry Spencer
mail to henry at zoo.utoronto.ca instead. | henry@spsystems.net


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.