Re: Is there a paper which merges the Aho-Corasik and Boyer-Moore algorithms?

glen herrmannsfeldt <gah@ugcs.caltech.edu>
18 Sep 2006 22:44:02 -0400

          From comp.compilers

Related articles
Is there a paper which merges the Aho-Corasik and Boyer-Moore algorith cfc@shell01.TheWorld.com (Chris F Clark) (2006-09-16)
Re: Is there a paper which merges the Aho-Corasik and Boyer-Moore algo bonzini@gnu.org (Paolo Bonzini) (2006-09-18)
Re: Is there a paper which merges the Aho-Corasik and Boyer-Moore algo gah@ugcs.caltech.edu (glen herrmannsfeldt) (2006-09-18)
Re: Is there a paper which merges the Aho-Corasik and Boyer-Moore algo henry@spsystems.net (2006-09-21)
Re: Is there a paper which merges the Aho-Corasik and Boyer-Moore algo bonzini@gnu.org (Paolo Bonzini) (2006-09-21)
Re: Is there a paper which merges the Aho-Corasik and Boyer-Moore algo cfc@shell01.TheWorld.com (Chris F Clark) (2006-09-21)
Re: Is there a paper which merges the Aho-Corasik and Boyer-Moore algo loekcleophas@gmail.com (loekcleophas@gmail.com) (2006-09-30)
| List of all articles for this month |

From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Newsgroups: comp.compilers
Date: 18 Sep 2006 22:44:02 -0400
Organization: Compilers Central
References: 06-09-089 06-09-099
Keywords: theory
Posted-Date: 18 Sep 2006 22:44:02 EDT

Paolo Bonzini wrote:


>>Specifically, I've been working on string matching problems recently,
>>and thus the classic Aho-Corasik (AC) and Boyer-Moore (BM) algorithms.
>>In addition, I've known for a "long time" that the trick of working
>>backwards ala BM could be applied to a AC style matcher to obtain the
>>same potential sub-linear performance.


One of my favorite places to look for string algorithms is the
CPM (Combinatorial Pattern Matching) conference proceedings.


As well as I remember it, the important part of Aho-Corasik is
a finite-state machine. FSA should be very efficient when you are
looking for one of a large number of different strings.


You might also look at the documentation for BLAST, a string search
program used for protein and DNA sequences from NCBI.


> I think GNU fgrep is doing something very similar.


As well as I remember, the GNU grep use BM on the longest
fixed string in the query, and then apply the regexp once that
is found. For the large number of grep patterns that are fixed
strings, this is especially useful. I haven't looked at the -F
option, though.


-- glen



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.