|Inverse grep firstname.lastname@example.org (glen herrmannsfeldt) (2011-06-08)|
|Re: Inverse grep email@example.com (2011-06-14)|
|Matching very large patterns, was Re: Inverse grep cfc@shell01.TheWorld.com (Chris F Clark) (2011-06-19)|
|Re: Matching very large patterns, was Re: Inverse grep firstname.lastname@example.org (glen herrmannsfeldt) (2011-06-20)|
|From:||glen herrmannsfeldt <email@example.com>|
|Date:||Mon, 20 Jun 2011 04:52:28 +0000 (UTC)|
|Organization:||A noiseless patient Spider|
|References:||11-06-015 11-06-022 11-06-033|
Chris F Clark <firstname.lastname@example.org> wrote:
> Our moderator asked:
>> How serious is the explosion problem in practice, now that compilers
>> no longer have to fit into 64k? We use lex scanners with a thousand
>> tokens, and spamassassin feeds about 2000 patterns to re2c if you use
>> the compiled patterns feature. None of them seem to strain our storage
>> systems. -John
> For compilers on mainline chips this is generally not a problem.
> However, this is such a useful tool, it gets used in areas where the
> problems are much-much larger (or the device is smaller). The
> Aho-Corasick algorithm is one of the key algorithms in matching a set
> of strings to a large set of incoming data. The two places where I've
> seen it used and where size is a problem is networking and
> computational biology.
The favorite for computational biology for many years was dynamic
programming, though often it is too slow. The dynamic programming
algorithm used for diff originated in the one Needleman-Wunsch used
for protein sequences. (I believe that it is referenced in the diff
More recently, the Burrows-Wheeler transform, commonly used in
compression, has become popular for computational biology.
BLAST is based on a DFA similar to the first part of Aho-Corasick, but
dynamic programming is used instead of the usual second part.
Return to the
Search the comp.compilers archives again.