Related articles |
---|
How to do this odd kind of regex match? dot@dotat.at (Tony Finch) (2002-07-15) |
Re: How to do this odd kind of regex match? michaelparker@earthlink.net (Michael Parker) (2002-07-21) |
Re: How to do this odd kind of regex match? joachim_d@gmx.de (Joachim Durchholz) (2002-07-21) |
Re: How to do this odd kind of regex match? Martin.Ward@durham.ac.uk (Martin Ward) (2002-07-21) |
Re: How to do this odd kind of regex match? simon.cozens@computing-services.oxford.ac.uk (Simon Cozens) (2002-07-24) |

From: "Martin Ward" <Martin.Ward@durham.ac.uk>
Newsgroups: comp.compilers
Date: 21 Jul 2002 02:08:14 -0400
Organization: Compilers Central
Keywords: lex
Posted-Date: 21 Jul 2002 02:08:14 EDT

"Tony Finch" <dot@dotat.at> writes:
> I'd also like to be able to match several regexes against the same
> text in parallel,
...
> (The aim is to speed up heuristic spam detection such as SpamAssassin.)
If you are matching a text against a huge number of regexps,
most of which contain words or phrases, then you might get
more benefit from preprocessing the text. Build a hash table
with the locations of all the 2, 3, 4 (or more) letter sequences.
Then, to match against a regexp containing the word "porn"
(say), you look up "porn" in the table and get the list of character
offsets of locations of that 4 character string in the text.
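
A rough sketch of that idea in Python follows; it is an illustration under stated assumptions, not code from SpamAssassin or from this thread. The names build_ngram_index and run_rules, the (name, required_literal, pattern) rule format, and the choice to use the table as a prefilter that skips a regexp when its required word is absent are all hypothetical. It also assumes each rule's literal really must appear in any match of its regexp, and that text and literals share the same case.

```python
import re
from collections import defaultdict


def build_ngram_index(text, sizes=(2, 3, 4)):
    """Map every substring of the given lengths to the list of offsets
    at which it starts in text (offsets are in increasing order)."""
    index = defaultdict(list)
    for n in sizes:
        for start in range(len(text) - n + 1):
            index[text[start:start + n]].append(start)
    return dict(index)


def run_rules(text, rules, sizes=(2, 3, 4)):
    """Return the names of the rules whose regexp matches text.

    rules is a list of (name, required_literal, pattern) tuples
    (a hypothetical format). A rule is skipped without running its
    regexp when its literal has an indexed length and is not in the table.
    """
    index = build_ngram_index(text, sizes)
    hits = []
    for name, literal, pattern in rules:
        if len(literal) in sizes and literal not in index:
            continue  # required literal is absent, so the regexp cannot match
        if re.search(pattern, text):
            hits.append(name)
    return hits
```

For the text "cheap porn, more porn", build_ngram_index(text)["porn"] is [6, 17]. A rule whose required literal is "viag" would be rejected by a single table lookup, without running its regexp. The table costs one pass over the text per sequence length, so it only pays off when many regexps are run against the same text.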
Martin
Martin.Ward@durham.ac.uk http://www.cse.dmu.ac.uk/~mward/ Erdos number: 4