Re: C regexp package that can save compiled expressions?

glen herrmannsfeldt <gah@ugcs.caltech.edu>
15 Aug 2004 22:18:02 -0400

          From comp.compilers

Related articles
C regexp package that can save compiled expressions? johnl@iecc.com (John R Levine) (2004-08-13)
Re: C regexp package that can save compiled expressions? gneuner2@comcast.net (George Neuner) (2004-08-15)
Re: C regexp package that can save compiled expressions? gah@ugcs.caltech.edu (glen herrmannsfeldt) (2004-08-15)
Re: C regexp package that can save compiled expressions? cbarron413@adelphia.net (Carl Barron) (2004-08-15)
Re: C regexp package that can save compiled expressions? cdc@maxnet.co.nz (Carl Cerecke) (2004-08-15)
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Newsgroups: comp.compilers
Date: 15 Aug 2004 22:18:02 -0400
Organization: Comcast Online
References: 04-08-090
Keywords: lex
Posted-Date: 15 Aug 2004 22:18:02 EDT

John R Levine wrote:


> Does anyone know of a regular expression package that lets you save
> the compiled expression in a file and load it into an application
> later?


I had forgotten that they didn't. I did one once that matched a
series of regexps, but it used an array of regex_t, or regex_t
pointers, and compiled them each time it was run.
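
A minimal sketch of that compile-on-every-run approach, assuming the POSIX
<regex.h> interface (regcomp, regexec, regfree). The two patterns and the
function names compile_all, matches_any and free_all are made up for
illustration and do not come from any real pool list or from the program
described above.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical rDNS patterns, for illustration only. */
static const char *const patterns[] = {
    "^dialup-[0-9]+\\.example\\.net$",
    "^ppp[0-9]+\\.pool\\.example\\.com$",
};
#define NPATTERNS (sizeof patterns / sizeof patterns[0])

/* One compiled expression per pattern, rebuilt on every program start. */
static regex_t compiled[NPATTERNS];

/* Compile every pattern. Returns 0 on success, or the regcomp() error
   code of the first pattern that fails (earlier ones are freed). */
static int compile_all(void)
{
    for (size_t i = 0; i < NPATTERNS; i++) {
        int rc = regcomp(&compiled[i], patterns[i],
                         REG_EXTENDED | REG_NOSUB | REG_ICASE);
        if (rc != 0) {
            while (i > 0)
                regfree(&compiled[--i]);
            return rc;
        }
    }
    return 0;
}

/* Return 1 if host matches any compiled pattern, 0 otherwise.
   The cost grows linearly with the number of patterns. */
static int matches_any(const char *host)
{
    assert(host != NULL);
    for (size_t i = 0; i < NPATTERNS; i++)
        if (regexec(&compiled[i], host, 0, NULL, 0) == 0)
            return 1;
    return 0;
}

/* Release all compiled patterns. */
static void free_all(void)
{
    for (size_t i = 0; i < NPATTERNS; i++)
        regfree(&compiled[i]);
}
```

Call compile_all() once at startup, matches_any() per connection, and
free_all() at exit. Nothing here survives between runs, which is exactly
the limitation the original question is about.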


> A fairly effective anti-spam technique is to look at the reverse DNS
> of incoming connections and to reject connections that come from known
> blocks of dialup and otherwise poorly secured computers. You can
> generally write a regular expression for the rDNS of each pool, but
> there's a lot of pools, thousands of them, so on each incoming
> connection the rDNS needs to be matched against thousands of regular
> expressions.


I once did a really large finite state automaton for matching large
numbers of fixed strings. If you can match just the tail of the rDNS
entry, that would work. The table came close to 1GB, so with no
locality at all it ran at memory speed instead of cache speed, but it
did run.
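
One way such a tail matcher could look, as a rough sketch and not the
automaton described above: a table-driven trie that reads each fixed
string right to left, so a host name matches as soon as its tail reaches
an accepting state. The names add_suffix and tail_matches, the MAX_STATES
limit and the example suffixes are all assumptions for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_STATES 1024   /* arbitrary limit for this sketch */

/* next_state[s][c] == 0 means "no transition"; state 0 is the start
   state and is never the target of a transition. */
static unsigned short next_state[MAX_STATES][256];
static unsigned char accepting[MAX_STATES];
static int nstates = 1;

/* Add one fixed suffix, walking it from its last character to its
   first. Returns 0 on success, -1 if the state table is full. */
static int add_suffix(const char *suffix)
{
    int s = 0;
    assert(suffix != NULL);
    for (size_t i = strlen(suffix); i > 0; i--) {
        unsigned char c =
            (unsigned char)tolower((unsigned char)suffix[i - 1]);
        if (next_state[s][c] == 0) {
            if (nstates >= MAX_STATES)
                return -1;
            next_state[s][c] = (unsigned short)nstates++;
        }
        s = next_state[s][c];
    }
    accepting[s] = 1;
    return 0;
}

/* Return 1 if host ends with any stored suffix (case-insensitive),
   0 otherwise. One table lookup per character, from the end. */
static int tail_matches(const char *host)
{
    int s = 0;
    assert(host != NULL);
    for (size_t i = strlen(host); i > 0; i--) {
        unsigned char c =
            (unsigned char)tolower((unsigned char)host[i - 1]);
        s = next_state[s][c];
        if (s == 0)
            return 0;       /* fell off the automaton */
        if (accepting[s])
            return 1;       /* a whole suffix has been consumed */
    }
    return 0;
}
```

Match time depends only on the length of the host name, not on how many
suffixes are loaded. The full 256-entry row per state keeps the lookup
trivial but costs memory, and every step can land anywhere in the table,
so a very large table gets little help from the cache.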


> Since you don't care which pattern matches, only whether it matches or
> not, an obvious optimization would be to combine all of the patterns into
> one big honking RE "pattern1 | pattern2 | ... | pattern N" and match the
> rDNS against that, since the RE match time is proportional to the length
> of the target string, not the complexity of the expression. The compile
> time would be long, but the list of patterns changes rarely so one
> compiled copy of the pattern would be used many times.


So the compiled output of Henry Spencer's version includes pointers,
so it can't be written out and read in again? I did once compile it
from source (on OS/2 1.0) but I didn't look at it much. My next
thought would be to convert pointers to offsets so it could be
written out, and convert them back again when read in. Or maybe
rewrite it so that it uses offsets instead of pointers all the way
through.
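
The "offsets all the way through" idea can be sketched as follows. This
is not Henry Spencer's data structure: struct node, struct program,
save_program, load_program and the MAX_NODES cap are hypothetical, chosen
only to show that nodes linked by array index can be written and read
back with plain fwrite and fread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NO_NODE   UINT32_MAX     /* plays the role of a NULL pointer */
#define MAX_NODES (1u << 24)     /* sanity cap for this sketch */

/* Hypothetical node layout: links are indexes into the node array,
   so they stay valid wherever the array is loaded. */
struct node {
    uint32_t op;    /* opcode or literal character */
    uint32_t next;  /* index of the following node, or NO_NODE */
    uint32_t alt;   /* index of the alternative branch, or NO_NODE */
};

struct program {
    uint32_t count;
    struct node *nodes;  /* the only real pointer; never written out */
};

/* Write the node count followed by the raw node array. */
static int save_program(const struct program *p, FILE *fp)
{
    assert(p != NULL && fp != NULL);
    if (fwrite(&p->count, sizeof p->count, 1, fp) != 1)
        return -1;
    if (p->count != 0 &&
        fwrite(p->nodes, sizeof *p->nodes, p->count, fp) != p->count)
        return -1;
    return 0;
}

/* Read a program back and check that every link is in range.
   Returns 0 on success; the caller frees p->nodes. */
static int load_program(struct program *p, FILE *fp)
{
    uint32_t n;
    assert(p != NULL && fp != NULL);
    if (fread(&n, sizeof n, 1, fp) != 1 || n > MAX_NODES)
        return -1;
    struct node *nodes = malloc((n ? n : 1) * sizeof *nodes);
    if (nodes == NULL)
        return -1;
    if (n != 0 && fread(nodes, sizeof *nodes, n, fp) != n) {
        free(nodes);
        return -1;
    }
    for (uint32_t i = 0; i < n; i++) {
        if ((nodes[i].next != NO_NODE && nodes[i].next >= n) ||
            (nodes[i].alt != NO_NODE && nodes[i].alt >= n)) {
            free(nodes);
            return -1;   /* corrupt file: link points outside array */
        }
    }
    p->count = n;
    p->nodes = nodes;
    return 0;
}
```

The file written this way is tied to the byte order and integer sizes of
the machine that wrote it, which is acceptable for a cache that is
compiled and loaded on the same host. Converting pointers to offsets only
at save time, as in the first suggestion, would need the same range check
when converting back.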


-- glen
[I could rewrite it, but I'd rather not. -John]


