Re: Inverse grep (Anton Ertl)
Wed, 15 Jun 2011 12:28:07 GMT

          From comp.compilers

Related articles
Inverse grep (glen herrmannsfeldt) (2011-06-08)
Re: Inverse grep (Chris F Clark) (2011-06-12)
Re: Inverse grep (Tony Finch) (2011-06-13)
Re: Inverse grep (2011-06-14)
Re: Inverse grep (2011-06-15)
Re: Inverse grep (Paul Wankadia) (2012-05-28)
| List of all articles for this month |

From: (Anton Ertl)
Newsgroups: comp.compilers
Date: Wed, 15 Jun 2011 12:28:07 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
References: 11-06-015 11-06-022
Keywords: lex, performance
Posted-Date: 16 Jun 2011 20:04:01 EDT (Torben Fgidius Mogensen) writes:
>glen herrmannsfeldt <> writes:
>> I suppose this is a strange question, but I was wondering if
>> there was ever something like an inverse grep. That is,
>> match a string against a file full of regular expressions. ...

>[How serious is the explosion problem in practice, now that compilers
>no longer have to fit into 64k? We use lex scanners with a thousand
>tokens, and spamassassin feeds about 2000 patterns to re2c if you use
>the compiled patterns feature. None of them seem to strain our storage
>systems. -John]

Even if it was a problem, the solution has been around a long time
(although I am not aware of an academic paper about it): Generate the
DFA on-demand, only for those transitions that are actually used.
Then the worst-case memory demands are O(n) where n is the length of
the input string (and in the usual case the memory usage grows much

In tree-parsing automata (i.e., not for regexps) I have had cases
where the size of the statically generated automaton was a problem
<>: We
could not generate the automata for the largest tree grammars in
practical time and even had problems compiling some of the code that
we could generate. OTOH, on-demand generation of automaton states
worked even for the largest tree grammars, and memory consumption was
on the order of 2MB IIRC. The numbers will be different for regexps,
but I expect the effect to be the same: on-demand automata will work
in many cases where static automata take too long to generate or are
too big to run. The paper about this work is

- anton
M. Anton Ertl

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.