Re: A fast way to match regexp's and extract parts of them

megatest!djones@decwrl.dec.com (Dave Jones)
7 Oct 91 21:59:43 GMT

From comp.compilers

Related articles
A fast way to match regexp's and extract parts of them roberto@cernvax.cern.ch (1991-10-04)
*Re: A fast way to match regexp's and extract parts of them megatest!djones@decwrl.dec.com* (1991-10-07)**
Re: A fast way to match regexp's and extract parts of them sra@ecs.soton.ac.uk (1991-10-08)

| List of all articles for this month |

Newsgroups:	comp.compilers
From:	megatest!djones@decwrl.dec.com (Dave Jones)
Keywords:	DFA, NFA
Organization:	Megatest Corporation, San Jose, Ca
References:	91-10-016
Date:	7 Oct 91 21:59:43 GMT

>From article 91-10-016, by roberto@cernvax.cern.ch (roberto bagnara):
> Let's have a set of such `extended regular expressions' (ERE), each of these
> being associated to an `action'.
> ...
> For my application the execution speed of the acceptor is *very* important.
> ...
> Finally, THE QUESTIONS ARE:
>
> 1) Is it possible to enrich a DFA with enough information to make it do
> the job (partially, at least)?
> In other words, is it possible to avoid analyzing the regular expressions
> twice?

Yep.

> 2) How?

"Enrich" it by making it a PDA (Push-down-automaton).

The part of the recognizer that matches the subexpression would have to
mark the beginning and end. You will need to use the PDA or equivalent to
keep track of nested starts. For example,

a*{aa}b

On the input "aaaaaaab", there would be several places marked as
potentially being the beginning of the {aa} part.

The Unix package "yacc" produces a PDA. (It recognizes BNF, not regular
expressions, but it is easy to translate a regular expression into BNF.)

The problem is that original was designed for small computers, and
sacrifices speed to compactness. I have a package that will compile the
yacc output (y.output) into a C-program using large switch statements. The
resulting program is about ten times larger than yaccs tables, but on a
Sun 3/60 that I once had it ran about fifteen times faster. (I also have a
program that will produce the C output directly from the .y file without
using yacc, but it is relatively untested.)

If you are interested, give me a buzz. If your application can not
tolerate the time to do the compilation, you might look into the new-awk
processor, "mawk", which was recently posted to the net. I have not
unpacked it yet, but it is advertized as being quick. You will have to add
a "start-stacking" mechanism to "enrich" the DFA.

-- Dave
(408) 441 3010
sun!megatest!djones
--

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: A fast way to match regexp's and extract parts of them

megatest!djones@decwrl.dec.com (Dave Jones)7 Oct 91 21:59:43 GMT

megatest!djones@decwrl.dec.com (Dave Jones)
7 Oct 91 21:59:43 GMT