A fast way to match regexp's and extract parts of them

roberto@cernvax.cern.ch (roberto bagnara)
4 Oct 91 09:46:43 GMT

          From comp.compilers

Related articles
A fast way to match regexp's and extract parts of them roberto@cernvax.cern.ch (1991-10-04)
Re: A fast way to match regexp's and extract parts of them megatest!djones@decwrl.dec.com (1991-10-07)
Re: A fast way to match regexp's and extract parts of them sra@ecs.soton.ac.uk (1991-10-08)
| List of all articles for this month |
Newsgroups: comp.compilers
From: roberto@cernvax.cern.ch (roberto bagnara)
Keywords: lex, question, DFA
Organization: CERN, Geneva, Switzerland
Date: 4 Oct 91 09:46:43 GMT

Hi all,


        I'am facing the following problem.


DEFINITION (by examples):
------------------------


Let's call `extended regular expressions' a syntactic object of the form:


a{b*c}d{[0-9]*}
_{[A-Za-z_][A-Za-z0-9_]*}


where braces '{' and '}' are used to delimit regular sub-expressions.


THE PROBLEM
-----------


Let's have a set of such `extended regular expressions' (ERE), each of these
being associated to an `action'. That `action' has as many parameters
as there are '{}'-enclosed sub-expression in the associated ERE.
For example:


RE1) a{b*c}d{[0-9]*} action: foo($1, $2)
RE2) _{[A-Za-z_][A-Za-z0-9_]*} action: enter_priv_symbol($1)


I need an `acceptor' that, having recognized the occurrence of a string
matching an ERE, would "call" the associated `action' providing the
parameters extracted from the matching string.
For example, the input


"abbcd12" would result in a `call foo("bbc", "12")'.


For my application the execution speed of the acceptor is *very* important.
Now I'm using a DFA (to recognize patterns) and the GNU regexp package
(to extract the sub-pattern to be passed as parameters). This solution
is not as fast as I'd like it to be.


Finally, THE QUESTIONS ARE:


1) Is it possible to enrich a DFA with enough information to make it do
      the job (partially, at least)?
      In other words, is it possible to avoid analyzing the regular expressions
      twice?
2) How?
3) If the answer to 1) is *no*, other ideas?


Thanks a lot
                                Roberto


P.S. Please respond by EMail, I'll summarize on request.
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.