Re: regular expressions (bug-report)

mab@wdl39.wdl.loral.com (Mark A Biggar)
Mon, 15 Mar 1993 17:06:03 GMT

          From comp.compilers

Related articles
regular expressions (bug-report) megatest!plethorax!djones@uu4.psi.com (1993-03-14)
Re: regular expressions (bug-report) Paul.Klint@cwi.nl (1993-03-15)
Re: regular expressions (bug-report) mab@wdl39.wdl.loral.com (1993-03-15)
regular expressions (bug-report) jos@thinx.convex.com (Jos Horsmeier) (1993-03-16)
Re: regular expressions (bug-report) pardo@cs.washington.edu (1993-03-19)
Re: regular expressions (bug-report) henry@zoo.toronto.edu (1993-03-25)
Re: regular expressions (bug-report) kanze@us-es.sel.de (1993-03-26)
| List of all articles for this month |
Newsgroups: comp.compilers
From: mab@wdl39.wdl.loral.com (Mark A Biggar)
Keywords: lex, errors
Organization: Loral Western Development Labs
References: 93-03-046
Date: Mon, 15 Mar 1993 17:06:03 GMT

megatest!plethorax!djones@uu4.psi.com (Dave Jones) writes:
> <<discussion of problems with NFA based RE packages including [] >>
>Is there anything published on this? Does anyone know how the various
>reg-exp packages in existence do it? Do they work correctly all the time?
>(I've found that most reg-exp packages reject some strings that should
>match.)


Most of the Unix based RE packages don't use the NFA or DFA methods (a
notable exception is egrep). They use an interpretive backtracking
algorithm using a stack of backtrack points. Prime examples of this are
grep, Henry Spenser's RE package and Perl (which is a heavly modified
version of Henry's package). There are three reasons why NFA/DFA based
methods are not used:


1) the NFA to DFA algorithm produces really big tables even for simple
cases.


2) DAF based pattern matches don't handle [A-Z] well (a major resaon why
they get so big).


3) Unix RE packages like to support the () \1 substring and remembered
stuff, which can't be implemented using DFA methods at all.
For example the pattern /^(a*)b\1b\1$/ matchs strings in the
language a^nba^nba^n which is a context sensitive language and
so is not a "regular" expression at all.


In addition interpretive methods allow for optimizations such as using
boyer-moore table driven matching methods for parts of a pattern that
contain no metacharacters.
--
Mark Biggar
mab@wdl1.wdl.loral.com
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.