Re: Need regexp source

Chris F Clark <world!cfc@uunet.uu.net>
19 Jan 2000 01:08:03 -0500

From comp.compilers

Related articles
[3 earlier articles]
Re: Need regexp source george@castro.dbnet.ece.ntua.gr (2000-01-09)
Re: Need regexp source roumazeilles.NO.SPAM@NO.SPAM.magic.fr (Yves Roumazeilles) (2000-01-09)
Re: Need regexp source mottl@miss.wu-wien.ac.at (Markus Mottl) (2000-01-09)
Re: Need regexp source jenglish@flightlab.com (Joe English) (2000-01-09)
Re: Need regexp source ralph@inputplus.demon.co.uk (2000-01-15)
Re: Need regexp source thp@roam-thp2.cs.ucr.edu (Tom Payne) (2000-01-15)
*Re: Need regexp source world!cfc@uunet.uu.net (Chris F Clark)* (2000-01-19)**

| List of all articles for this month |

From:	Chris F Clark <world!cfc@uunet.uu.net>
Newsgroups:	comp.compilers,comp.lang.c
Date:	19 Jan 2000 01:08:03 -0500
Organization:	The World Public Access UNIX, Brookline, MA
References:	00-01-006 00-01-019 00-01-060
Keywords:	lex, DFA

Tom Payne <thp@roam-thp2.cs.ucr.edu> wrote:
> Yves Roumazeilles <roumazeilles@magic.fr> wrote:
. . .
> >> [There's always lex, I suppose. Is there a reasonable way to do interative
> >> regex matching without all of the work of building a DFA first? -John]
>
> > I'm seriously doubting it is possible to avoid building a DFA ro you
> > will have to interpret the expression again and again for each
> > comparison you make (read "killing the performance").
>
> One could maintain a set of pointers into the regular expression
> corresponding to each of the possible (nondeterministic) states of the
> match.

If you keep multiple pointers into the states of an NFA, you are
effectively doing Earley parsing and the same performance holds. If
you don't care about back-references and parenthesis, you can merge
pointers to same NFA state without regard to context and mke the
performance linear in the input size. Effectively by doing that, you
are translating the NFA to a DFA one step at a time while parsing.
With some memo-ization, you could even build an actual DFA in the
process to optimize repeated transitions across the same arc
(e.g. where you've been in this DFA state before and have seen the
same input character).

Note, this tells one something about the correspondence between NFA's
and DFA's (a DFA state is exactly equivalent to some set of NFA states
being active concurrently). Thus, a DFA can be as big as 2**n times
as large (in number of states) as the corresponding NFA for the same
regular language. This is where the exponential warning in the LEX
documentation comes from.

If I recall correctly there is a(n experimental?) version of grep that
someone did that does exactly the above, builds an NFA and then builds
the DFA as it traverses the NFA. However, I don't recall who wrote it
or even who told me about it.

Sorry,
-Chris

*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. CompuServe : 74252,1375
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)
------------------------------------------------------------------------------
Web Site in Progress: Web Site : http://world.std.com/~compres

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Need regexp source

Chris F Clark <world!cfc@uunet.uu.net>19 Jan 2000 01:08:03 -0500

Chris F Clark <world!cfc@uunet.uu.net>
19 Jan 2000 01:08:03 -0500