Re: NFA and negation/conjunction

Stephen Horne <sh006d3592@blueyonder.co.uk>
Sat, 15 May 2010 16:25:49 +0100

          From comp.compilers

Related articles
NFA and negation/conjunction monnier@iro.umontreal.ca (Stefan Monnier) (2010-05-13)
Re: NFA and negation/conjunction klyjikoo@gmail.com (klyjikoo) (2010-05-14)
Re: NFA and negation/conjunction sh006d3592@blueyonder.co.uk (Stephen Horne) (2010-05-15)
Re: NFA and negation/conjunction monnier@iro.umontreal.ca (Stefan Monnier) (2010-05-15)
Re: NFA and negation/conjunction klyjikoo@gmail.com (klyjikoo) (2010-05-16)
Re: NFA and negation/conjunction cfc@shell01.TheWorld.com (Chris F Clark) (2010-05-16)
Re: NFA and negation/conjunction torbenm@diku.dk (2010-05-17)
Re: NFA and negation/conjunction klyjikoo@gmail.com (klyjikoo) (2010-05-19)
Re: NFA and negation/conjunction rsc@swtch.com (Russ Cox) (2010-05-19)
[1 later articles]
| List of all articles for this month |
From: Stephen Horne <sh006d3592@blueyonder.co.uk>
Newsgroups: comp.compilers
Date: Sat, 15 May 2010 16:25:49 +0100
Organization: virginmedia.com
References: 10-05-078
Keywords: lex, DFA
Posted-Date: 16 May 2010 01:49:43 EDT

On Thu, 13 May 2010 23:42:22 -0400, Stefan Monnier
<monnier@iro.umontreal.ca> wrote:


>I'm looking for any work that tries to provide negation and/or
>conjunction operations in regexps without using a DFA (or at least
>while avoiding the DFA state explosion problem).


I've done this, but my method *does* cause a combinatorial explosion
because, basically, one of the steps is to eliminate nondeterminism.
The next step is to minimise, though - but of course DFA minimisation
only achieves so much.


I've never checked how big the intermediate results get because, in
practice, my inputs just aren't that complex anyway.


I based what I do on what Adrian Thurstons Ragel does, at least as
described in the manual.


I'm assuming that conjunction means set intersection, and negation is
set negation. Rather the implement negation, I think the set
difference is more practical - you can always do "any* - input" for a
negation. A true negation is a slippery thing anyway - it all seems
sensible while you have one fixed set of input tokens, then someone
says "now lets update that regex to support unicode". A "true"
negation would need to support an infinite set of input tokens.


Anyway, the basic principle is simple enough. Merge the two input
NFAs. No states or transitions are added - you just renumber the
states in one then combine the state and transition tables.


Then, do a closure that removes all the nondeterminism - each state in
the result mapping to a set of states in the merged input.


Then, check each state. If a state has end states from both the first
input and the second input, for a set intersection, it is itself an
end state - otherwise not. For set difference, you look for states
that have input 1 end states but no input 2 end states.


Then you do dead state removal and minimisation.


Thats the simplified explanation of course - in practice, a lot of the
work can be built into the one modified convert-to-DFA algorithm, so
that e.g. a lot of dead states are never even created.


I'm sure this isn't new to anyone here.




What I'd like to know is - is there any such thing as an efficient
algorithm to minimise an NFA?


Obviously DFA minimisation algorithms can be applied to NFAs, but in
general they don't give a minimal result - only as minimal as can be
achieved by merging equivalent states.


Consider, for example, a six state DFA loop where states are end
states if their distance from the start is divisible by two or three.
A minimal NFA would have two separate loops - a two-state loop for the
divisible-by-two case and a three-state loop for the
divisible-by-three case. Automatically spotting things like this is
interesting because it involves splitting states apart (e.g. to make a
separate six-state loop for the divisible-by-two case) as well as
merging states. And the trouble with splitting states is that there's
not necessarily any clear indicator of when to stop.


I'm guessing that this is an NP-hard or NP-complete problem - am I
right?


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.