21 Apr 2004 00:51:51 -0400

Related articles |
---|

thompson's construction: superfluous epsilon transitions ? spur4444@yahoo.com (2004-04-03) |

Re: thompson's construction: superfluous epsilon transitions ? torbenm@diku.dk (2004-04-15) |

Re: thompson's construction: superfluous epsilon transitions ? torbenm@diku.dk (2004-04-21) |

From: | torbenm@diku.dk (Torben Ęgidius Mogensen) |

Newsgroups: | comp.compilers |

Date: | 21 Apr 2004 00:51:51 -0400 |

Organization: | Department of Computer Science, University of Copenhagen |

References: | 04-04-022 |

Keywords: | parse |

Posted-Date: | 21 Apr 2004 00:51:51 EDT |

spur4444@yahoo.com (Spur) writes:

*> I have seen Thompson construction diagrams of two kinds -*

*> minimalistic, and with additional Eps edges, probably for clarity.*

*>*

*> For example, the following are two ways to represent the concatenation*

*> of a and b:*

*>*

*> (S0)---a-->(S1)---b-->(F)*

*>*

*> (S0)---a-->(S1)---Eps-->(S2)---b-->(F)*

*>*

*> Similarly, there are two ways to represent a|b, one with just 2 states*

*> and two edges (a and b) between them, and another with 6 states,*

*> padded with Eps edges all along.*

*>*

*> Thus, I want to ask whether the Kleene transition (a*) can be made*

*> simpler ?*

*> Why can't it be just:*

*>*

*> <~~~~Eps~~~~~~|*

*> (S0)---a,Eps-->(F)*

*>*

*> That is, from the start state edges to the final state on a or Eps,*

*> and back on Eps ?*

*>*

*> It looks like this will accept a* too, doesn't it ? So why isn't it*

*> presented as an alternative to the "normal" 4 state representation ?*

*>*

*> The only problem I see here is the "Eps loop", i.e. Eps leads from S0*

*> to F, and Eps leads from F to S0. Is this illegal in NFAs ? I guess it*

*> can be a problem because to any accepting path and infinite string of*

*> S0->F->S0->F... may be added since the transitions are Eps. But can't*

*> we define the accepting string as the minimal and solve this problem ?*

Thompsons construction is compositional, i.e., you combine NFA's

without knowing anything about what regular expressions they

represent. The classical construction has the following invariants:

1) Each NFA used in the construction has unique start and end states.

2) There are no transitions from the NFA to its start state.

The second invariant allows the concatenation rule to combine the end

state of the first NFA with the start state of the second NFA. Let us

assume that the invariant did not hold: For example let us assume that

NFA 1 is

___

/ a \

V \

(S0)-a->(S1)

and NFA 2 is

___

/ b \

V \

(S2)-b->(S3)

If we concatenating these by combining S1 and S2 to S4, we would get

___ ___

/ a \ / b \

V \ V \

(S0)-a->(S4)-b->(S3)

which would recognize erroneous strings (e.g., abbaab).

If you combine start and end states in the alternative construction

you might get wrong results if there are transitions out of the end

states into the component NFA's, but you could get around this by

adding an extra invariant that says this doesn't occur. You will have

to make sure that your constructions preserve all invariants, though.

To conclude, you can have variants of Thompsons construction with

different number of invariants. More invariants allow more states to

be merged when NFA's are combined, but may require additional states

and epsilon transitions to preserve the invariants. The optimum

depends on the mix of different constructions (e.g., how often Kleene

star is used compared to concatenation or alternative).

You will find special cases where the NFA's you combine obey stronger

properties than the invariants guarantee, and in these cases you can

save a few states or transitions. Your examples of a|b or a* are in

this category.

Torben Mogensen

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.