Fri, 9 Apr 1993 11:39:29 GMT

Related articles |
---|

Re: Wanted: Regular Expression -> Finite Automata C code =- tdh6t@helga3.acc.virginia.edu (Todd Hodes) (1993-04-05) |

Re: Wanted: Regular Expression -> Finite Automata C code =- megatest!plethorax!djones@uu2.psi.com (1993-04-09) |

Re: Wanted: Regular Expression -> Finite Automata C code =- henry@zoo.toronto.edu (1993-04-12) |

Re: Wanted: Regular Expression -> Finite Automata C code =- tdh6t@onyx.cs.virginia.edu (Todd Hodes) (1993-04-14) |

Newsgroups: | comp.compilers |

From: | megatest!plethorax!djones@uu2.psi.com (Dave Jones) |

Keywords: | lex, DFA |

Organization: | Compilers Central |

References: | 93-04-023 |

Date: | Fri, 9 Apr 1993 11:39:29 GMT |

tdh6t@helga3.acc.virginia.edu (Todd Hodes):

*> I wanted to use the code in Sedgewick's 'Algorithms in C' book,*

*> but found the following bug: ...*

Last month I posted an article pointing out two bugs in Sedgewick's Pascal

algorithm for matching the NFA against an input string. The bug Todd

reports is in the algorithm for building the NFA, in the "C" book, which I

have never seen.

Helpful suggestion: Don't try to salvage the Sedgewick algorithms. See

the New Dragon Book, "Compilers, Priciples, Techniques, and Tools", by

Aho, Sethi, and Ullman; Addison Wesley; Reading, Mass; 1988; IBSN

0-201-10088-6.

Both procedures are sketched out almost correctly in algorithms 3.3 and

3.4. Algorithm 3.4 is not quite right, but you'll figure it out when you

start to implement it. As written, it terminates only when the input

string has been read to the end. It should terminate either then or when

the NFA is in a state in which it can make no move on the current input

character. In the language the book uses, that will be when

e-closure(move(S,a)) is empty.

Also, every time a final state is encountered, you will want to remember

the number of characters that have been processed at that point. The last

number remembered will be the length of the longest match.

One key difference between the Dragon Book algorithm and the Sedgewick

algorithm is the use of the bit-vector to prevent putting the same NFA

state into the the "next state" set more than once. Come to think of it,

do you need another bit-vector to keep from putting the same set into the

"current state" set more than once when calculating the e-closure? Hmm.

The other Seg. problem is stopping when a final NFA state is first

encountered. In other words, it finds the shortest match -- seldom what is

wanted, particularly for expressions that can match the empty string!

I haven't verified the bug you report against the other Sedgewick

algorithm, so I don't know what the difference is, but I think the Dragon

algorithm 3.3 is correct, if memory serves. It's been about 10 years since

I looked at it though (in the old Dragon book).

Good luck.

Dave

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.