Re: Wanted: Regular Expression -> Finite Automata C code =-

megatest!plethorax!djones@uu2.psi.com (Dave Jones)
Fri, 9 Apr 1993 11:39:29 GMT

          From comp.compilers

Related articles
Re: Wanted: Regular Expression -> Finite Automata C code =- tdh6t@helga3.acc.virginia.edu (Todd Hodes) (1993-04-05)
Re: Wanted: Regular Expression -> Finite Automata C code =- megatest!plethorax!djones@uu2.psi.com (1993-04-09)
Re: Wanted: Regular Expression -> Finite Automata C code =- henry@zoo.toronto.edu (1993-04-12)
Re: Wanted: Regular Expression -> Finite Automata C code =- tdh6t@onyx.cs.virginia.edu (Todd Hodes) (1993-04-14)
| List of all articles for this month |

Newsgroups: comp.compilers
From: megatest!plethorax!djones@uu2.psi.com (Dave Jones)
Keywords: lex, DFA
Organization: Compilers Central
References: 93-04-023
Date: Fri, 9 Apr 1993 11:39:29 GMT

tdh6t@helga3.acc.virginia.edu (Todd Hodes):
> I wanted to use the code in Sedgewick's 'Algorithms in C' book,
> but found the following bug: ...


Last month I posted an article pointing out two bugs in Sedgewick's Pascal
algorithm for matching the NFA against an input string. The bug Todd
reports is in the algorithm for building the NFA, in the "C" book, which I
have never seen.


Helpful suggestion: Don't try to salvage the Sedgewick algorithms. See
the New Dragon Book, "Compilers, Priciples, Techniques, and Tools", by
Aho, Sethi, and Ullman; Addison Wesley; Reading, Mass; 1988; IBSN
0-201-10088-6.


Both procedures are sketched out almost correctly in algorithms 3.3 and
3.4. Algorithm 3.4 is not quite right, but you'll figure it out when you
start to implement it. As written, it terminates only when the input
string has been read to the end. It should terminate either then or when
the NFA is in a state in which it can make no move on the current input
character. In the language the book uses, that will be when
e-closure(move(S,a)) is empty.


Also, every time a final state is encountered, you will want to remember
the number of characters that have been processed at that point. The last
number remembered will be the length of the longest match.


One key difference between the Dragon Book algorithm and the Sedgewick
algorithm is the use of the bit-vector to prevent putting the same NFA
state into the the "next state" set more than once. Come to think of it,
do you need another bit-vector to keep from putting the same set into the
"current state" set more than once when calculating the e-closure? Hmm.


The other Seg. problem is stopping when a final NFA state is first
encountered. In other words, it finds the shortest match -- seldom what is
wanted, particularly for expressions that can match the empty string!


I haven't verified the bug you report against the other Sedgewick
algorithm, so I don't know what the difference is, but I think the Dragon
algorithm 3.3 is correct, if memory serves. It's been about 10 years since
I looked at it though (in the old Dragon book).


Good luck.


Dave
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.