right context in scanner generators (was Re: LEX and YACC)

boehm@flora.rice.edu (Hans Boehm)
12 Nov 89 00:26:41 GMT

          From comp.compilers

Related articles
right context in scanner generators (was Re: LEX and YACC) boehm@flora.rice.edu (1989-11-12)
Re: right context in scanner generators (was Re: LEX and YACC) boehm@flora.rice.edu (1989-11-12)
Re: right context in scanner generators (was Re: LEX and YACC) vern@cs.cornell.edu (1989-11-13)
Re: right context in scanner generators (was Re: LEX and YACC) peterd@cs.washington.edu (1989-11-13)
| List of all articles for this month |
From: boehm@flora.rice.edu (Hans Boehm)
Newsgroups: comp.compilers
Summary: The standard algorithm is buggy
Date: 12 Nov 89 00:26:41 GMT
References: <1989Nov11.161355.10081@esegue.segue.boston.ma.us>
Original-sender: root@rice.edu
From: boehm@flora.rice.edu (Hans Boehm)
Organization: Rice University, Houston

A number of scanner generators, including lex, make the claim that
they can handle regular expressions specifying right context.
However, as was pointed out to me by a student in a compiler class
I was teaching, the implementation that lex uses (namely that described
in the Aho and Ullman text) is wrong. (I believe the same bug exists
in Aho, Sethi, and Ullman, but I don't have a copy handy at the moment.)
An example is:


(a|ab)/ba


on input "aba". Lex will consider "ab" to be the first token instead of "a".
(It first looks for the concatenation of both regular expressions, then back-
tracks to the rightmost occurrence of a final state corresponding to the first
regular expression.) Are there any scanner generators that actually
do this right?


I believe a correct solution is to build a finite automaton for the
reversal of the second regular expression, and run it during the backtracking
process. This is not asymptotically worse than the standard (incorrect)
algorithm, but it is more complicated than I would like.


Hans-J. Boehm
boehm@rice.edu
[How about stepping across the matched token, trying to match the second
expression; when it matches all the way to the end you've found the right
context. It's never been clear to me how useful lex-style right context
is. Anyone have some real examples? -John]





Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.