Related articles:
Possible bug in lex with trailing context expressions? greyham@research.canon.oz.au (1994-01-19)
Re: Possible bug in lex with trailing context expressions? vern@daffy.ee.lbl.gov (1994-01-19)
Newsgroups: comp.compilers
From: greyham@research.canon.oz.au (Graham Stoney)
Summary: lex appears to mishandle trailing context rules.
Keywords: lex, flex, errors, comment
Organization: Canon Information Systems Research Australia
Date: Wed, 19 Jan 1994 01:56:27 GMT
While attempting to construct some lex rules to extract the contents of
C-style comments, I appear to have found a bug in the way lex handles
trailing context expressions. The following lexer is intended to extract the
contents of any C-style comments in its input and send them to standard
output with the leading and trailing *'s and spaces stripped:
/* lexbug.l: Rudimentary comment-contents lexer. */
/* This appears to show a bug with lex matching trailing context correctly. */
WS [ \t]
%S COMMENT
%%
<INITIAL>.|\n ;
<INITIAL>"/*""*"*{WS}+ { putchar('`'); BEGIN COMMENT; }
<INITIAL>"/*""*"+/[^/] { putchar('`'); BEGIN COMMENT; }
<COMMENT>{WS}*"*"+"/" { puts("'"); BEGIN INITIAL; }
<COMMENT>[^*\n \t]* |
<COMMENT>{WS}* |
<COMMENT>"*"+[^*/\n]* ECHO;
<COMMENT>{WS}*\n putchar('\n');
The problem is that the trailing context in the third rule does not appear to
match correctly when presented with degenerate input like `/***/'. I expected
lex to match this as `/**', enter the COMMENT state, and leave `*/' to be
matched by the first COMMENT rule, so that this input would lex correctly and
output the empty string `'. Instead, lex seems to ignore the presence of the
trailing context `[^/]' in the rule altogether and matches `/***', which
fouls things up: once in the COMMENT state, the lone `/' that remains is not
recognised as the end of the comment.
flex version 2.3 handles the situation as I'd expected, which makes me think
I've stumbled upon a bug in lex.
One workaround is to include the trailing context in the main expression and
use yyless() to push it back again; this works, but should be unnecessary.
It looks like this:
/* lexok.l: Rudimentary comment-contents lexer. */
/* This one works around a lex bug regarding trailing context matching. */
WS [ \t]
%S COMMENT
%%
<INITIAL>.|\n ;
<INITIAL>"/*""*"*{WS}+ { putchar('`'); BEGIN COMMENT; }
<INITIAL>"/*""*"+[^/] { yyless(yyleng-1); putchar('`'); BEGIN COMMENT; }
<COMMENT>{WS}*"*"+"/" { puts("'"); BEGIN INITIAL; }
<COMMENT>[^*\n \t]* |
<COMMENT>{WS}* |
<COMMENT>"*"+[^*/\n]* ECHO;
<COMMENT>{WS}*\n putchar('\n');
Does anyone know why lex might be acting this way? Judging from the paper
"Lex - A Lexical Analyzer Generator" by M. E. Lesk and E. Schmidt, trailing
context is the preferred form and ought to work. Can anyone shed some light
on what I might be doing wrong?
And finally, here's a transcript of running the above analysers:
greyham@jaco% lex lexbug.l; cc lex.yy.c -ll
greyham@jaco% a.out
/********* hi mom ********/
`hi mom'
/*************************/
`/ # thinks it's still in COMMENT state!
^D
greyham@jaco% flex lexbug.l; cc lex.yy.c -lfl
greyham@jaco% a.out
/********* hi mom ********/
`hi mom'
/*************************/
`'
^D
greyham@jaco% lex lexok.l; cc lex.yy.c -ll
greyham@jaco% a.out
/********* hi mom ********/
`hi mom'
/*************************/
`'
^D
greyham@jaco% flex lexok.l; cc lex.yy.c -lfl
greyham@jaco% a.out
/********* hi mom ********/
`hi mom'
/*************************/
`'
^D
Thanks,
Graham
--
Graham Stoney, Hardware/Software Engineer
Canon Information Systems Research Australia
Ph: + 61 2 805 2909 Fax: + 61 2 805 2929
[There are dozens of bugs in AT&T lex, so many that I've rarely seen a
usefully complex lexer that worked with it. Since flex is better than
AT&T lex in every way, and is legally unencumbered (no, it's not
copylefted), I see no reason ever to use lex. -John]