Related articles
Handling EOF in lex & flex actions that call input() directly. greyham@research.canon.oz.au (1994-01-07)
Re: Handling EOF in lex & flex actions that call input() directly. vern@horse.ee.lbl.gov (Vern Paxson) (1994-01-09)
Multiline Comments mps@dent.uchicago.edu (1994-01-10)
Newsgroups: comp.compilers
From: Vern Paxson <vern@horse.ee.lbl.gov>
Keywords: flex
Organization: Compilers Central
References: 94-01-029
Date: Sun, 9 Jan 1994 22:34:03 GMT
> I have a separate get_comment function which reads the contents of a
> comment after the lexer has matched its start with a /* regexp. It's done
> like that because I have two different types of comment token recognised
> by the grammar, and they differ depending on whether the comment started
> as the first thing on the line, and whether the comment ended as the last
> thing on the line. This function also coalesces the contents of comments
> on consecutive lines into one token, to handle input like this:
>
> /* 1. This comment gets coalesced with */
> /* 2. this one into a single token */
>
> /* 3. But this is a separate one due to the blank line */
>
> This causes problems because it is necessary to read past the terminating
> '/' on the first comment in order to determine that there is another
> comment on the next line. If an EOF is found (as occurs after comment 3)
> after this terminating '/', the trouble starts because although we can
> simply return our final comment token, the lexer regexp state machine has
> not seen the '\n' after the comment, and so will not recognise patterns
> which match at the start of line when it starts lexing the next input
> file.
>
> In the past, I've had my get_comment function unput a newline if it
> reaches EOF so that the lexer sees an end of line and will match ^
> anchored rules at the start of the next input file. This always seemed
> like a bit of a fudge though, and although it works in lex & flex 2.3, it
> fails with the latest flex 2.4.6.
Several points. First, when a flex scanner starts on a new input file,
the first token scanned is considered as occurring at the
beginning-of-a-line (as well it should, because it does!), so rules
anchored with '^' should be active. You don't need to unput a '\n' to
make this happen. (You might need to with AT&T lex, I don't know.) So if
you're having problems with this functionality with flex 2.4.6 and can put
together a test case demonstrating the problem, please send it to me, as
it's a bug.
Second, I think your problem is more simply solved by avoiding using
input() altogether. Often one can achieve a lot more using flex rules
than one initially believes. For example, the following will match your
two different types of comments, and coalesce adjacent ones.
ws [ \t]
%x comment
%%
^"/*" start_comment_at_bol(); BEGIN(comment);
"/*" start_comment_not_at_bol(); BEGIN(comment);
<comment>[^*\n]+ add_text_to_comment( yytext );
<comment>\n ++line_num; add_text_to_comment( yytext );
<comment>"*" add_text_to_comment( yytext );
<comment>"*/"{ws}*\n{ws}*"/*" ++line_num; /* coalesce two comments */
<comment>"*/"$ finish_comment_at_eol(); BEGIN(INITIAL);
<comment>"*/" finish_comment(); BEGIN(INITIAL);
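The helper functions named in these rules are left to the reader. A minimal sketch of what they might do (the buffer handling and variable names here are assumptions, not part of the original post) is to accumulate the comment text and remember whether the comment opened at the beginning of a line:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers for the flex rules above: collect comment text
 * into a fixed buffer and record whether the comment started at the
 * beginning of a line, so the right token type can be returned later. */
static char comment_buf[4096];
static size_t comment_len;
static int comment_at_bol;   /* did the comment start at beginning-of-line? */

static void start_comment_at_bol(void)     { comment_len = 0; comment_at_bol = 1; }
static void start_comment_not_at_bol(void) { comment_len = 0; comment_at_bol = 0; }

static void add_text_to_comment(const char *text)
{
    size_t n = strlen(text);
    if (comment_len + n < sizeof comment_buf) {
        memcpy(comment_buf + comment_len, text, n);
        comment_len += n;
        comment_buf[comment_len] = '\0';
    }
}

/* finish_comment() / finish_comment_at_eol() would then hand comment_buf
 * to the parser as one of the two comment token types. */
```

Because the coalescing rule discards the "*/ ... /*" text between adjacent comments without calling add_text_to_comment(), their bodies end up concatenated in the buffer, which is exactly the single-token behaviour the original poster wanted.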
> Another fun crowd-pleaser is that lex & flex return different values from
> input() at EOF. (lex is definitely wrong here though since it prevents
> NULs in the input).
You can work around this using
#ifdef FLEX_SCANNER
#define INPUT_EOF EOF
#else
#define INPUT_EOF 0
#endif
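A get_comment-style loop written against this macro then works under either generator. The sketch below is self-contained only because it substitutes a stub input() that replays a fixed string; in a real scanner, input() is supplied by lex or flex, and skip_comment_body() is a hypothetical illustration, not code from the original post:

```c
#include <stdio.h>

#ifdef FLEX_SCANNER
#define INPUT_EOF EOF   /* flex returns EOF at end of input */
#else
#define INPUT_EOF 0     /* AT&T lex returns 0, so NULs can't appear in input */
#endif

/* Stand-in for the scanner's input(): replays a fixed string, then
 * signals end of input the same way the chosen generator would. */
static const char *fake_input_src = "body */";
static int input(void)
{
    return *fake_input_src ? (unsigned char)*fake_input_src++
                           : INPUT_EOF;
}

/* Consume characters up to the closing '*' '/' or end of input;
 * returns how many characters of comment body were seen. */
static int skip_comment_body(void)
{
    int c, prev = 0, n = 0;
    while ((c = input()) != INPUT_EOF) {
        if (prev == '*' && c == '/')
            return n - 1;   /* don't count the '*' of the terminator */
        prev = c;
        ++n;
    }
    return n;               /* hit EOF inside an unterminated comment */
}
```

The one comparison against INPUT_EOF is the only place the lex/flex difference surfaces, which is why a single #ifdef is enough to paper over it.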
Vern
Vern Paxson vern@ee.lbl.gov
Information and Computing Sciences ucbvax!ee.lbl.gov!vern
Lawrence Berkeley Laboratory (510) 486-7504