Re: Handling EOF in lex & flex actions that call input() directly.

Vern Paxson <>
Sun, 9 Jan 1994 22:34:03 GMT

          From comp.compilers


Newsgroups: comp.compilers
From: Vern Paxson <>
Keywords: flex
Organization: Compilers Central
References: 94-01-029
Date: Sun, 9 Jan 1994 22:34:03 GMT

> I have a separate get_comment function which reads the contents of a
> comment after the lexer has matched its start with a /* regexp. It's done
> like that because I have two different types of comment token recognised
> by the grammar, and they differ depending on whether the comment started
> as the first thing on the line, and whether the comment ended as the last
> thing on the line. This function also coalesces the contents of comments
> on consecutive lines into one token, to handle input like this:
> /* 1. This comment gets coalesced with */
> /* 2. this one into a single token */
> /* 3. But this is a separate one due to the blank line */
> This causes problems because it is necessary to read past the terminating
> '/' on the first comment in order to determine that there is another
> comment on the next line. If an EOF is found (as occurs after comment 3)
> after this terminating '/', the trouble starts because although we can
> simply return our final comment token, the lexer regexp state machine has
> not seen the '\n' after the comment, and so will not recognise patterns
> which match at the start of line when it starts lexing the next input
> file.
> In the past, I've had my get_comment function unput a newline if it
> reaches EOF so that the lexer sees an end of line and will match ^
> anchored rules at the start of the next input file. This always seemed
> like a bit of a fudge though, and although it works in lex & flex 2.3, it
> fails with the latest flex 2.4.6.

Several points. First, when a flex scanner starts on a new input file,
the first token scanned is considered as occurring at the
beginning-of-a-line (as well it should, because it does!), so rules
anchored with '^' should be active. You don't need to unput a '\n' to
make this happen. (You might need to with AT&T lex, I don't know.) So if
you're having problems with this functionality with flex 2.4.6 and can put
together a test case demonstrating the problem, please send it to me, as
it's a bug.
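
To see this, a minimal sketch (the messages printed are hypothetical; run it
through flex and point yyin at successive files) is a scanner whose only job
is to distinguish the two cases -- its '^'-anchored rule should fire for a
"/*" opening the very first line of each new file, with no unput('\n')
required:

%%
^"/*" printf( "comment starting at beginning of line\n" );
"/*" printf( "comment starting elsewhere\n" );
.|\n /* discard everything else */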

Second, I think your problem is more simply solved by avoiding using
input() altogether. Often one can achieve a lot more using flex rules
than one initially believes. For example, the following will match your
two different types of comments, and coalesce adjacent ones.

ws [ \t]

%x comment

%%

^"/*" start_comment_at_bol(); BEGIN(comment);
"/*" start_comment_not_at_bol(); BEGIN(comment);

<comment>[^*\n]+ add_text_to_comment( yytext );
<comment>\n ++line_num; add_text_to_comment( yytext );
<comment>"*" add_text_to_comment( yytext );
<comment>"*/"{ws}*\n{ws}*"/*" ++line_num; /* coalesce two comments */
<comment>"*/"$ finish_comment_at_eol(); BEGIN(INITIAL);
<comment>"*/" finish_comment(); BEGIN(INITIAL);

> Another fun crowd-pleaser is that lex & flex return different values from
> input() at EOF. (lex is definitely wrong here though since it prevents
> NULs in the input).

You can work around this using

#define INPUT_EOF 0


Vern Paxson
Information and Computing Sciences ucbvax!!vern
Lawrence Berkeley Laboratory (510) 486-7504
