Re: lex error "Compiler Design and Construction" (Vern Paxson)
Mon, 13 Aug 90 21:44:13 GMT

          From comp.compilers

Related articles
lex error "Compiler Design and Construction" rdfloyd@ceaport.UUCP (1990-08-09)
Re: lex error "Compiler Design and Construction" (1990-08-13)

Newsgroups: comp.compilers
From: (Vern Paxson)
Keywords: C,lex,question,flex
Organization: Cornell Univ. CS Dept, Ithaca NY
References: <136@ceaport.UUCP>
Distribution: usa
Date: Mon, 13 Aug 90 21:44:13 GMT

In article <136@ceaport.UUCP> rdfloyd@ceaport.UUCP (Randy Floyd) writes:
> In "Compiler Design and Construction" by: Arthur B. Pyster (second edition)
> On page 67 there is a small lex for part of C syntax but when I try to run
> it through FLEX I get an error.
> Syntax error at line 38: bad iteration values
> the definition of char and line 38 follow:
> char \'([^'\n]|\\[ntrbrf'\n]|\\0[0-7]{0,2})+\'
> ...
> Can a kind soul please give me a rundown of the definition line
> for char and tell me why I might be getting this message.

It seems likely that you are running a very old version of flex, as
this bug was fixed sometime prior to the flex 2.1 release of June 1989.
The 2.3 release should be turning up on comp.sources.unix real soon now.
You can get around the limitations of the old release by writing the
definition as:

char \'([^'\n]|\\[ntrbrf'\n]|\\0([0-7]{1,2})?)+\'

Note also that if you're requiring octal escape sequences to start with
a leading 0, then you'd better allow three more digits, or else you'll
be limited to \077 = ASCII 63! So the {0,2} in the original example
should be {0,3}. Better is to allow the constant to start with either
a 0 or a 1, or (better still) to allow it to be any number and do explicit
checking that it's within a valid range (after all, C allows '\56' as
an octal character constant).
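
The explicit range check suggested above might look roughly like the
following in C. This is only a sketch: octal_escape_value is a made-up
helper name, not something from lex or from the book, and it assumes the
caller passes just the digits that follow the backslash.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not part of lex or the book's example): decode the
 * digits that follow the backslash of an octal escape.  Returns the value
 * 0..255, or -1 if the digits are not all octal or the value does not fit
 * in 8 bits. */
static int octal_escape_value(const char *digits)
{
    char *end;
    unsigned long v;

    assert(digits != NULL);

    if (*digits < '0' || *digits > '7')
        return -1;                  /* no leading octal digit, e.g. "8" */

    v = strtoul(digits, &end, 8);   /* stops at the first non-octal char */
    if (*end != '\0')
        return -1;                  /* trailing junk, e.g. "48" */
    if (v > 0xff)
        return -1;                  /* out of range, e.g. "777" */

    return (int) v;
}
```

With this, "077" decodes to 63 and "56" to 46, while "777", "48" and
"0777777" are all rejected, so no leading 0 is needed and the bound is
enforced in one place.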

>[This is a pretty gross way to define a character constant, but I don't
>see anything obviously wrong with it, other than that it matches things
>like '\000' ambiguously. -John]

Yes, because of the '+' operator applied to the entire interior of the
character constant and the [^'\n] pattern, the original definition is
identical to

char \'([^'\n]|\\['\n])+\'

anyway. Neither of these definitions is right, though: they will match

'\\''

since the first \ inside the character constant will match the [^'\n]
pattern and then the \' sequence will match \\['\n]. To force only legal
escape sequences to be recognized, something like

char \'([^'\n\\]|\\[ntrbrf'\n\\]|\\0[0-7]{0,3})+\'

is needed. Forcing the scanner to only match correctly formed character
constants is often a mistake, though, since it makes detection and
reporting of illegal constants more difficult. One alternate way to
tackle this problem is to match character constants one character
at a time, using start conditions. Something like:

%x chcon

    char charbuf[MAX_CHAR_CONST];
    char *charbuf_ptr;

%%

'                       charbuf_ptr = charbuf; BEGIN(chcon);

<chcon>'                { /* saw closing quote - all done */
                        BEGIN(INITIAL);
                        *charbuf_ptr = '\0';
                        /* return character constant token type and
                         * value to parser
                         */
                        }

<chcon>\n               {
                        /* error - unterminated character constant */
                        /* generate error message */
                        BEGIN(INITIAL);
                        }

<chcon>\\[0-7]{1,3}     { /* octal escape sequence */
                        unsigned int result;    /* %o wants an unsigned int */

                        (void) sscanf( yytext + 1, "%o", &result );

                        if ( result > 0xff )
                            {
                            /* error, constant is out-of-bounds */
                            }
                        else
                            *charbuf_ptr++ = result;
                        }

<chcon>\\[0-9]+         {
                        /* generate error - bad escape sequence; something
                         * like '\48' or '\0777777'
                         */
                        }

<chcon>\\n              *charbuf_ptr++ = '\n';
<chcon>\\t              *charbuf_ptr++ = '\t';
<chcon>\\r              *charbuf_ptr++ = '\r';
<chcon>\\b              *charbuf_ptr++ = '\b';
<chcon>\\f              *charbuf_ptr++ = '\f';

<chcon>\\(.|\n)         *charbuf_ptr++ = yytext[1];

<chcon>.                *charbuf_ptr++ = yytext[0];

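Going back to the single-pattern definitions: the difference between the
loose pattern and the stricter one can be checked outside of lex with the
POSIX regex routines. The sketch below is an approximation, not a lex
equivalent. The two strings are my own ERE translations of the patterns
(inside an ERE bracket expression a backslash is an ordinary character,
and the newline is written as a real newline), and matches() is a
hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* ERE translation (an assumption) of  \'([^'\n]|\\['\n])+\'  */
static const char LOOSE_ERE[] = "^'([^'\n]|\\\\['\n])+'$";

/* ERE translation (an assumption) of
 *   \'([^'\n\\]|\\[ntrbrf'\n\\]|\\0[0-7]{0,3})+\'                */
static const char STRICT_ERE[] =
    "^'([^'\n\\]|\\\\[ntrbrf'\n\\]|\\\\0[0-7]{0,3})+'$";

/* Hypothetical helper: returns 1 if the whole of text matches the ERE,
 * 0 if it does not, and -1 if the pattern fails to compile. */
static int matches(const char *ere, const char *text)
{
    regex_t re;
    int rc;

    assert(ere != NULL && text != NULL);

    if (regcomp(&re, ere, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);

    return rc == 0;
}
```

Called on the five characters '\\'' (in C source, "'\\\\''"), matches()
returns 1 for LOOSE_ERE and 0 for STRICT_ERE. The strict pattern also
turns down '\56' because of the required leading 0, which is the
limitation noted earlier.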

Vern Paxson
Computer Science Dept. decvax!cornell!vern
Cornell University
