Re: Can (f)lex handle the NULL character?

dcardani@totalint.com (Darrin Cardani)
16 May 1999 15:13:33 -0400

          From comp.compilers


From: dcardani@totalint.com (Darrin Cardani)
Newsgroups: comp.compilers
Date: 16 May 1999 15:13:33 -0400
Organization: Total Integration, Inc.
References: 99-05-018 99-05-032
Keywords: lex

rkrayhawk@aol.com (RKRayhawk) wrote:
> Could you maybe share some of the flex code you are having problems
> with here. It is possible that you are not having trouble in your
> regular expression code or the flex rules.


Here are the relevant parts of my flex code:


stream { BEGIN INSTREAM; return STREAM; }
<INSTREAM>endstream { BEGIN NOTINITIAL; return ENDSTREAM; }
<INSTREAM>. { return ANYTHING; }


When I see the "stream" keyword, I switch into stream mode and return
every character (all values from 0x00 to 0xFF) as the token "ANYTHING"
until I see the token "endstream". I've also tried adding a rule
specifically for the NUL character, like this:


<INSTREAM>\x00 { return ANYTHING; }
or this:
<INSTREAM>"\0" { return ANYTHING; }


In both cases, when the lexer reads a 0x00 byte, it goes into the infinite
loop I spoke of earlier. I made sure to put the 0x00 rule before the rule
with "<INSTREAM>." to make sure it was getting used, but it made no
difference.


> Instead, you may be having problems passing NULL as a token.


If I understand you correctly, you're thinking that the problem was that I
was returning 0 instead of my "ANYTHING" token. Right? "ANYTHING" is
defined as 258. So I assume that means I'm doing things correctly.
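
For reference, the convention under discussion can be sketched in plain C.
A yacc-style parser treats a return value of 0 from yylex() as end of
input, while named tokens get codes above the single-byte range, so
ANYTHING (258, as stated above) cannot be mistaken for a raw byte or for
end of input. The names toy_lex and count_tokens are hypothetical
stand-ins for illustration, not flex output:

```c
#include <assert.h>
#include <stddef.h>

/* Token code as given in the post. Yacc-style generators number named
 * tokens above 255 so they never clash with single-character tokens. */
#define ANYTHING 258

/* Hypothetical stand-in for yylex(): returns ANYTHING for every byte,
 * including 0x00, and returns 0 only when the input is exhausted.
 * The byte itself is passed back through *value. */
int toy_lex(const unsigned char *buf, size_t len, size_t *pos,
            unsigned char *value)
{
    assert(buf != NULL && pos != NULL && value != NULL);
    if (*pos >= len)
        return 0;               /* 0 means end of input to the parser */
    *value = buf[(*pos)++];     /* consume one byte, whatever its value */
    return ANYTHING;
}

/* Counts how many tokens toy_lex() yields before it signals end of input. */
int count_tokens(const unsigned char *buf, size_t len)
{
    size_t pos = 0;
    unsigned char value;
    int n = 0;

    while (toy_lex(buf, len, &pos, &value) != 0)
        n++;
    return n;
}
```

With the four bytes 'a', 0x00, 'b', 0xFF as input, count_tokens() reports
four tokens: the 0x00 byte yields ANYTHING like any other byte, and only
running out of input yields 0.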


James Kuehn suggested this:
> In your lex.yy.c, look at the input() macro.
> Figure out how NULL characters are handled (they are often treated
> as end-of-input). Redefine the macro to pass NULL characters
> (and handle EOF yourself) or map NULL to something else and
> write rules to recognize that.


I attempted to step through the code in lex.yy.c. The input macro,
YY_INPUT, is basically just a call to fread(). It looks like this:


#define YY_INPUT(buf,result,max_size) \
        if ( yy_current_buffer->yy_is_interactive ) \
                { \
                int c = '*', n; \
                for ( n = 0; n < max_size && \
                                  (c = getc( yyin )) != EOF && c != '\n'; ++n ) \
                        buf[n] = (char) c; \
                if ( c == '\n' ) \
                        buf[n++] = (char) c; \
                if ( c == EOF && ferror( yyin ) ) \
                        YY_FATAL_ERROR( "input in flex scanner failed" ); \
                result = n; \
                } \
        else if ( ((result = fread( buf, 1, max_size, yyin )) == 0) \
                    && ferror( yyin ) ) \
                YY_FATAL_ERROR( "input in flex scanner failed" );
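
The non-interactive branch of this macro is a single fread() call, and
fread() is length-based: it reports how many bytes it stored rather than
relying on a terminator, so a 0x00 byte in the file is copied like any
other. A minimal sketch of that behaviour, independent of flex (the helper
name read_back is made up for illustration):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write `len` bytes to a temporary file, rewind, and read them back with
 * fread(), the same call the non-interactive YY_INPUT branch uses.
 * Returns the byte count fread() reports, or 0 on failure. */
size_t read_back(const unsigned char *src, size_t len,
                 unsigned char *dst, size_t cap)
{
    FILE *fp;
    size_t got = 0;

    assert(src != NULL && dst != NULL);
    fp = tmpfile();
    if (fp == NULL)
        return 0;
    if (fwrite(src, 1, len, fp) == len) {
        rewind(fp);
        got = fread(dst, 1, cap, fp);   /* counts bytes, NULs included */
    }
    fclose(fp);
    return got;
}
```

Reading back the five bytes 's', 0x00, 't', 0x00, 0xFF through read_back()
gives a count of 5 and an identical buffer, whereas strlen() on the same
buffer would stop at the first 0x00. Only code that treats the buffer as a
C string loses the bytes after a NUL.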


We're not working interactively, so I'm assuming it just calls the fread
function and the first branch of the "if" statement is never executed.
There is also an "input ()" function, but it doesn't seem to ever get
called. Oddly, it seems to take care of the case I'm running into. It has
the following:


static int input()
        {
        int c;


        *yy_c_buf_p = yy_hold_char;


        if ( *yy_c_buf_p == YY_END_OF_BUFFER_CHAR )
                {
                /* yy_c_buf_p now points to the character we want to return.
                  * If this occurs *before* the EOB characters, then it's a
                  * valid NUL; if not, then we've hit the end of the buffer.
                  */
        [...rest deleted...]


And indeed, YY_END_OF_BUFFER_CHAR is defined as:


#define YY_END_OF_BUFFER_CHAR 0


So I think this is my problem. How do I get flex to use the input ()
function rather than the YY_INPUT macro?
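
The comment in input() describes how a sentinel value of 0 can coexist
with NUL bytes in the data: the scanner also tracks how many characters
are in the buffer, so a 0 found before that count is data and a 0 found at
it is the end-of-buffer marker. A rough sketch of that check follows. The
toy_buffer struct is a simplified, hypothetical stand-in for flex's
internal buffer state, and its field and function names are illustrative
only:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define END_OF_BUFFER_CHAR 0   /* same value as YY_END_OF_BUFFER_CHAR */

/* Hypothetical simplified buffer: n_chars data bytes followed by one
 * sentinel byte. This is not flex's real structure. */
struct toy_buffer {
    char   ch_buf[64];
    size_t n_chars;
};

/* Copy `len` data bytes in and place the sentinel right after them. */
void toy_fill(struct toy_buffer *b, const char *data, size_t len)
{
    assert(len < sizeof b->ch_buf);
    memcpy(b->ch_buf, data, len);
    b->n_chars = len;
    b->ch_buf[len] = END_OF_BUFFER_CHAR;
}

/* Returns 1 if the byte at p is a genuine NUL from the input, 0 if it is
 * the end-of-buffer sentinel, and -1 if it is not a zero byte at all. */
int toy_classify(const struct toy_buffer *b, const char *p)
{
    if (*p != END_OF_BUFFER_CHAR)
        return -1;
    return p < &b->ch_buf[b->n_chars] ? 1 : 0;
}
```

Filling the buffer with the three bytes 'a', 0x00, 'b' and classifying
each position shows the idea: the 0 at index 1 lies before n_chars and
counts as data, while the 0 at index 3 is the sentinel.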


Thanks,
Darrin
--
Darrin Cardani
[That's not it -- the input function is for use in user routines, not the
main lexer. I don't see anything in YY_INPUT that would screw up with a
null character. You need to keep tracing and see what's actually looping.
-John]


