Re: Problem with flex, parsing a large file

Chris Dodd <cdodd@acm.org>
3 Feb 2006 21:00:57 -0500

          From comp.compilers


From: Chris Dodd <cdodd@acm.org>
Newsgroups: comp.compilers
Date: 3 Feb 2006 21:00:57 -0500
Organization: Compilers Central
References: 06-02-036
Keywords: lex
Posted-Date: 03 Feb 2006 21:00:57 EST

David Deharbe <deharbe@gmail.com> wrote in news:06-02-036@comp.compilers:
> I am writing a compiler for TSTP, a small language (or format), for
> theorem provers. I am using flex-2.5.4 and GNU bison-1.28, and I am
> encountering a problem that seems to be related to flex.
>
> While parsing a large file (>25MB), the program stopped and output the
> following message:
> "input buffer overflow, can't enlarge buffer because scanner uses
> REJECT".


The problem is almost certainly coming from your "string" tokens:


> /* <single quoted> ::= '<char>*' */
> single_quoted \'[^\']*\'
> /* <double quoted> ::= "<char>*" */
> double_quoted \"[^\"]*\"
                :
> {single_quoted} { return SINGLE_QUOTED ; }
> {double_quoted} { return DOUBLE_QUOTED ; }


For either of these tokens, once the scanner sees a quote character it
slurps up the input until it finds the matching end quote. If that
quote is missing, however, it reads the entire rest of the input
into the input buffer (resizing it larger and larger to hold it). So
when you have REJECT (which comes in implicitly with automatic yylineno),
the buffer can't be enlarged and you get the error; if you get rid of
REJECT, the scanner still tries to pull in the entire input.
And since you apparently have no rule that matches a quote character
on its own, the unmatched quote is never reported as an error.
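
To make the cost concrete, here is a rough sketch in plain C of what a
pattern like \'[^\']*\' forces the scanner to do. It is not flex's actual
matching code; the helper name chars_examined_unbounded is made up for
illustration, and it just counts how many characters have to be looked at
(and therefore held in the buffer) before the match succeeds or fails.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of flex.  Mimics the pattern
 * \'[^\']*\' starting at s[0], which must be the opening quote.
 * Returns the number of characters examined; *matched is set to 1
 * if a closing quote was found and 0 if the input ran out first. */
size_t chars_examined_unbounded(const char *s, char quote, int *matched)
{
    assert(s != NULL && matched != NULL && s[0] == quote);

    size_t i = 1;                       /* skip the opening quote */
    while (s[i] != '\0' && s[i] != quote)
        i++;                            /* newlines do not stop the scan */

    *matched = (s[i] == quote);
    return *matched ? i + 1 : i;        /* include the closing quote if any */
}
```

With a well-formed string the count is just the token length, but with a
missing end quote it is everything from the opening quote to the end of
the input, which for a 25MB file is most of the file.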


The usual solution is to not allow newlines in strings and more carefully
detect malformed strings:


single_quoted \'[^\'\n]*\'
double_quoted \"[^\"\n]*\"
%%
{single_quoted} { return SINGLE_QUOTED ; }
{double_quoted} { return DOUBLE_QUOTED ; }
\' { error("unmatched \'"); }
\" { error("unmatched \""); }

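As a sketch of why this bounds the work, here is the same idea written out
in plain C. Again this is only an illustration, not what flex generates:
scan_quoted_line and the enum are hypothetical names. The scan stops at the
first newline, so a malformed string costs at most one line of lookahead,
and the failure case consumes only the opening quote, which is what the
fallback \' and \" rules do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch. */
enum quoted_result { QUOTED_OK, QUOTED_UNMATCHED };

/* Hypothetical helper, not part of flex.  Mimics \'[^\'\n]*\' with a
 * fallback rule for a lone quote.  s[0] must be the opening quote.
 * On success *len is the full token length; on failure *len is 1,
 * because only the opening quote is consumed. */
enum quoted_result scan_quoted_line(const char *s, char quote, size_t *len)
{
    assert(s != NULL && len != NULL && s[0] == quote);

    size_t i = 1;                       /* skip the opening quote */
    while (s[i] != '\0' && s[i] != '\n' && s[i] != quote)
        i++;                            /* never looks past the current line */

    if (s[i] == quote) {
        *len = i + 1;
        return QUOTED_OK;
    }
    *len = 1;                           /* report the stray quote by itself */
    return QUOTED_UNMATCHED;
}
```

The caller would report an error for QUOTED_UNMATCHED and carry on scanning
from the character after the quote.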



Chris Dodd
cdodd@acm.org

