Problem with flex, parsing a large file

David Deharbe <deharbe@gmail.com>
3 Feb 2006 18:43:44 -0500

          From comp.compilers

Related articles
Problem with flex, parsing a large file deharbe@gmail.com (David Deharbe) (2006-02-03)
Re: Problem with flex, parsing a large file cdodd@acm.org (Chris Dodd) (2006-02-03)
Re: Problem with flex, parsing a large file deharbe@gmail.com (David Deharbe) (2006-02-11)
| List of all articles for this month |

From: David Deharbe <deharbe@gmail.com>
Newsgroups: comp.compilers
Date: 3 Feb 2006 18:43:44 -0500
Organization: Compilers Central
Keywords: flex, question

Dear all,


I am writing a compiler for TSTP, a small language (or format), for
theorem provers. I am using flex-2.5.4 and GNU bison-1.28, and I am
encountering a problem that seems to be related to flex.


Currently, I have developped the following files:
    - tstp.lex: input to flex
    - tstp.y: input to bison
    - test.c: a program that open a given file, and calls the generated
parser on this file.


While parsing a large file (>25MB), the program stopped and output the
following message:
    "input buffer overflow, can't enlarge buffer because scanner uses REJECT".


I was puzzled, as the scanner does not use REJECT explicitly. I
googled the error message and found out that this may be caused by the
yylineno option. So I rewrote the scanner and coded myself the yyline
no option.


After that, parsed again the large file, and the parser seems to get
lost as it does not complete execution in a reasonable amount of time.
So I tried to localize the source of the error and split this large
file into several mid-sized files, each with 1000 lines. The program
was able to parse all of them in 9 seconds!


I included actions in the parser to print the line number to see where
things went wrong (line 3437). I regenerated the program using the
available debug options. The resulting program was able to parse past
line 3437 (I had to stop it when it reached line 7200)!


I include the scanner at the end of this message. If anyone can
provide me a solution of this problem, or a clue on how to solve it, I
would be grateful! You can reach me at "deharbe@gmail.com".


By the way: uname -a yields:
Darwin david-deharbes-ibook-g4.local 8.3.0 Darwin Kernel Version
8.3.0: Mon Oct 3 20:04:04 PDT 2005;
root:xnu-792.6.22.obj~2/RELEASE_PPC Power Macintosh powerpc


Best regards,


David.
--


/* scanner begins*/


%{
#include "tstpsyn.h"


extern int tstp_lineno;


%}


%option noyywrap
%x c_comment




/* <upper word> ::= <A-Z><a-z0-9A-Z_>* */
upper_word [A-Z][a-zA-Z0-9_]*
/* <lower word> ::= <a-z><a-z0-9A-Z_>* */
lower_word [a-z][a-zA-Z0-9_]*
/* <single quoted> ::= '<char>*' */
single_quoted \'[^\']*\'
/* <double quoted> ::= "<char>*" */
double_quoted \"[^\"]*\"
/* <real> ::= <integer><decimal part>
      <sign> ::= + | - | <null>
      <unsigned integer> ::= <0-9>+
      <decimal part> ::= .<0-9>+ | <null> */
/* DD: we cannot maintain the ambiguity in the TPTP3/TSTP definition that
      any integer is also a real */
unsigned_integer [0-9]+
integer [\-\+]?{unsigned_integer}
decimal_part \.[0-9]*
real {integer}{decimal_part}
percent_comment \#[^$]*$
%%
"/*" BEGIN(c_comment);


<c_comment>[^*\n]*
<c_comment>[^*\n]*\n { printf("%i\n", ++tstp_lineno); }
<c_comment>"*"+[^*/\n]*
<c_comment>"*"+[^*/\n]*\n { printf("%i\n", ++tstp_lineno); }
<c_comment><<EOF>> { tstp_error("unterminated comment"); yyterminate(); }
<c_comment>"*"+"/" BEGIN(INITIAL);


"equality:s" { return THEORY_NAME; } /* DD: see comment in file tstp.y */
"equality" { return THEORY_NAME; }


"axiom_of_choice" { return TK_AXIOM_OF_CHOICE ; }
"axiom" { return TK_AXIOM ; }
"cnf" { return TK_CNF ; }
"conjecture" { return TK_CONJECTURE ; }
"creator" { return TK_CREATOR ; }
"definition" { return TK_DEFINITION ; }
"description" { return TK_DESCRIPTION ; }
"fof" { return TK_FOF ; }
"file" { return TK_FILE ; }
"hypothesis" { return TK_HYPOTHESIS ; }
"include" { return TK_INCLUDE ; }
"inference" { return TK_INFERENCE ; }
"introduced" { return TK_INTRODUCED ; }
"iquote" { return TK_IQUOTE ; }
"lemma_conjecture" { return TK_LEMMA_CONJECTURE ; }
"lemma" { return TK_LEMMA ; }
"negated_conjecture" { return TK_NEGATED_CONJECTURE ; }
"plain" { return TK_PLAIN ; }
"refutation" { return TK_REFUTATION ; }
"status" { return TK_STATUS ; }
"tautology" { return TK_TAUTOLOGY ; }
"theorem" { return TK_THEOREM ; }
"theory" { return TK_THEORY ; }
"unknown" { return TK_UNKNOWN ; }


"&" { return TK_AND ; }
"!=" { return TK_DIFF ; }
"=" { return TK_EQU ; }
"equal" { return TK_EQUAL ; }
"<=>" { return TK_EQV ; }
"$false" { return TK_FALSE ; }
"=>" { return TK_LIMPL ; }
"<~>" { return TK_NEQV ; }
"~&" { return TK_NAND ; }
"~|" { return TK_NOR ; }
"~" { return TK_NOT ; }
"|" { return TK_OR ; }
"<=" { return TK_RIMPL ; }
"$true" { return TK_TRUE ; }
"!" { return TK_BANG ; }
"?" { return TK_QUESTION_MARK ; }
"+" { return TK_PLUS ; }
"-" { return TK_MINUS ; }


":" { return TK_COLON ; }
"," { return TK_COMMA ; }
"." { return TK_DOT ; }
"(" { return TK_LP ; }
")" { return TK_RP ; }
"[" { return TK_LSQ ; }
"]" { return TK_RSQ ; }


{upper_word} { return UPPER_WORD ; }
{lower_word} { return LOWER_WORD ; }
{single_quoted} { return SINGLE_QUOTED ; }
{double_quoted} { return DOUBLE_QUOTED ; }
{unsigned_integer} { return UNSIGNED_INTEGER ; }
{real} { return REAL ; }
{percent_comment} { printf("%i\n", ++tstp_lineno); return PERCENT_COMMENT ; }


"\n" { printf("%i\n", ++tstp_lineno); }


[ \t]+ /* eat up space */
  /*. { tstp_error("unrecognized character"); }*/
%%
/* user code */


/* tstp_clean frees the resources allocated by the scanner */
void
tstp_clean()
{
}


/* scanner ends */


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.