Re: flex scanner too huge, suggestions?

"Scott Nicol" <>
23 Feb 2001 00:01:13 -0500

          From comp.compilers

Related articles
flex scanner too huge, suggestions? (Troy Cauble) (2001-02-17)
Re: flex scanner too huge, suggestions? (Tim Josling) (2001-02-23)
Re: flex scanner too huge, suggestions? (Scott Nicol) (2001-02-23)
Re: flex scanner too huge, suggestions? (2001-02-23)
Re: flex scanner too huge, suggestions? (2001-02-25)
Re: flex scanner too huge, suggestions? (Ron Pinkas) (2001-02-25)
Re: flex scanner too huge, suggestions? (Troy Cauble) (2001-03-01)

From: "Scott Nicol" <>
Newsgroups: comp.compilers
Date: 23 Feb 2001 00:01:13 -0500
Organization: APK Net
References: 01-02-097
Keywords: lex
Posted-Date: 23 Feb 2001 00:01:13 EST

"Troy Cauble" <> wrote in message
> Made with LFLAGS = -Cfe, size gives
> 42068(.text) + 24(.data) + 60(.bss) + 1687886(.rodata) = 1730038

-Cfe means no table compression, which isn't much use except for
debugging flex.

> Made with LFLAGS = -Cem, size gives
> 42496(.text) + 24(.data) + 60(.bss) + 508326(.rodata) = 550906

OK, so you have very large tables.

> I haven't tried to disable features to determine what makes
> it so huge, but the protocol is case-insensitive and context
> (position) sensitive. I addressed the context sensitivity
> with flex start states.
> My flex file has the following %option line.
> %option noyywrap 8bit batch case-insensitive perf-report

None of these options will cause table explosion. case-insensitive
will shrink tables somewhat.

How many start states does your scanner have? A few won't make much
of a difference, but start states do chew up table space. If you have
lots of start states, this could be the problem.
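For a sense of scale, a scanner that leans heavily on start states has a
skeleton like the sketch below (the state and pattern names here are
made up for illustration). Each start condition carries its own set of
active rules, so the transition tables grow with every state you add:

```lex
%x HEADER BODY TRAILER
%%
"BEGIN"           { BEGIN(HEADER); }
<HEADER>"DATA"    { BEGIN(BODY); }
<BODY>"END"       { BEGIN(TRAILER); }
<TRAILER>.|\n     { BEGIN(INITIAL); }
```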

> I'm looking for suggestions for either tweaking my flex
> implementation or switching to another scanner generator.

Try to reduce the number of states by reducing the number of rules.
For example, if you have a lot of keywords, each with a distinct rule,
it would make your tables smaller if you could combine them into one
rule. If your scanner looks like this:

        const { return CONST; }
        else { return ELSE; }
        if { return IF; }
        ... 500 keywords deleted
        while { return WHILE; }
        [a-z][a-z0-9_]* { yylval.symbol = lookup(yytext); return IDENTIFIER; }

you could shrink your tables by placing all of the keywords into the
IDENTIFIER rule. To do this, you would have to load a symbol table with all
of the keywords. Your lex file would then look like this:

        [a-z][a-z0-9_]* { yylval.symbol = lookup(yytext); return yylval.token; }
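A minimal sketch of the keyword lookup that combined rule relies on. The
token codes and the `lookup_token` name here are hypothetical; in a real
scanner the codes would come from the bison-generated header (y.tab.h),
and the table would hold all 500+ keywords:

```c
#include <string.h>

/* Hypothetical token codes; real ones come from the parser's header. */
enum { IDENTIFIER = 258, CONST_KW, ELSE_KW, IF_KW, WHILE_KW };

struct keyword { const char *name; int token; };

/* Keyword table; a linear scan is shown for simplicity, but with
 * hundreds of keywords you would keep it sorted and use bsearch(). */
static const struct keyword keywords[] = {
    { "const", CONST_KW },
    { "else",  ELSE_KW  },
    { "if",    IF_KW    },
    { "while", WHILE_KW },
};

/* Return the keyword's token code, or IDENTIFIER if text is not a keyword. */
int lookup_token(const char *text)
{
    size_t i;
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(text, keywords[i].name) == 0)
            return keywords[i].token;
    return IDENTIFIER;
}
```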

> The corresponding bison parser is solid, so I'm not really looking
> for suggestions to replace the pair. (We initially tried several
> other tools for the total parsing problem but they all choked on
> this large, context sensitive protocol.)

flex is about as good as you'll get. It generates very small tables.
Another alternative is to write your own yylex(), by hand. It isn't
usually that difficult, and you may be able to do some things by hand
much more efficiently than you can do in flex. Lots of start states
is generally a sign that you're trying to force things.
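A hand-written yylex() along those lines can start out very simply. The
sketch below scans identifiers and integers from an in-memory string;
the function and token names (`my_yylex`, `TOK_IDENT`, etc.) are made up
for illustration, and a real version would read from yyin and handle the
protocol's full token set:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes; a real yylex() would return codes from
 * the bison-generated parser header. */
enum { TOK_EOF = 0, TOK_IDENT = 258, TOK_NUMBER };

static const char *input;   /* current scan position */
static char tokbuf[256];    /* text of the last token (like yytext) */

void scan_init(const char *s) { input = s; }

/* Skip whitespace, then recognize one identifier, integer, or
 * single-character token per call. */
int my_yylex(void)
{
    size_t n = 0;
    while (isspace((unsigned char)*input))
        input++;
    if (*input == '\0')
        return TOK_EOF;
    if (isalpha((unsigned char)*input) || *input == '_') {
        while (isalnum((unsigned char)*input) || *input == '_')
            tokbuf[n++] = *input++;
        tokbuf[n] = '\0';
        return TOK_IDENT;
    }
    if (isdigit((unsigned char)*input)) {
        while (isdigit((unsigned char)*input))
            tokbuf[n++] = *input++;
        tokbuf[n] = '\0';
        return TOK_NUMBER;
    }
    return *input++;  /* punctuation: return the character itself */
}
```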

Scott Nicol
