Re: lexer speed, was Bison

Hans-Peter Diettrich <>
Mon, 20 Aug 2012 01:01:53 +0100

From comp.compilers

> [Compilers spend a lot of time in the lexer, because that's the only
> phase that has to look at the input one character at a time. -John]

When the source code resides in a memory buffer, the time the lexer
spends reading the characters of, e.g., an identifier is negligible
compared with the time spent looking the identifier up and entering it
into a symbol table (in the parser).
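To make the comparison concrete, here is a minimal sketch (in Python for brevity). The function names are hypothetical, and a plain `dict` stands in for a compiler's symbol table. The inner loop does one cheap test per input character. Each identifier then also costs a hash over the whole name, a table probe and possibly an insertion. Which side dominates in a real compiler depends on the implementation and would have to be measured.

```python
# Hypothetical sketch: scan identifiers from an in-memory buffer and
# intern each one in a symbol table (a plain dict mapping name -> index).

def is_ident_start(c: str) -> bool:
    return c.isalpha() or c == "_"

def is_ident_char(c: str) -> bool:
    return c.isalnum() or c == "_"

def scan_identifiers(src: str, symtab: dict[str, int]) -> list[int]:
    """Return the symbol-table index of every identifier in src, in order."""
    ids = []
    i, n = 0, len(src)
    while i < n:
        if is_ident_start(src[i]):
            start = i
            i += 1
            # Lexer work: one cheap character test per input character.
            while i < n and is_ident_char(src[i]):
                i += 1
            name = src[start:i]
            # Symbol-table work: hash the whole name, probe, insert if new.
            ids.append(symtab.setdefault(name, len(symtab)))
        else:
            i += 1  # skip operators, whitespace, punctuation
    return ids

symtab: dict[str, int] = {}
print(scan_identifiers("count = count + step_size;", symtab))  # [0, 0, 1]
print(symtab)  # {'count': 0, 'step_size': 1}
```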

Even if a lexer reads single characters from a file, most OSs maintain
their own file buffer, so little overhead is added compared with a
solution that buffers the input in the program itself.
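A rough illustration of the buffering point, using Python's `io` stack as a stand-in for C stdio (an assumption; the helper names are hypothetical). With `buffering=0`, `open()` returns an unbuffered `FileIO`, so every `read(1)` is a separate system call, answered from the OS file cache once the data is there. With the default buffering, a `BufferedReader` serves most `read(1)` calls from a user-space buffer. The sketch only checks that all three strategies agree; it makes no timing claims.

```python
import os
import tempfile

def count_lines_charwise(path: str, buffering: int) -> int:
    """Count newlines by reading one byte at a time.

    buffering=0: unbuffered FileIO, one read() system call per byte.
    buffering=-1 (default): BufferedReader, most reads served from memory.
    """
    count = 0
    with open(path, "rb", buffering=buffering) as f:
        while True:
            b = f.read(1)
            if not b:  # empty bytes object means end of file
                break
            if b == b"\n":
                count += 1
    return count

def count_lines_in_memory(path: str) -> int:
    """Program-buffered variant: read the whole file, then scan it."""
    with open(path, "rb") as f:
        data = f.read()
    return data.count(b"\n")

# Write a small temporary "source file" and run all three strategies on it.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"int x;\n" * 1000)
    path = tmp.name
try:
    results = [
        count_lines_charwise(path, buffering=0),   # one syscall per byte
        count_lines_charwise(path, buffering=-1),  # library-buffered
        count_lines_in_memory(path),               # whole file in memory
    ]
finally:
    os.remove(path)
print(results)  # [1000, 1000, 1000]
```

Wrapping each call in `time.perf_counter()` is one way to compare the strategies on a particular system.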

I really would like to see some up-to-date benchmarks of how current
compilers and systems behave.

[The benchmarks I did were a while ago, but they showed a large
fraction of the time being spent in the lexer. I wouldn't disagree that
building the symbol table is slow, but estimate the ratio of the number
of characters in a source file to the number of tokens, and that gives
a rough idea of how much slower the lexer will be than the parser. I
agree that some current analyses would be useful.]
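One way to get the rough estimate suggested above is sketched below. It assumes a deliberately crude, hypothetical tokenizer for a C-like language: comments are blanked out with a regex (which would misfire on `//` inside a string literal), and multi-character operators such as `==` count as two tokens. Since the lexer works per character and the parser per token, the ratio it returns is the rough multiplier the note refers to.

```python
import re

# Hypothetical, crude token pattern for a C-like language: identifiers and
# keywords, numbers, string literals, and any other single non-space char.
TOKEN_RE = re.compile(r'[A-Za-z_]\w*|\d+(?:\.\d+)?|"(?:[^"\\\n]|\\.)*"|\S')
# Line and block comments; these count as characters but produce no tokens.
COMMENT_RE = re.compile(r'//[^\n]*|/\*.*?\*/', re.S)

def chars_per_token(src: str) -> float:
    """Ratio of all source characters (whitespace, comments) to tokens."""
    code = COMMENT_RE.sub(" ", src)
    tokens = TOKEN_RE.findall(code)
    return len(src) / len(tokens) if tokens else float("inf")

# 22 characters, 5 tokens (int, x, =, 42, ;)
print(chars_per_token("int x = 42; // answer\n"))  # 4.4
```

Running it over a real source tree (summing characters and tokens over all files) gives a more representative figure than any single line.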

