Re: lexer speed, was Bison

Hans-Peter Diettrich <>
Mon, 20 Aug 2012 16:14:38 +0100

          From comp.compilers

Related articles
=?UTF-8?Q?Bison_determinis=E2=80=8Btic_LALR=281=29_parser_for_Java=2FC (2012-08-17)
Re: Bison =?UTF-8?B?ZGV0ZXJtaW5pc+KAi3RpYyBMQUxSKDEpIHBhcnNlciBm?= =?U (Hans-Peter Diettrich) (2012-08-18)
Re: lexer speed, was Bison (Hans-Peter Diettrich) (2012-08-20)
Re: lexer speed, was Bison (Hans-Peter Diettrich) (2012-08-20)
Re: lexer speed, was Bison (BGB) (2012-08-20)
Re: lexer speed, was Bison (Hans-Peter Diettrich) (2012-08-21)
Re: lexer speed, was Bison (BartC) (2012-08-21)
| List of all articles for this month |

From: Hans-Peter Diettrich <>
Newsgroups: comp.compilers
Date: Mon, 20 Aug 2012 16:14:38 +0100
Organization: Compilers Central
References: 12-08-005 12-08-006 12-08-008
Keywords: parse, performance, comment

Hans-Peter Diettrich schrieb:
>> [Compilers spend a lot of time in the lexer, because that's the only
>> phase that has to look at the input one character at a time. -John]
> When the source code resides in a memory buffer, the time for reading
> e.g. the characters of an identifier (in the lexer) is neglectable vs.
> the time spent in lookup and entering the identifier into a symbol table
> (in the parser).
> Even if a lexer reads single characters from a file, most OSs maintain
> their own file buffer, so that little overhead is added over the
> program-buffered solution.
> I really would like to see some current benchmarks about the behaviour
> of current compilers and systems.
> DoDi
> [The benchmarks I did were a while ago, but they showed a large
> fraction of time in the lexer. I wouldn't disagree that building the
> symbol table is slow, but figure out some estimate of the ratio of
> the number of characters in a source file to the number of tokens,
> and that is a rough estimate of how much slower the lexer will be
> than the parser. I agree that some current analyses would be useful.
> -John]

Your estimation suggests that the benchmarked parser does not much
more than syntax checking the tokens. In this case I agree that a
parser automaton does less work (state transitions) than a lexer
automaton (or equivalents), and that detailed benchmarking is not
required to proof this fact.

I still disagree that "Compilers spend a lot of time in the lexer".
My point is that a real *compiler* does not normally spend significant
time in neither the lexer nor the parser[1]. In so far it's okay to
"profile your compiler to see where it's spending its time and fix
what needs to be fixed", as you said :-)

[1] The timing depends heavily on what tasks are assigned to the parser
itself. Is code generation part of the parser, or part of a subsequent
(optimization...) stage? What about symbol table (scopes) management and
lookup, and an eventual preprocessor which IMO in an C compiler consumes
more time than the lexer and parser altogether? But further discussion
of these academic issues doesn't help in spotting the real bottlenecks
of any concrete compiler ;-)

[Perhaps you could do some experiments and tell us whether it confirms
your intuition. Like I said, when I profiled some compilers, I was
surprised to see how much time they spent grinding through the
characters in the input. -John]

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.