Re: lexer speed, was Bison

Hans-Peter Diettrich <DrDiettrich1@aol.com>
Tue, 21 Aug 2012 07:40:25 +0100

          From comp.compilers

Related articles
=?UTF-8?Q?Bison_determinis=E2=80=8Btic_LALR=281=29_parser_for_Java=2FC hsad005@gmail.com (2012-08-17)
Re: Bison =?UTF-8?B?ZGV0ZXJtaW5pc+KAi3RpYyBMQUxSKDEpIHBhcnNlciBm?= =?U DrDiettrich1@aol.com (Hans-Peter Diettrich) (2012-08-18)
Re: lexer speed, was Bison DrDiettrich1@aol.com (Hans-Peter Diettrich) (2012-08-20)
Re: lexer speed, was Bison DrDiettrich1@aol.com (Hans-Peter Diettrich) (2012-08-20)
Re: lexer speed, was Bison cr88192@hotmail.com (BGB) (2012-08-20)
Re: lexer speed, was Bison DrDiettrich1@aol.com (Hans-Peter Diettrich) (2012-08-21)
Re: lexer speed, was Bison bc@freeuk.com (BartC) (2012-08-21)
| List of all articles for this month |

From: Hans-Peter Diettrich <DrDiettrich1@aol.com>
Newsgroups: comp.compilers
Date: Tue, 21 Aug 2012 07:40:25 +0100
Organization: Compilers Central
References: 12-08-005 12-08-006 12-08-008 12-08-011
Keywords: performance
Posted-Date: 21 Aug 2012 20:57:19 EDT

BGB schrieb:


> 1: actually, my C parser uses a pretty big hash for type-name lookup,
> namely 16k entry IIRC, whereas for most other things 1024 or 4096 entry
> is plenty sufficient (for a chain-hash).


The global (top level) symbol table can become very big, in detail in C
compilers. A separate table for type names is not required.


> the main reason for this is the
> large number of typedefs in headers like "windows.h", which can easily
> saturate a 1024 or 4096 entry hash.


Right. AFAIR I added capabilities to skip already processed header
files, at least when a guard is detected. Such (not fully conforming)
behaviour can influence the time spent with loading the same files
moreoften.


> I have used bigger hash tables though, namely for things like interning
> strings (64k) and dynamic+interface method-dispatch (256k, but this one
> is an open-address hash).
>
>
>> DoDi
>> [The benchmarks I did were a while ago, but they showed a large
>> fraction of time in the lexer. I wouldn't disagree that building the
>> symbol table is slow, but figure out some estimate of the ratio of
>> the number of characters in a source file to the number of tokens,
>> and that is a rough estimate of how much slower the lexer will be
>> than the parser. I agree that some current analyses would be useful.
>> -John]
>>
>
> yep.
>
> can't compare exactly, as my parsers tend to be recursive-descent and
> build ASTs directly.


The same for my compilers. My C to Pascal converter uses an LL(1)
parser, with only one exception where LL(2) lookahead is required.
Expressions are parsed differently, using an table-driven parser for
operator precedence and associativity. The preprocessor is implemented
in a couple of filter stages between the lexer and parser.




P.S.: When the time for loading source files can be separated from other
lexer tasks, I'd expect that this operation takes most of the time.


DoDi


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.