Re: parser performance, was Popularity of compiler tools, was LRgen

"Derek M. Jones" <derek@knosof.co.uk>
Sat, 12 Apr 2008 23:02:28 GMT

          From comp.compilers


From: "Derek M. Jones" <derek@knosof.co.uk>
Newsgroups: comp.compilers
Date: Sat, 12 Apr 2008 23:02:28 GMT
Organization: ntl Cablemodem News Service
References: 08-03-107 08-03-119 08-04-024 08-04-046 08-04-047 08-04-048
Keywords: lex, parse, performance
Posted-Date: 12 Apr 2008 21:38:09 EDT

Ian,


> The 50% does include tokenizing, but still parsing is more than half
> of that. C++ files can easily include hundreds of thousands of lines
> of header files, and gcc parses all of them even though very few of
> them will be used by anything after the parsing stage.


About the only published paper on the subject is:
W. M. Waite.
The cost of lexical analysis.
Software-Practice and Experience, 16(5):473-488, 1986.


It quotes a figure of 50% of compilation time being spent in tokenizing.


> I should add that I agree with your main point that there probably
> isn't much speed difference between a hand-written parser and a
> generated parser. I just want to make the point that parsing speed
> actually is relevant in practice.


Waite quotes a factor of 3-5 difference between hand-written and
automatically generated lexers. Perhaps the quality of automatically
generated lexers has improved since then (I have no idea).
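

For anyone wondering where such a difference might come from: a
hand-written lexer can dispatch directly on the current character,
while a table-driven one pays for at least one table lookup per
character. Here is a minimal sketch in C of the hand-written style
(my own illustration, not code from Waite's paper or from any
particular compiler; the token names are invented):

#include <ctype.h>
#include <stdio.h>

/* Invented token codes, for illustration only. */
enum tok { TOK_EOF, TOK_IDENT, TOK_NUMBER, TOK_OTHER };

/* The hot path of a hand-written lexer: the common cases
   (whitespace, identifiers, numbers) cost little more than one
   comparison and one pointer increment per character.  A
   table-driven DFA instead performs a state x character table
   lookup per character, plus accept-state bookkeeping. */
static enum tok next_token(const char **pp)
{
    const char *p = *pp;

    while (*p == ' ' || *p == '\t' || *p == '\n')
        p++;                            /* skip whitespace inline */

    if (*p == '\0') { *pp = p; return TOK_EOF; }

    if (isalpha((unsigned char)*p) || *p == '_') {
        do p++; while (isalnum((unsigned char)*p) || *p == '_');
        *pp = p;
        return TOK_IDENT;
    }
    if (isdigit((unsigned char)*p)) {
        do p++; while (isdigit((unsigned char)*p));
        *pp = p;
        return TOK_NUMBER;
    }
    *pp = p + 1;                        /* single-character token */
    return TOK_OTHER;
}

int main(void)
{
    const char *src = "count = count + 42;";
    enum tok t;
    while ((t = next_token(&src)) != TOK_EOF)
        printf("token %d\n", t);
    return 0;
}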


Given the amount of effort that has historically gone into decreasing
the size of parser tables, even to the extent of adding extra
indirection (which saved memory at the cost of longer parse times), we
might reasonably conclude that parse times were not considered
significant.
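

To make the cost of that indirection concrete, here is a toy sketch
in C of row-displacement table compression (the array names echo the
convention used by yacc and Bison, but the contents are made up for
this example):

#include <stdio.h>

/* Toy row-displacement compression.  The names echo the
   yypact/yytable/yycheck convention of yacc and Bison, but the
   table contents are invented for this example.  Each state's row
   of actions is packed into one shared array at offset
   yypact[state]; yycheck records which token each slot belongs to. */
static const int yypact[]  = { 0, 2 };        /* row offset per state  */
static const int yytable[] = { 5, 7, 9, 3 };  /* packed action entries */
static const int yycheck[] = { 0, 1, 0, 1 };  /* owning token per slot */

/* An uncompressed action[state][token] array needs one indexed
   load; the packed form adds an addition, a second load and a
   comparison on every parser step: memory traded for time. */
static int action(int state, int token)
{
    int slot = yypact[state] + token;
    if (yycheck[slot] == token)
        return yytable[slot];
    return -1;                  /* fall back to the default action */
}

int main(void)
{
    printf("action(0, 1) = %d\n", action(0, 1));  /* 7 */
    printf("action(1, 0) = %d\n", action(1, 0));  /* 9 */
    return 0;
}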


Of course, now that Bison has supported GLR parsers for the last few
years, there is plenty of opportunity for people to create parsers
that consume large amounts of time and memory ;-)


> gcc now uses a hand-written parser for C too, by the way.


It has for some time.


[Re parse times, remember that yacc et al were born on systems that
gave you a total of 64K per process, so simply squeezing the compiler
into the address space was a challenge, and a few extra table lookups
were a lot better than breaking the compiler into multiple phases. Re
automatic lexers, if he was using the old Bell Labs lex, I'm not
surprised since it was a dreadful piece of code. (Not so dreadful as
to keep the programmer from becoming the CEO of Google, but dreadful
nonetheless.) Flex is much faster. -John]


