Re: Why are LR parsers faster than using if conditions

Clint Olsen <clint@0lsen.net>
28 Jun 2004 20:06:28 -0400

          From comp.compilers

Related articles
[2 earlier articles]
Re: Why are LR parsers faster than using if conditions alexc@std.com (Alex Colvin) (2004-06-11)
Re: Why are LR parsers faster than using if conditions cdc@maxnet.co.nz (Carl Cerecke) (2004-06-15)
Re: Why are LR parsers faster than using if conditions cdc@maxnet.co.nz (Carl Cerecke) (2004-06-21)
Re: Why are LR parsers faster than using if conditions t.zielonka@zodiac.mimuw.edu.pl (Tomasz Zielonka) (2004-06-25)
Re: Why are LR parsers faster than using if conditions haberg@matematik.su.se (Hans Aberg) (2004-06-26)
Re: Why are LR parsers faster than using if conditions haberg@matematik.su.se (Hans Aberg) (2004-06-28)
Re: Why are LR parsers faster than using if conditions clint@0lsen.net (Clint Olsen) (2004-06-28)
RE: Why are LR parsers faster than using if conditions quinn-j@shaw.ca (Quinn Tyler Jackson) (2004-06-30)
Re: Why are LR parsers faster than using if conditions tmoog@panix.com (2004-06-30)
Re: Why are LR parsers faster than using if conditions haberg@matematik.su.se (Hans Aberg) (2004-06-30)
Re: Why are LR parsers faster than using if conditions christian.bau@cbau.freeserve.co.uk (Christian Bau) (2004-07-11)
Re: Why are LR parsers faster than using if conditions try_vanevery_at_mycompanyname@yahoo.com (Brandon J. Van Every) (2004-07-13)
Re: Why are LR parsers faster than using if conditions paulbmann@yahoo.com (Paul B Mann) (2004-07-28)
| List of all articles for this month |

From: Clint Olsen <clint@0lsen.net>
Newsgroups: comp.compilers
Date: 28 Jun 2004 20:06:28 -0400
Organization: Comcast Online
References: 04-06-012 04-06-034 04-06-072 04-06-098
Keywords: parse, performance
Posted-Date: 28 Jun 2004 20:06:28 EDT

On 2004-06-27, Hans Aberg <haberg@matematik.su.se> wrote:
>
> There have been similar discussions in the Flex mailing list as to
> whether certain features slow down the lexing. For example, instead of
> using the built in location tracking mechanism, some insisted that one
> should write ones own location tracking by tweaking the Flex grammar
> rules. When I finally tried both variations out, it turned out that the
> lexing time did not depend significantly at all. One thing that did
> matter though, was whether the lexer rules matches long input segment, in
> which case the lexer would sort of choke and become very slow. This is
> clearly some kind of bug, say a buffer problem. But until it has been
> fixed, one has in Flex to break up the lexer rules so that the typical
> input matches do not become too large.


I'll have to second John's assertion that lexing is the most expensive
phase of parsing. I've profiled quite a few pieces of software that
have used yacc, bison, lemon, and Parse::Yapp for parsing and lex,
flex, and re2c for lexing, and the results have been pretty much the
same.


However, your comment about matching long input doesn't jive with the
flex documentation. In fact, you usually *want* to match the longest
input possible to spend as little time as possible doing work per
character of input. The flex documentation goes to great lengths to
show how refactoring can affect performance in a positive way at the
expense of some more verbose and seemingly redundant regular
expressions. The LOPLAS article on re2c also suggests this technique.
It can be likened to loop unrolling in the regular-expression domain.


The Dragon Book characterizes the lexing effect quite succinctly.
Lexing is one of the few phases where every character of input must be
examined (obviously some more than others), so it makes sense to
minimize the number of operations you do per character.


I do seem to recall that the flex developers have dealt with the
historical problem of keeping track of yylineno affecting lexer
performance. The long input match problem you mentioned seems to
indicate a problem with their input buffering mechanism.


-Clint


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.