From: tmoog@panix.com (Tom Moog)
Newsgroups: comp.compilers
Date: 30 Jun 2004 23:07:16 -0400
Organization: Public Access Networks Corp.
References: 04-06-012 04-06-072 04-06-098 04-06-122
Keywords: parse, performance
Posted-Date: 30 Jun 2004 23:07:16 EDT
If you have a very long token and the flex work area grows too slowly
when it needs to be expanded, you can end up with quadratic behavior:
flex starts scanning the token from the beginning each time it runs
off the end of the buffer.
Clint Olsen <clint@0lsen.net> wrote:
>On 2004-06-27, Hans Aberg <haberg@matematik.su.se> wrote:
>>
>> There have been similar discussions in the Flex mailing list as to
>> whether certain features slow down the lexing. ... One thing that did
>> matter, though, was whether the lexer rules match a long input segment,
>> in which case the lexer would sort of choke and become very slow. This
>> is clearly some kind of bug, say a buffer problem. But until it has
>> been fixed, one has to break up the lexer rules in Flex so that the
>> typical input matches do not become too large.
>
>I'll have to second John's assertion that lexing is the most expensive
>phase of parsing. I've profiled quite a few pieces of software that
>have used yacc, bison, lemon, and Parse::Yapp for parsing and lex,
>flex, and re2c for lexing, and the results have been pretty much the
>same.
>
>However, your comment about matching long input doesn't jibe with the
>flex documentation. In fact, you usually *want* to match the longest
>input possible to spend as little time as possible doing work per
>character of input. The flex documentation goes to great lengths to
>show how refactoring can affect performance in a positive way at the
>expense of some more verbose and seemingly redundant regular
>expressions. The LOPLAS article on re2c also suggests this technique.
>It can be likened to loop unrolling in the regular-expression domain.
>
>The Dragon Book characterizes the lexing effect quite succinctly.
>Lexing is one of the few phases where every character of input must be
>examined (obviously some more than others), so it makes sense to
>minimize the number of operations you do per character.
>
>I do seem to recall that the flex developers have dealt with the
>historical problem of keeping track of yylineno affecting lexer
>performance. The long input match problem you mentioned seems to
>indicate a problem with their input buffering mechanism.