From: tmoog@panix.com (Tom Moog)
Newsgroups: comp.compilers
Date: 30 Jun 2004 23:07:16 -0400
Organization: Public Access Networks Corp.
References: 04-06-012 04-06-072 04-06-098 04-06-122
Keywords: parse, performance
Posted-Date: 30 Jun 2004 23:07:16 EDT
If you have a very long token and the flex work area grows too slowly
when it needs to be expanded, you can end up with quadratic behavior:
flex starts scanning the token from the beginning each time it runs
off the end of the buffer.
Clint Olsen <clint@0lsen.net> wrote:
>On 2004-06-27, Hans Aberg <haberg@matematik.su.se> wrote:
>>
>> There have been similar discussions in the Flex mailing list as to
>> whether certain features slow down the lexing. ... One thing that did
>> matter, though, was whether the lexer rules match a long input segment,
>> in which case the lexer would sort of choke and become very slow. This
>> is clearly some kind of bug, say a buffer problem. But until it has
>> been fixed, one has to break up the lexer rules in Flex so that the
>> typical input matches do not become too large.
>
>I'll have to second John's assertion that lexing is the most expensive
>phase of parsing. I've profiled quite a few pieces of software that
>have used yacc, bison, lemon, and Parse::Yapp for parsing and lex,
>flex, and re2c for lexing, and the results have been pretty much the
>same.
>
>However, your comment about matching long input doesn't jibe with the
>flex documentation. In fact, you usually *want* to match the longest
>input possible to spend as little time as possible doing work per
>character of input. The flex documentation goes to great lengths to
>show how refactoring can affect performance in a positive way at the
>expense of some more verbose and seemingly redundant regular
>expressions. The LOPLAS article on re2c also suggests this technique.
>It can be likened to loop unrolling in the regular-expression domain.
>
>The Dragon Book characterizes the lexing effect quite succinctly.
>Lexing is one of the few phases where every character of input must be
>examined (obviously some more than others), so it makes sense to
>minimize the number of operations you do per character.
>
>I do seem to recall that the flex developers have dealt with the
>historical problem of keeping track of yylineno affecting lexer
>performance. The long input match problem you mentioned seems to
>indicate a problem with their input buffering mechanism.