From: Clint Olsen <clint@0lsen.net>
Newsgroups: comp.compilers
Date: 28 Jun 2004 20:06:28 -0400
Organization: Comcast Online
References: 04-06-012 04-06-034 04-06-072 04-06-098
Keywords: parse, performance
Posted-Date: 28 Jun 2004 20:06:28 EDT
On 2004-06-27, Hans Aberg <haberg@matematik.su.se> wrote:
>
> There have been similar discussions on the Flex mailing list as to
> whether certain features slow down lexing. For example, instead of
> using the built-in location tracking mechanism, some insisted that one
> should write one's own location tracking by tweaking the Flex grammar
> rules. When I finally tried both variations, it turned out that the
> lexing time did not differ significantly at all. One thing that did
> matter, though, was whether the lexer rules matched long input
> segments, in which case the lexer would sort of choke and become very
> slow. This is clearly some kind of bug, say a buffer problem. But
> until it has been fixed, one has to break up the lexer rules in Flex
> so that the typical input matches do not become too large.
I'll have to second John's assertion that lexing is the most expensive
phase of parsing. I've profiled quite a few pieces of software that
have used yacc, bison, lemon, and Parse::Yapp for parsing and lex,
flex, and re2c for lexing, and the results have been pretty much the
same.
However, your comment about matching long input doesn't jibe with the
flex documentation. In fact, you usually *want* to match the longest
input possible to spend as little time as possible doing work per
character of input. The flex documentation goes to great lengths to
show how refactoring can affect performance in a positive way at the
expense of some more verbose and seemingly redundant regular
expressions. The LOPLAS article on re2c also suggests this technique.
It can be likened to loop unrolling in the regular-expression domain.
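To illustrate the point (my own sketch, not from the flex manual verbatim): the cheapest way to skip runs of uninteresting characters is to consume the whole run in one match, so the per-rule dispatch overhead is amortized over many characters instead of paid on every one.

```lex
%%
    /* Slow: the scanner re-enters the matching loop and runs the
     * action dispatch once per whitespace character. */
[ \t\n]     ;

    /* Fast: one match (and one action dispatch) consumes the entire
     * run of whitespace. */
[ \t\n]+    ;
```

The same idea drives the seemingly redundant keyword-versus-identifier rules in the flex documentation: longer, more specific patterns let the DFA do more work per transition table walk.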
The Dragon Book characterizes the lexing effect quite succinctly.
Lexing is one of the few phases where every character of input must be
examined (obviously some more than others), so it makes sense to
minimize the number of operations you do per character.
I do seem to recall that the flex developers have dealt with the
historical problem of keeping track of yylineno affecting lexer
performance. The long input match problem you mentioned seems to
indicate a problem with their input buffering mechanism.
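For what it's worth, the hand-rolled alternative Hans described looks roughly like this (variable name is my own invention, not from his post): instead of enabling flex's built-in `%option yylineno`, you count newlines yourself in the rules, which also lets you batch the counting into long matches.

```lex
%{
/* Hand-rolled line counter (assumed name), replacing yylineno. */
static int my_lineno = 1;
%}
%option noyywrap
%%
\n+         { my_lineno += yyleng; }   /* count a whole run of newlines at once */
[^\n]+      ;                          /* consume everything else in long gulps */
%%
```

In my measurements the two approaches performed about the same, which matches Hans's experience; the historical yylineno slowdown came from older flex versions rescanning the token text for newlines.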
-Clint