Re: Q:How to Speedup yacc based parsers?

john_reiser@MENTORG.COM (John Reiser)
14 Mar 1997 00:35:22 -0500

          From comp.compilers

Related articles
Q:How to Speedup yacc based parsers? (Venkateswara Rao) (1997-03-07)
Re: Q:How to Speedup yacc based parsers? john_reiser@MENTORG.COM (1997-03-14)
Re: Q:How to Speedup yacc based parsers? (1997-03-16)
Re: Q:How to Speedup yacc based parsers? (John Lilley) (1997-03-16)
Re: Q:How to Speedup yacc based parsers? (Alexander Krotoff) (1997-03-16)
Re: Q:How to Speedup yacc based parsers? (Scott Stanchfield) (1997-03-18)
Re: Q:How to Speedup yacc based parsers? (Carl Cerecke) (1997-03-21)
| List of all articles for this month |

From: john_reiser@MENTORG.COM (John Reiser)
Newsgroups: comp.compilers
Date: 14 Mar 1997 00:35:22 -0500
Organization: Mentor Graphics Corporation
References: 97-03-036
Keywords: parse, performance

Venkateswara Rao ( wrote:
: I am working on a large parser (yacc based). I was wondering whether I
: can get some tips from the old hands in the group to improve on speed.
: I am using Abraxas PCYACC and Visual C++ 4.1 on NT4.0, if it matters.

: [I would do a little performance measurement before doing any tuning. You'll
: probably find that your lexer takes a lot more time than your parser. -John]

As the moderator suggests, do some performance measurement. It may be
particularly easy to compare PCYACC with GNU bison.

If the runtime interpreter of PCYACC is similar to that of byacc or
the original yacc by S.C. Johnson of Bell Labs, then the cache miss
ratio during yyparse() can be large. It may be advantageous to inter-
leave some of the parallel linear arrays into one or more linear
arrays of structs. It may be beneficial to massage the interpreter
slightly, to streamline the critical path, and to unify special cases.

The integer coding of tokens matters. If the grammar has any quoted
ASCII characters in its productions, then there will be an unused gap
from 127 to 257 in the token numbering. So, don't use quoted ASCII
characters; use names such as T_COMMA, etc (of course, the lexer must
co-operate). Then subtract 256 from each token number. Sometimes
this can make token numbers fit into a single eight-bit byte instead
of a 16-bit short.

The largest linear array in many yacc-derived parsers is really a
two-dimensional matrix whose rows have been stripped of leading and
trailing zeroes, then packed into a linear array. Rows of the matrix
are overlapped where possible, with zero-valued entries treated as
"don't care". Again, coding matters, particularly for cache
efficiency. Build a quick-and-dirty graphical development tool which
displays the matrix in its two-dimensional form, showing the zero and
non-zero entries. A few hours of interactive pairwise swapping of
token codes, and otherwise making denser ranges for the "important"
rows, will shrink the large array and reduce cache misses. A good
goal is to shrink the matrix so that it fits into on-chip cache, or
has low page fragmentation, etc.
John F. Reiser
Mentor Graphics Corporation
Wilsonville, Oregon 97070 USA

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.