Re: Q:How to Speedup yacc based parsers?

Alexander Krotoff <>
16 Mar 1997 23:35:50 -0500

          From comp.compilers

Related articles
Q:How to Speedup yacc based parsers? (Venkateswara Rao) (1997-03-07)
Re: Q:How to Speedup yacc based parsers? john_reiser@MENTORG.COM (1997-03-14)
Re: Q:How to Speedup yacc based parsers? (1997-03-16)
Re: Q:How to Speedup yacc based parsers? (John Lilley) (1997-03-16)
Re: Q:How to Speedup yacc based parsers? (Alexander Krotoff) (1997-03-16)
Re: Q:How to Speedup yacc based parsers? (Scott Stanchfield) (1997-03-18)
Re: Q:How to Speedup yacc based parsers? (Carl Cerecke) (1997-03-21)
| List of all articles for this month |

From: Alexander Krotoff <>
Newsgroups: comp.compilers
Date: 16 Mar 1997 23:35:50 -0500
Organization: Research Computer Center, Moscow State University
Keywords: parse, performance, yacc

> Venkateswara Rao ( wrote:
> : I am working on a large parser (yacc based). I was wondering whether I
> : can get some tips from the old hands in the group to improve on speed.
> : I am using Abraxas PCYACC and Visual C++ 4.1 on NT4.0, if it matters.

> If the runtime interpreter of PCYACC is similar to that of byacc or
> the original yacc by S.C. Johnson of Bell Labs, then the cache miss
> ratio during yyparse() can be large. It may be advantageous to inter-
> leave some of the parallel linear arrays into one or more linear
> arrays of structs. It may be beneficial to massage the interpreter
> slightly, to streamline the critical path, and to unify special cases.

> The integer coding of tokens matters. If the grammar has any quoted
> ASCII characters in its productions, then there will be an unused gap
> from 127 to 257 in the token numbering. So, don't use quoted ASCII
> characters; use names such as T_COMMA, etc (of course, the lexer must
> co-operate). Then subtract 256 from each token number. Sometimes
> this can make token numbers fit into a single eight-bit byte instead
> of a 16-bit short.

> The largest linear array in many yacc-derived parsers is really a
> two-dimensional matrix whose rows have been stripped of leading and
> trailing zeroes, then packed into a linear array. Rows of the matrix
> are overlapped where possible, with zero-valued entries treated as
> "don't care". Again, coding matters, particularly for cache
> efficiency. Build a quick-and-dirty graphical development tool which
> displays the matrix in its two-dimensional form, showing the zero and
> non-zero entries. A few hours of interactive pairwise swapping of
> token codes, and otherwise making denser ranges for the "important"
> rows, will shrink the large array and reduce cache misses. A good
> goal is to shrink the matrix so that it fits into on-chip cache, or
> has low page fragmentation, etc.

One more hint:
If the language has compicated expression syntax (like C)
it is more efficient to express operators precedence explicitly
by %prec directiove, instead of expression syntax tricks like
it is done in the ISO C and C++ Draft Standatd (where operators
precedence is expressed implicitly).

It is significanly reduces number of reduces (by mean of LALR
algorithm) while parsing.

For example:
[bad style]
AdditiveExpression AddOperator MultiplicativeExpression |

lxmPlus |

MultiplicativeExpression MulOperator HighPrecExpression |

lxmMul |

[better style]
HighPrecExpression |
lxmLeftPar Expression lxmRightPar |
Expression lxmPlus Expression %prec pseudoAdditive |
Expression lxmMinus Expression %prec pseudoAdditive |
Expression lxmMul Expression %prec pseudoMultiplicative |
Expression lxmDiv Expression %prec pseudoMultiplicative;

Alexander N. Krotoff
Research Computer Center tel: +7(095)939-2638
Moscow State University fax: +7(095)939-4430

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.