|Q:How to Speedup yacc based parsers? Venkateswara.Rao@cho.ge.com (Venkateswara Rao) (1997-03-07)|
|Re: Q:How to Speedup yacc based parsers? john_reiser@MENTORG.COM (1997-03-14)|
|Re: Q:How to Speedup yacc based parsers? firstname.lastname@example.org (1997-03-16)|
|Re: Q:How to Speedup yacc based parsers? email@example.com (John Lilley) (1997-03-16)|
|Re: Q:How to Speedup yacc based parsers? firstname.lastname@example.org (Alexander Krotoff) (1997-03-16)|
|Re: Q:How to Speedup yacc based parsers? email@example.com (Scott Stanchfield) (1997-03-18)|
|Re: Q:How to Speedup yacc based parsers? firstname.lastname@example.org (Carl Cerecke) (1997-03-21)|
|From:||Alexander Krotoff <email@example.com>|
|Date:||16 Mar 1997 23:35:50 -0500|
|Organization:||Research Computer Center, Moscow State University|
|Keywords:||parse, performance, yacc|
> Venkateswara Rao (Venkateswara.Rao@cho.ge.com) wrote:
> : I am working on a large parser (yacc based). I was wondering whether I
> : can get some tips from the old hands in the group to improve on speed.
> : I am using Abraxas PCYACC and Visual C++ 4.1 on NT4.0, if it matters.
> If the runtime interpreter of PCYACC is similar to that of byacc or
> the original yacc by S.C. Johnson of Bell Labs, then the cache miss
> ratio during yyparse() can be large. It may be advantageous to inter-
> leave some of the parallel linear arrays into one or more linear
> arrays of structs. It may be beneficial to massage the interpreter
> slightly, to streamline the critical path, and to unify special cases.
> The integer coding of tokens matters. If the grammar has any quoted
> ASCII characters in its productions, then there will be an unused gap
> from 127 to 257 in the token numbering. So, don't use quoted ASCII
> characters; use names such as T_COMMA, etc (of course, the lexer must
> co-operate). Then subtract 256 from each token number. Sometimes
> this can make token numbers fit into a single eight-bit byte instead
> of a 16-bit short.
> The largest linear array in many yacc-derived parsers is really a
> two-dimensional matrix whose rows have been stripped of leading and
> trailing zeroes, then packed into a linear array. Rows of the matrix
> are overlapped where possible, with zero-valued entries treated as
> "don't care". Again, coding matters, particularly for cache
> efficiency. Build a quick-and-dirty graphical development tool which
> displays the matrix in its two-dimensional form, showing the zero and
> non-zero entries. A few hours of interactive pairwise swapping of
> token codes, and otherwise making denser ranges for the "important"
> rows, will shrink the large array and reduce cache misses. A good
> goal is to shrink the matrix so that it fits into on-chip cache, or
> has low page fragmentation, etc.
One more hint:
If the language has compicated expression syntax (like C)
it is more efficient to express operators precedence explicitly
by %prec directiove, instead of expression syntax tricks like
it is done in the ISO C and C++ Draft Standatd (where operators
precedence is expressed implicitly).
It is significanly reduces number of reduces (by mean of LALR
algorithm) while parsing.
AdditiveExpression AddOperator MultiplicativeExpression |
MultiplicativeExpression MulOperator HighPrecExpression |
lxmLeftPar Expression lxmRightPar |
Expression lxmPlus Expression %prec pseudoAdditive |
Expression lxmMinus Expression %prec pseudoAdditive |
Expression lxmMul Expression %prec pseudoMultiplicative |
Expression lxmDiv Expression %prec pseudoMultiplicative;
Alexander N. Krotoff firstname.lastname@example.org
Research Computer Center tel: +7(095)939-2638
Moscow State University fax: +7(095)939-4430
Return to the
Search the comp.compilers archives again.