Technical question about yacc and LALR parsers

compres! (Chris Clark)
21 Apr 91 02:04:33 GMT

          From comp.compilers

Related articles
Technical question about yacc and LALR parsers jar@florida.HQ.Ileaf.COMid AA07671; Sun, 21 Apr 91 (1991-04-20)
Re: Technical question about yacc and LALR parsers (1991-04-22)
Re: Technical question about yacc and LALR parsers (1991-04-22)
Re: Technical question about yacc and LALR parsers (1991-04-22)
Re: Technical question about yacc and LALR parsers (1991-04-23)
Technical question about yacc and LALR parsers compres! (1991-04-21)
| List of all articles for this month |

Newsgroups: comp.compilers
From: compres! (Chris Clark)
Keywords: yacc, parse, errors
Organization: Compilers Central
References: <9104222013.AA09131@florida.HQ.Ileaf.COM>
Date: 21 Apr 91 02:04:33 GMT

In Jim Roskind's previous posting, he describes a method for working back
through a grammar to determine the source of a conflict and an interesting
observation about the yacc states.

His interesting observation is the fact that a given state is always
entered on a transition on the same symbol. This means you'll never see a
state like the one below in a yacc verbose output.

state 727
parened_type : '(' VOID . ')'
parened_type : '(' INT . ')'

Several authors have offered proofs that this must be true. Jim's own
proof brings up the key reason, which involves the meaning of a production
and an item. Peter Garst properly labels this a tail property.

Here is yet another explanation of why the property holds and a way of
solving it by directly translating regular expression grammars.

In most yacc variants, each unique right-hand-side is considered a unique
production and is given a production number. Each symbol within a unique
right-hand-side is given a number within the production. An item is
simply an ordered pair (symbol number, production number). A yacc state
is simply a unique collection of items. However, because two unique
right-hand-size must have unique production numbers and thus unique items,
the two states cannot be merged by the (LALR) algorithm.

Fortunately, the "almost parsers" Jim wants are "easy" to create just by
changing the definition of an item.

For example, in Compiler Resources' Yacc++, we use regular expressions to
solve the same problem. Because of our direct translation algorithm, you
will often see states like the one below.

state 727
parened_type : '(' (VOID | INT) . ')'

This state will be reached on transitions of both symbols VOID and INT.

We cannot quantify the number of states saved, except that it's a lot,
perhaps as much as 50% for some grammars. The exact figures depend on how
many productions have suffixes in common. A few productions with long
common tails can truly increase the percentage. Other factors also come
into play, so your mileage may vary.

By the way, Jim's backtracking algorithm mirrors David Spector's algorithm
for splitting LALR states into LR ones--it was published in SIGPLAN
notices, Dec 1988, v 23 # 12, for those of you who are interested. By
walking back only to the merged states, one can perform LR disambiguation.
By walking back to the start state, one can determine the set of ambiguous
prefixes, which is what Jim wants.

Chris Clark & Barbara Zino
(508) 435-5016

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.