21 Apr 91 02:04:33 GMT

Related articles |
---|

Technical question about yacc and LALR parsers jar@florida.HQ.Ileaf.COMid AA07671; Sun, 21 Apr 91 (1991-04-20) |

Re: Technical question about yacc and LALR parsers jml@wally.altair.fr (1991-04-22) |

Re: Technical question about yacc and LALR parsers mauney@eos.ncsu.edu (1991-04-22) |

Re: Technical question about yacc and LALR parsers bliss@sp64.csrd.uiuc.edu (1991-04-22) |

Re: Technical question about yacc and LALR parsers boyland@aspen.berkeley.edu (1991-04-23) |

Technical question about yacc and LALR parsers compres!chris@crackers.clearpoint.com (1991-04-21) |

Newsgroups: | comp.compilers |

From: | compres!chris@crackers.clearpoint.com (Chris Clark) |

Keywords: | yacc, parse, errors |

Organization: | Compilers Central |

References: | <9104222013.AA09131@florida.HQ.Ileaf.COM> |

Date: | 21 Apr 91 02:04:33 GMT |

In Jim Roskind's previous posting, he describes a method for working back

through a grammar to determine the source of a conflict and an interesting

observation about the yacc states.

His interesting observation is the fact that a given state is always

entered on a transition on the same symbol. This means you'll never see a

state like the one below in a yacc verbose output.

state 727

parened_type : '(' VOID . ')'

parened_type : '(' INT . ')'

Several authors have offered proofs that this must be true. Jim's own

proof brings up the key reason, which involves the meaning of a production

and an item. Peter Garst properly labels this a tail property.

Here is yet another explanation of why the property holds and a way of

solving it by directly translating regular expression grammars.

In most yacc variants, each unique right-hand-side is considered a unique

production and is given a production number. Each symbol within a unique

right-hand-side is given a number within the production. An item is

simply an ordered pair (symbol number, production number). A yacc state

is simply a unique collection of items. However, because two unique

right-hand-size must have unique production numbers and thus unique items,

the two states cannot be merged by the (LALR) algorithm.

Fortunately, the "almost parsers" Jim wants are "easy" to create just by

changing the definition of an item.

For example, in Compiler Resources' Yacc++, we use regular expressions to

solve the same problem. Because of our direct translation algorithm, you

will often see states like the one below.

state 727

parened_type : '(' (VOID | INT) . ')'

This state will be reached on transitions of both symbols VOID and INT.

We cannot quantify the number of states saved, except that it's a lot,

perhaps as much as 50% for some grammars. The exact figures depend on how

many productions have suffixes in common. A few productions with long

common tails can truly increase the percentage. Other factors also come

into play, so your mileage may vary.

By the way, Jim's backtracking algorithm mirrors David Spector's algorithm

for splitting LALR states into LR ones--it was published in SIGPLAN

notices, Dec 1988, v 23 # 12, for those of you who are interested. By

walking back only to the merged states, one can perform LR disambiguation.

By walking back to the start state, one can determine the set of ambiguous

prefixes, which is what Jim wants.

Regards,

Chris Clark & Barbara Zino

(508) 435-5016

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.