Thu, 14 Feb 2008 17:28:55 -0500

Related articles |
---|

[2 earlier articles] |

Re: Full LR(1) parser generator Hyacc 0.9 release txchen@gmail.com (Thomas Chen) (2008-02-07) |

Re: Full LR(1) parser generator Hyacc 0.9 release DrDiettrich1@aol.com (Hans-Peter Diettrich) (2008-02-11) |

Re: Full LR(1) parser generator Hyacc 0.9 release haberg_20080207@math.su.se (Hans Aberg) (2008-02-11) |

Re: Full LR(1) parser generator Hyacc 0.9 release txchen@gmail.com (Thomas Chen) (2008-02-12) |

Re: Full LR(1) parser generator Hyacc 0.9 release haberg_20080207@math.su.se (Hans Aberg) (2008-02-13) |

Re: Full LR(1) parser generator Hyacc 0.9 release paul@paulbmann.com (Paul B Mann) (2008-02-14) |

Re: Full LR(1) parser generator Hyacc 0.9 release cfc@shell01.TheWorld.com (Chris F Clark) (2008-02-14) |

Re: Full LR(1) parser generator Hyacc 0.9 release jdenny@ces.clemson.edu (Joel E. Denny) (2008-02-23) |

Re: Full LR(1) parser generator Hyacc 0.9 release cfc@TheWorld.com (Chris F Clark) (2008-02-24) |

Re: Full LR(1) parser generator Hyacc 0.9 release jdenny@ces.clemson.edu (Joel E. Denny) (2008-02-25) |

Re: Full LR(1) parser generator Hyacc 0.9 release paul@paulbmann.com (Paul B Mann) (2008-02-26) |

Re: Full LR(1) parser generator Hyacc 0.9 release cfc@TheWorld.com (Chris F Clark) (2008-02-27) |

Re: Full LR(1) parser generator Hyacc 0.9 release jdenny@ces.clemson.edu (Joel E. Denny) (2008-02-27) |

[5 later articles] |

From: | Chris F Clark <cfc@shell01.TheWorld.com> |

Newsgroups: | comp.compilers |

Date: | Thu, 14 Feb 2008 17:28:55 -0500 |

Organization: | The World Public Access UNIX, Brookline, MA |

References: | 08-02-019 08-02-022 08-02-030 08-02-037 08-02-040 08-02-041 |

Keywords: | tools, parse |

Posted-Date: | 14 Feb 2008 21:19:46 EST |

Hans Aberg <haberg_20080207@math.su.se> writes:

*> Thomas Chen wrote:*

*>*

*>> Token precedence is a way of handling syntax ambiguity.*

*>*

*> John Levine wrote:*

*>*

*> > [Precedence is not a way of handling ambiguity, it's a way of writing*

*> > a grammar more concisely. Anything you can write with precedence in*

*> > an LR(1) or LALR grammar you could also write by adding more rules, but*

*> > the version using precedence is shorter and easier to follow. -John]*

*>*

*> Though I haven't seen a formal proof of the latter, I agree that is the*

*> gist of it.*

The original paper on precedence was "Deterministic Parsing of

Ambiguous Grammars" by Aho, Johnson, & Ullman, so in that sense Thomas

is right. However, the resulting language (the language that the

parser actually accepts) is unambiguous, and it can be proven that

every such grammar has an equivalent LR(1) grammar, so in that sense

John and Hans are right. However, it is worth noting that the

resulting LR-parsing tables for the language implemented using

precedence operations is usually smaller and never larger than the

equivalent LR(1) grammar with no precedence. In particular, the

precedence language will have fewer (or at least no more)

non-terminals than the non-precedence version (and also fewer or at

least no more rules). Thus, in my opinion, precedence is about ease

(terseness) of grammar writing and small table generation, as Hans

states here.

*> So one is programing, and as indicated above, using precedences help to*

*> structure the code. So it would be useful even if it does not enlarge*

*> the language class. Not having it, might be frustrating.*

Next point:

*>> Hyacc does support token precedence the same way that Yacc and Bison*

*>> does. As I said, this is independent from the LR(1) or LALR(1) algorithms.*

Yes, once one has LR-tables (and there are essentially three forms of

them, which I will get to in a minute), adding the precedences in is

algorithm independent. I don't see what the issue is here. If you

add precedences to the resulting LR tables, then you have the same

result (within a table family) no matter how you derived the original

tables.

The three forms (families) of LR tables are the LR(0) tables,

cannonical LR(1) tables, and split LR(1) tables.

Almost all the LR algorithms except for canonical LR(1) (and state

splitting LR(1) algorithms) (i.e. anything based on Frank Deremer's

work) use the LR(0) tables. Thus, if you have an LR(0), SLR(1), or

LALR(1) (or even NQ-LALR(1)) parser generator, those all use LR(0)

tables--that is the number of states (and shift actions) in the tables

are the same. The only difference is how the reduce actions are added

to the table.

The canonical LR(1) algorithm has its own set of tables, because it

fully distinguishes states based upon lookaheads implied by

left-context, which the LR(0) tables do not and simply discard. Thus,

it is simple to see that each LR(0) state corresponds to a set of

canonical LR(1) states, the canonical LR states being distinguished by

the left-context and the LR(0) states treating that as one large

equivalence class.

In between those two extremes are the split LR algorithm states. These

algorithms note that not every state distinction made in the canonical

LR algorithm has an effect on the shift-reduce or reduce-reduce

conflicts. Thus, you don't need to distinguish those states, you can

use the LR(0) states in those cases. You only need the canonical LR

states in the cases where the look-ahead implied by the left-context

causes the reduction actions to vary. And, like most of these

algorithms you can find the fixed-point of the set working from the

larger set (the canonical LR states) and taking unions or the smaller

set (the LR(0) states) and taking intersections (splitting states).

The fixed points you find vary slightly and I personally prefer David

Spector's method, because you always get a smaller set by starting

with the LR(0) states and splitting. Although I may be wrong, Pager's

algorithm reminds me of the other fixed point finding method, of

starting with the safe canonical LR(1) tables and merging when it is

known to be safe.

However, the key point is that if you take the state splitting far

enough, you can form a complete tree of LR parsing tables, from the

LR(0) tables to the canonical LR(1) tables--all the LR table

algorithms that I know of create a member of that family of tables.

Moreover, when you apply the precendence rules, they apply to that

family of tables in the same way. That is, if you use precedence to

resolve a shift-reduce conflict in some LR(0) state, there is a

precise set of states in each of the family members you need to check

to see if it applies, and it will have an equivalent effect on each of

the sets of LR tables in that family.

The only exception being that if the left-context of the rules in

conflict in the LR(0) tables can resolve the conflict without use of

precedence, there will be no conflict in the canonical LR(1) tables

(or any of the sets of split tables that have applied a split to

resolve that conflict). However, this effect is very rare and of

negligable impact. Most grammars using precendence do not have

shift-reduce conflicts that can be resolved by state splitting--in

fact (if I recall correctly), state splitting generally resolves

reduce-reduce rather than shift-reduce conflicts.

*> I made a method where precedences can be implemented by constraining the*

*> expansions in the rules. Then I know that such a grammar with restraints*

*> can be rewritten into a CFG. So by that I know, it is a properly of the*

*> grammar alone and not the parsing algorithm.*

Reference to web page or paper please. I may have read it already,

but in case I haven't I want to.

*> I think that the Bison LALR(1) uses optimizations beyond what is*

*> mentioned in the Aho, Sethi & Ullman book. Also, one point is that it*

*> does not need to compute any LR(1) states, but only SLR(0) or something,*

*> so that the computations are linear, not exponential.*

This is true of all LALR(1) algorithms that I know of, they all use

just the LR(0) aka SLR(0) states and none of them compute the

canonical LR(1) states at any point. Only the state splitting

algorithms use the canonical LR(1) states.

That being said, I have never read the source to any version of Bison

or Yacc (other than Yacc++), so I cannot comment further on what they

do. There may be optimizations in Bison's algorithm that make certain

aspects more complex.

Hope this helps,

-Chris

******************************************************************************

Chris Clark Internet: christopher.f.clark@compiler-resources.com

Compiler Resources, Inc. or: compres@world.std.com

23 Bailey Rd Web Site: http://world.std.com/~compres

Berlin, MA 01503 voice: (508) 435-5016

USA fax: (978) 838-0263 (24 hours)

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.