Re: Full LR(1) parser generator Hyacc 0.9 release

Chris F Clark <cfc@shell01.TheWorld.com>
Thu, 14 Feb 2008 17:28:55 -0500

          From comp.compilers

From: Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups: comp.compilers
Date: Thu, 14 Feb 2008 17:28:55 -0500
Organization: The World Public Access UNIX, Brookline, MA
References: 08-02-019 08-02-022 08-02-030 08-02-037 08-02-040 08-02-041
Keywords: tools, parse
Posted-Date: 14 Feb 2008 21:19:46 EST

Hans Aberg <haberg_20080207@math.su.se> writes:


> Thomas Chen wrote:
>
>> Token precedence is a way of handling syntax ambiguity.
>
> John Levine wrote:
>
> > [Precedence is not a way of handling ambiguity, it's a way of writing
> > a grammar more concisely. Anything you can write with precedence in
> > an LR(1) or LALR grammar you could also write by adding more rules, but
> > the version using precedence is shorter and easier to follow. -John]
>
> Though I haven't seen a formal proof of the latter, I agree that is the
> gist of it.


The original paper on precedence was "Deterministic Parsing of
Ambiguous Grammars" by Aho, Johnson, & Ullman, so in that sense Thomas
is right. However, the resulting language (the language that the
parser actually accepts) is unambiguous, and it can be proven that
every such grammar has an equivalent LR(1) grammar, so in that sense
John and Hans are right. However, it is worth noting that the
resulting LR parsing tables for the language implemented using
precedence operations are usually smaller, and never larger, than
those for the equivalent LR(1) grammar with no precedence. In particular, the
precedence language will have fewer (or at least no more)
non-terminals than the non-precedence version (and also fewer or at
least no more rules). Thus, in my opinion, precedence is about ease
(terseness) of grammar writing and small table generation, as Hans
states here.
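To make the "fewer nonterminals and rules" point concrete, here is a small
sketch (the grammars and helper are my own illustration, not from the
thread): the classic ambiguous expression grammar that Yacc would resolve
with %left declarations, next to the equivalent stratified unambiguous
grammar, represented as plain rule lists.

```python
# Hypothetical illustration: the same expression language written two ways.

# Ambiguous grammar plus precedence declarations (Yacc-style), as data:
ambiguous_rules = [
    ("E", ["E", "+", "E"]),   # resolved by %left '+'
    ("E", ["E", "*", "E"]),   # resolved by %left '*' (higher precedence)
    ("E", ["(", "E", ")"]),
    ("E", ["id"]),
]

# Equivalent unambiguous LR(1) grammar, precedence encoded as extra levels:
stratified_rules = [
    ("E", ["E", "+", "T"]),
    ("E", ["T"]),
    ("T", ["T", "*", "F"]),
    ("T", ["F"]),
    ("F", ["(", "E", ")"]),
    ("F", ["id"]),
]

def stats(rules):
    """Count (nonterminals, rules) for a grammar given as (lhs, rhs) pairs."""
    nonterminals = {lhs for lhs, _ in rules}
    return len(nonterminals), len(rules)

print(stats(ambiguous_rules))   # (1, 4)
print(stats(stratified_rules))  # (3, 6)
```

The extra nonterminals (T, F here) are exactly what inflates the state count
of the non-precedence version's tables.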


> So one is programming, and as indicated above, using precedences helps to
> structure the code. So it would be useful even if it does not enlarge
> the language class. Not having it might be frustrating.


Next point:
>> Hyacc does support token precedence the same way that Yacc and Bison
>> do. As I said, this is independent of the LR(1) or LALR(1) algorithms.


Yes, once one has LR-tables (and there are essentially three forms of
them, which I will get to in a minute), adding the precedences in is
algorithm independent. I don't see what the issue is here. If you
add precedences to the resulting LR tables, then you have the same
result (within a table family) no matter how you derived the original
tables.
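The table-independence of the precedence step can be seen from the
tie-breaking rule itself: it consults only the precedence declarations, never
the state construction. Below is a sketch (my own simplification of the
Yacc-style rule, where a rule's precedence comes from its last terminal or an
explicit %prec) of how a shift-reduce conflict is resolved.

```python
# Simplified Yacc-style conflict resolution: compare the conflicting rule's
# precedence token against the lookahead token.  Nothing here depends on
# whether the underlying tables are LR(0)-based, canonical LR(1), or split.

LEFT, RIGHT, NONASSOC = "left", "right", "nonassoc"

# (precedence level, associativity) per terminal; higher level binds tighter.
prec = {"+": (1, LEFT), "*": (2, LEFT)}

def resolve(rule_prec_token, lookahead):
    """Return 'shift', 'reduce', or 'error' for a shift-reduce conflict."""
    if rule_prec_token not in prec or lookahead not in prec:
        return "shift"                      # Yacc's default: prefer shift
    rule_level, rule_assoc = prec[rule_prec_token]
    look_level, _ = prec[lookahead]
    if rule_level > look_level:
        return "reduce"
    if rule_level < look_level:
        return "shift"
    # equal precedence: associativity decides
    return {LEFT: "reduce", RIGHT: "shift", NONASSOC: "error"}[rule_assoc]

# E -> E + E .  with lookahead '*': shift, because '*' binds tighter
print(resolve("+", "*"))  # shift
# E -> E * E .  with lookahead '+': reduce
print(resolve("*", "+"))  # reduce
# E -> E + E .  with lookahead '+': left-associative, so reduce
print(resolve("+", "+"))  # reduce
```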


The three forms (families) of LR tables are the LR(0) tables, the
canonical LR(1) tables, and the split LR(1) tables.


Almost all of the LR algorithms except for canonical LR(1) (and the
state-splitting LR(1) algorithms) use the LR(0) tables. Thus, if you
have an LR(0), SLR(1), or LALR(1) (or even NQ-LALR(1)) parser
generator (i.e., anything based on Frank DeRemer's work), it uses the
LR(0) tables--that is, the number of states (and shift actions) in the
tables is the same. The only difference is how the reduce actions are
added to the table.
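Those shared LR(0) tables are just the canonical collection of item sets
linked by shift (goto) edges. A compact builder, sketched below for a tiny
hypothetical grammar of my own choosing, shows the structure that SLR(1) and
LALR(1) then decorate with reduce actions in their different ways.

```python
# LR(0) canonical collection for the toy grammar  S' -> S ; S -> ( S ) | x.
# Items are (lhs, rhs, dot-position) triples; states are frozensets of items.

GRAMMAR = {
    "S'": [["S"]],          # augmented start rule
    "S":  [["(", "S", ")"], ["x"]],
}
NONTERMINALS = set(GRAMMAR)

def closure(items):
    """Add fresh items for every nonterminal that appears after a dot."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot) in list(items):
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                for prod in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], tuple(prod), 0)
                    if item not in items:
                        items.add(item)
                        changed = True
    return frozenset(items)

def goto(state, symbol):
    """Shift the dot over `symbol` in every item that allows it."""
    return closure((lhs, rhs, dot + 1)
                   for (lhs, rhs, dot) in state
                   if dot < len(rhs) and rhs[dot] == symbol)

def lr0_states():
    start = closure([("S'", ("S",), 0)])
    states, work = [start], [start]
    while work:
        state = work.pop()
        symbols = {rhs[dot] for (_, rhs, dot) in state if dot < len(rhs)}
        for sym in symbols:
            nxt = goto(state, sym)
            if nxt and nxt not in states:
                states.append(nxt)
                work.append(nxt)
    return states

print(len(lr0_states()))  # 6 states for this tiny grammar
```

SLR(1) would add reduce actions on FOLLOW sets, LALR(1) on per-state
lookahead sets; the states themselves never change.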


The canonical LR(1) algorithm has its own set of tables, because it
fully distinguishes states based upon the lookaheads implied by
left-context, which the LR(0) tables simply discard. Thus, it is
simple to see that each LR(0) state corresponds to a set of canonical
LR(1) states: the canonical LR states are distinguished by the
left-context, while the LR(0) state treats that whole set as one large
equivalence class.


In between those two extremes are the split LR algorithms' states.
These algorithms note that not every state distinction made in the
canonical LR algorithm has an effect on the shift-reduce or
reduce-reduce conflicts. Thus, you don't need to distinguish those
states; you can use the LR(0) states in those cases. You only need
the canonical LR states in the cases where the lookahead implied by
the left-context causes the reduce actions to vary. And, as with most
such algorithms, you can find the fixed point either by working from
the larger set (the canonical LR(1) states) and taking unions (merging
states), or by working from the smaller set (the LR(0) states) and
taking intersections (splitting states). The fixed points you find
vary slightly, and I personally prefer David Spector's method, because
you always get a smaller set by starting with the LR(0) states and
splitting. Although I may be wrong, Pager's algorithm reminds me of
the other fixed-point-finding method: starting with the safe canonical
LR(1) tables and merging when it is known to be safe.
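The merge direction can be sketched with a toy compatibility check (this is
my own simplification, not Pager's exact "weak compatibility" test): two
LR(1) states sharing an LR(0) core may be merged when the union of their
reduce lookahead sets introduces no reduce-reduce conflict that neither
state had on its own.

```python
# Each state is modeled only by its reduce actions:
# a dict mapping rule-name -> set of reduce lookahead tokens.

def merge_is_safe(state_a, state_b):
    """True if merging the two states creates no NEW reduce-reduce conflict."""
    merged = {}
    for state in (state_a, state_b):
        for rule, lookaheads in state.items():
            merged.setdefault(rule, set()).update(lookaheads)
    rules = list(merged)
    for i, r1 in enumerate(rules):
        for r2 in rules[i + 1:]:
            for tok in merged[r1] & merged[r2]:
                # a clash is new only if it was absent from both originals
                in_a = (tok in state_a.get(r1, set())
                        and tok in state_a.get(r2, set()))
                in_b = (tok in state_b.get(r1, set())
                        and tok in state_b.get(r2, set()))
                if not (in_a or in_b):
                    return False
    return True

# Safe: the lookahead sets stay attached to a single rule after the union.
print(merge_is_safe({"A": {"x"}}, {"A": {"y"}}))              # True
# Unsafe: merging manufactures reduce-reduce conflicts on 'x' and 'y'.
print(merge_is_safe({"A": {"x"}, "B": {"y"}},
                    {"A": {"y"}, "B": {"x"}}))                # False
```

Running the same check in the splitting direction amounts to asking which
LR(0) states must be split because their combined lookaheads fail it.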


However, the key point is that if you take the state splitting far
enough, you can form a complete tree of LR parsing tables, from the
LR(0) tables to the canonical LR(1) tables--all the LR table
algorithms that I know of create a member of that family of tables.
Moreover, when you apply the precedence rules, they apply to that
family of tables in the same way. That is, if you use precedence to
resolve a shift-reduce conflict in some LR(0) state, there is a
precise set of states in each of the family members you need to check
to see if it applies, and it will have an equivalent effect on each of
the sets of LR tables in that family.


The only exception is that if the left-context of the rules in
conflict in the LR(0) tables can resolve the conflict without use of
precedence, there will be no conflict in the canonical LR(1) tables
(or in any of the sets of split tables that have applied a split to
resolve that conflict). However, this effect is very rare and of
negligible impact. Most grammars using precedence do not have
shift-reduce conflicts that can be resolved by state splitting--in
fact (if I recall correctly), state splitting generally resolves
reduce-reduce rather than shift-reduce conflicts.


> I made a method where precedences can be implemented by constraining the
> expansions in the rules. Then I know that such a grammar with constraints
> can be rewritten into a CFG. So by that I know it is a property of the
> grammar alone and not the parsing algorithm.


Reference to a web page or paper, please. I may have read it already,
but in case I haven't, I want to.


> I think that the Bison LALR(1) uses optimizations beyond what is
> mentioned in the Aho, Sethi & Ullman book. Also, one point is that it
> does not need to compute any LR(1) states, but only SLR(0) or something,
> so that the computations are linear, not exponential.


This is true of all LALR(1) algorithms that I know of: they all use
just the LR(0) (aka SLR(0)) states, and none of them compute the
canonical LR(1) states at any point. Only the state-splitting
algorithms use the canonical LR(1) states.
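For the simplest member of the family, SLR(1), the lookahead computation the
quoted text alludes to is just FOLLOW sets derived from the grammar alone,
with no LR(1) states ever built. A minimal sketch (my own toy grammar,
which has no epsilon rules, so every symbol after a nonterminal is a
terminal and FIRST computation collapses to the symbol itself):

```python
# FOLLOW-set computation for the toy grammar  E -> T + E | T ; T -> x.
# In SLR(1), a reduce by rule A -> alpha is placed on every token in
# FOLLOW(A); that is the only "lookahead" work needed on top of LR(0).

GRAMMAR = [
    ("E", ["T", "+", "E"]),
    ("E", ["T"]),
    ("T", ["x"]),
]
NONTERMS = {lhs for lhs, _ in GRAMMAR}
END = "$"

def follow_sets(start="E"):
    follow = {nt: set() for nt in NONTERMS}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMS:
                    continue
                if i + 1 < len(rhs):
                    # next symbol is always a terminal in this toy grammar,
                    # so FIRST of it is just the symbol itself
                    additions = {rhs[i + 1]}
                else:
                    # nonterminal at end of rule inherits FOLLOW(lhs)
                    additions = follow[lhs]
                if not additions <= follow[sym]:
                    follow[sym] |= additions
                    changed = True
    return follow

f = follow_sets()
print(sorted(f["E"]))  # ['$']
print(sorted(f["T"]))  # ['$', '+']
```

The iteration is linear-ish in practice, which matches the quoted point that
no exponential canonical-LR(1) construction is involved.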


That being said, I have never read the source to any version of Bison
or Yacc (other than Yacc++), so I cannot comment further on what they
do. There may be optimizations in Bison's algorithm that make certain
aspects more complex.


Hope this helps,
-Chris


******************************************************************************
Chris Clark Internet: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. or: compres@world.std.com
23 Bailey Rd Web Site: http://world.std.com/~compres
Berlin, MA 01503 voice: (508) 435-5016
USA fax: (978) 838-0263 (24 hours)

