Re: Full LR(1) parser generator Hyacc 0.9 release

Chris F Clark <cfc@TheWorld.com>
Wed, 27 Feb 2008 00:04:06 -0500 (EST)

From comp.compilers

Related articles
[7 earlier articles]
Re: Full LR(1) parser generator Hyacc 0.9 release paul@paulbmann.com (Paul B Mann) (2008-02-14)
Re: Full LR(1) parser generator Hyacc 0.9 release cfc@shell01.TheWorld.com (Chris F Clark) (2008-02-14)
Re: Full LR(1) parser generator Hyacc 0.9 release jdenny@ces.clemson.edu (Joel E. Denny) (2008-02-23)
Re: Full LR(1) parser generator Hyacc 0.9 release cfc@TheWorld.com (Chris F Clark) (2008-02-24)
Re: Full LR(1) parser generator Hyacc 0.9 release jdenny@ces.clemson.edu (Joel E. Denny) (2008-02-25)
Re: Full LR(1) parser generator Hyacc 0.9 release paul@paulbmann.com (Paul B Mann) (2008-02-26)
*Re: Full LR(1) parser generator Hyacc 0.9 release cfc@TheWorld.com (Chris F Clark)* (2008-02-27)**
Re: Full LR(1) parser generator Hyacc 0.9 release jdenny@ces.clemson.edu (Joel E. Denny) (2008-02-27)
Re: Full LR(1) parser generator Hyacc 0.9 release jdenny@ces.clemson.edu (Joel E. Denny) (2008-02-27)
Re: Full LR(1) parser generator Hyacc 0.9 release cfc@TheWorld.com (Chris F Clark) (2008-02-28)
Re: Full LR(1) parser generator Hyacc 0.9 release cfc@shell01.TheWorld.com (Chris F Clark) (2008-02-28)
Re: Full LR(1) parser generator Hyacc 0.9 release jdenny@ces.clemson.edu (Joel E. Denny) (2008-02-28)
Re: Full LR(1) parser generator Hyacc 0.9 release paul@paulbmann.com (Paul B Mann) (2008-02-28)

| List of all articles for this month |

From:	Chris F Clark <cfc@TheWorld.com>
Newsgroups:	comp.compilers
Date:	Wed, 27 Feb 2008 00:04:06 -0500 (EST)
Organization:	Compilers Central
References:	08-02-019 08-02-022 08-02-030 08-02-037 08-02-040 08-02-041 08-02-044 <Pine.SOC.4.61.0802230000570.9914@unixlab03.ces.clemson.edu> 08-02-076 <Pine.SOC.4.61.0802251216540.13590@unixlab03.ces.clemson.edu>
Keywords:	LR(1)
Posted-Date:	27 Feb 2008 00:07:22 EST

Joel E. Denny wrote > and I wrote: (> >).
> LALR(1), Pager's algorithm, IELR(1), and canonical LR(1) are all parser
> table generation algorithms. When the grammar is non-LR(1) coupled with a
> specification for resolving conflicts, I have found examples where the
> parser tables generated by Pager's algorithm do not accept the same
> language as do the parser tables generated by canonical LR(1). I have not
> found such examples for IELR(1) versus canonical LR(1).
>
> > The example where Pager's algorithm parses differently when conflict
> > resolution is added would be interesting. I don't see how it should
> > happen, since s/r conflicts should be preserved over all splitting and
> > merges of the LR states (as you showed above). <*see note at end>
>
> Every S/R conflict is preserved for at least some but not necessarily all
> of its viable prefixes. Before splitting, the conflict might resolve as a
> reduce. After splitting, the action is a shift for any viable prefix for
> which the conflict is not preserved. That is, for such viable prefixes,
> the action might change. This is the scenario I most commonly encounter.
...
> on this topic. Some of my grammar examples can be found there. If there
> is interest, I can replicate a simple example here.

I presume you mean this case copied from email you sent to the Bison
maintainers email list (I'm sorry I hadn't read the Bison related
email archives, so I had not seen the example):

> Yes, but it might affect viable prefixes it did not in the LR(1)
> parser.
>
> I should have included an example before:
>
> S: 'a' A 'a'
> | 'b' A 'b'
> ;
> A: 'a' 'a'
> | 'a'
> ;
>
> For canonical LR(1), there is a S/R conflict at the dot in this valid
> input:
>
> aa.a
>
> but not at the dot in this valid input:
>
> ba.ab
>
> For LALR(1), the associated states are merged so that both inputs have
> the
> conflict.
>
> If you then add this:
>
> %left 'a'
>
> both the LR(1) and LALR(1) parsers perform a reduce at the dot in the
> first input. The LR(1) parser performs a shift at the dot in the
> second
> input, but the LALR(1) parser still performs a reduce. Thus, the
> LR(1)
> parser accepts the second input, but the LALR(1) parser eventually
> reports
> a syntax error.

This is a typical problem with how precedences as defined in yacc
apply globally and can have unintended side effects. The same effect
(a precedence applying where it shouldn't) can be gotten with grammars
where the result is the same with both an LALR or an LR parser
generators. All that is needed for the precedence to misapply is that
one has two conflicts where the shift is on the same token, but the
rules require different shift/reduce results. Note sometimes you can
fix this by adding a %prec directive on to the end of one (or both of
the rules).

As Xin Chen points out in the discussion, the grammar is actually an
LR(2) grammar, and that either the %left or %right declaration causes
one of two sentences of the language to be lost. Your correction to
flag the unreachable rule that comes from [mis-]applying the %left
precedence with LALR table construction seems correct. If you want
the affect of %left in that case, you are better off, simply removing
the rule (A: a a;) that causes the conflict.

It is worth noting that if you use a %right precedence, the LALR and
LR tables will accept the same language.

You can see all of this if you follow Hans Aberg's algorithm and split
the rule A into A1 and A2 for the two different contexts, as in the
next grammar which will produce the LR results even with an LALR
parser generator.

    S: 'a' A1 'a'
      | 'b' A2 'b'
      ;
  A1: 'a' 'a' // %right
      | 'a' // %left
      ;
  A2: 'a' 'a'
      | 'a'
      ;

This also helps motivate why Hans explains that you want to be able to
do precedences in a grammar position dependent way, i.e. you might
want the "a" in context A1 to be a reduce and the "a" in context A2 to
be a shift, so that you cannot only disambiguate these cases, but even
more tricky ones. It is my belief that you can follow Hans
prescription and come up with a "local" precedence that applies just
where you want it.

Now, the one thing I don't understand (and it is relevant to the
discussion) is whether the either the Pager or the IELR algorithm
computes something different than either the LALR result or the LR
result? I think one can argue that both of those results are "right"
for this grammar. However, I don't think there are any other results
that LR-family parser generators can compute.

Note, that the situation can get more complicated. In fact, you can
find cases where you actually want the LR result for some parts of the
grammar and the LALR result for other parts. As I said, I think you
can define a form of position based precedence that will resolve that
right. I think you can actually define a set of precedence rules that
are equivalent to a satisfiability problem. However, at the same
time, if you are defining a language (rather than just defining a
grammar for an existing language), you probably need to ask yourself
whether you really want that complexity.

Finally, I want to note that I misunderstood the nature of the proof
offered before. The proof shows that for any grammar with s/r
conflicts in the SLR(0) machine there will be states in the LR(1)
machine that have the same conflict, but does not imply (as I
misunderstood it too say) that all LR(1) states with the same
reduction will have that conflict. Thus, from the proof one can
assert that if the SLR(0) machine has a state needing s/r conflict
resolution, then the LR(1) machine will have at least 1 state needing
the same s/r conflict resolution. However, there may also be cases
where the LR(1) machine does not need s/r conflict resolution because
the splitting of the states has disambiguated the left context and
there is no conflicting shift action for some viable prefix. I
believe in that case, the LALR machine can be coaxed into declaring a
string as a syntax error (that will not be treated as an error by the
LR machine) by a %left associativity declaration (or other precedence
where a shift is prefered over a reduce) or a %nonassoc declaration.
If the machine is given precedence directives that prefer shifting in
the problem state, the LALR and LR machines will behave the same.

Hope this helps,
-Chris

******************************************************************************
Chris Clark Internet: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. or: compres@world.std.com
23 Bailey Rd Web Site: http://world.std.com/~compres
Berlin, MA 01503 voice: (508) 435-5016
USA fax: (978) 838-0263 (24 hours)

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Full LR(1) parser generator Hyacc 0.9 release

Chris F Clark <cfc@TheWorld.com>Wed, 27 Feb 2008 00:04:06 -0500 (EST)

Chris F Clark <cfc@TheWorld.com>
Wed, 27 Feb 2008 00:04:06 -0500 (EST)