Re: look for LR(k) grammar

Chris F Clark <cfc@shell01.TheWorld.com>
Sat, 20 Dec 2008 22:19:23 -0500

          From comp.compilers

Related articles
look for LR(k) grammar txchen@gmail.com (Tom) (2008-12-15)
Re: look for LR(k) grammar felipeangriman@gmail.com (Felipe Angriman) (2008-12-15)
Re: look for LR(k) grammar pjj@cs.man.ac.uk (Pete Jinks) (2008-12-17)
Re: look for LR(k) grammar txchen@gmail.com (Tom) (2008-12-18)
Re: look for LR(k) grammar cfc@shell01.TheWorld.com (Chris F Clark) (2008-12-20)
Re: look for LR(k) grammar txchen@gmail.com (Tom) (2009-01-15)
| List of all articles for this month |

From: Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups: comp.compilers
Date: Sat, 20 Dec 2008 22:19:23 -0500
Organization: The World Public Access UNIX, Brookline, MA
References: 08-12-086 08-12-087 08-12-099
Keywords: LR(1)
Posted-Date: 21 Dec 2008 10:43:30 EST

Tom <txchen@gmail.com> writes:


> Yes I think this is a LR(2) grammar.
>
> It is similar to a pattern of LR(k) grammars that I know:
>
> S : a A D a ;
> S : b A D b ;
> S : a B D b ;
> S : b B D a ;
>
> A : a ;
> B : a ;
> D : c^k ;
>
> Here c^k means the concatenation of k letters 'c'.
> This is a LR(k+1) grammar, depending on the value of k.
>
> I'm looking for LR(k) grammars that are a little more complicated in
> the following way:
>
> There is an approach to solve the reduce/reduce conflict in a LR(1)
> parsing machine by splitting the states involved, for example,
> discussed in D. Spector (81,88) or D. Pager (77, lane-tracing). For
> both grammars above, there are only 2 states involved: the state a
> that contains the conflict, and one of its immediate parent state b
> that leads to the conflict in a. I'm hoping to get LR(k) grammars that
> contains more levels of states involved, not just 2 levels as
> discussed here.


>> On Mon, Dec 15, 2008 at 8:17 AM, Tom <txc...@gmail.com> wrote:
>> > I'm looking for LR(k) grammars where k > 1.
>>
>> S : B a b
>> S : C a c
>> B : 1
>> C : 1
>>
>> I think that will do for k = 2. (I haven't tested it)


Ah, state splitting.


First, as far as I know, state splitting is no help in increasing
k. State splitting is about restoring *left* context that gets lost in
the SLR/LALR algorithms. If you do canonical LR parsing, you never
merge the states, and thus do not need to split them. If you are
doing some kind of minimal state LR processing, (i.e. where one
computes the LR(0) tables first, as one does in both SLR and LALR
parsing), then you have to deal with state splitting.


So, borrowing from above (and simplifying): the following grammar
involves state splitting:


S : a A a ;
S : b A b ;
S : a B b ;
S : b B a ;
A : a a ;
B : a a ;


Let's look at the LR(0) tables, with the LALR lookaheads:


State 0:
        S : .a A a ;
        S : .b A b ;
        S : .a B b ;
        S : .b B a ;
        a -> 1
        b -> 2


State 1:
        S : a .A a ;
        S : a .B b ;
        A : .a a ;
        B : .a a ;
        a -> 3 // state 3 needs to split
        A -> 5
        B -> 6


State 2:
        S : b .A b ;
        S : b .B a ;
        A : .a a ;
        B : .a a ;
        a -> 3 // state 3 needs to split
        A -> 7
        B -> 8


State 3:
        A : a .a ;
        B : a .a ;
        a -> 4


State 4: // reduce-reduce conflict
        A : a a .; a (from 5) | b (from 6)
        B : a a .; a (from 8) | b (from 7)


State 5:
        S : a A .a ;


State 6:
        S : b A .b ;


State 7:
        S : a B .b ;


State 8:
        S : b B .a ;


Now, if you split state 3 & 4into two new states 3x, 3y, 4x, and 4y,
let's see the machine: (I'll just show the changed states):


State 1:
        S : a .A a ;
        S : a .B b ;
        A : .a a ;
        B : .a a ;
        a -> 3x
        A -> 5
        B -> 6


State 2:
        S : b .A b ;
        S : b .B a ;
        A : .a a ;
        B : .a a ;
        a -> 3y
        A -> 7
        B -> 8


State 3x:
        A : a .a ;
        B : a .a ;
        a -> 4x


State 4x: // reduce-reduce conflict
        A : a a .; a (from 5)
        B : a a .; b (from 7)


State 3y:
        A : a .a ;
        B : a .a ;
        a -> 4y


State 4y:
        A : a a .; b (from 6)
        B : a a .; a (from 8)


That is the essence of state splitting. We had different LHS side
contexts for the two uses of state 3, whether we were coming from
state 1 or state 2. This context is lost in the LR(0) machine. Thus,
when we got to state 4 and need to know what to reduce, we couldn't
distunguish the two cases and the lookaheads all get merged together
and we get the conflict.


Now, it sound like you want to understand how to get more rules
involved. (You wrote more states, but I think you mean more rules, as
they give you more states on the tail end you have to search through.)
One way to get more rules involved, is to nest the rules for A anb B
in other rules.


For example, work through this grammar for the same language:


S : a A a ;
S : b A b ;
S : a B b ;
S : b B a ;
A : a C ;
B : a D ;
C : a ;
D : a ;


Here is an LR(2) grammar for the same language:


S : a A a ;
S : b A b ;
S : a B b ;
S : b B a ;
A : C a ;
B : D a ;
C : a ;
D : a ;


Here is an LR(3) grammar:


S : a A a a ;
S : b A a b ;
S : a B a b ;
S : b B a a ;
A : C a ;
B : D a ;
C : a ;
D : a ;


Here is a grammar that isn't LR(k) for any k, for a language that
could have an LR(1) grammar for it--in fact the language is regular.
The language is (a|b)a+(a|b).


S : a A a ;
S : b A b ;
S : a B b ;
S : b B a ;
A : A a ;
B : B a ;
A : a ;
B : a ;


Note, this grammar is particularly interesting to me as it is part of
what we tried to solve in Yacc++, when we did LR(infinite) (or what I
sometimes call LR(closed) parsing. If you look carefully, you will
see that you can parse it deterministically, However, you have to push
an indefinite amount of rules on the "stack". Still, when you get
done, and you see the EOF marker, you can then deterministically
resolve the rules you had pushed. I believe that every unambiguous
CFG has such a deterministic parse, and I believe the computing the
"closure" is the method to find it (if it exists), which is why I
sometimes refer to the method as LR(closed). Unfortunately, due to
the halting probloem, there are grammars that attempting to compute
the closure on will never terminate.


In any case, I hope the supplied grammars at least partially answer
your original question (and perhaps shed some insight into what is
going on).


Hope this helps,
-Chris


******************************************************************************
Chris Clark Internet: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. or: compres@world.std.com
23 Bailey Rd Web Site: http://world.std.com/~compres
Berlin, MA 01503 voice: (508) 435-5016
USA fax: (978) 838-0263 (24 hours)


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.