Re: How to get this to an lalr(1) grammar?

mickunas@mickunas.cs.uiuc.edu (Dennis Mickunas)
Sat, 10 Sep 1994 00:34:13 GMT

          From comp.compilers

Related articles
How to get this to an lalr(1) grammar? Mark_Tarrabain@mindlink.bc.ca (1994-08-22)
Re: How to get this to an lalr(1) grammar? jholland@world.std.com (1994-08-22)
Re: How to get this to an lalr(1) grammar? pjj@cs.man.ac.uk (1994-08-28)
How to get this to an lalr(1) grammar? mph@zdenka.demon.co.uk (1994-08-28)
Re: How to get this to an lalr(1) grammar? mickunas@mickunas.cs.uiuc.edu (1994-09-10)
Re: How to get this to an lalr(1) grammar? dekker@dutiag.twi.tudelft.nl (1994-09-14)
Re: How to get this to an lalr(1) grammar? mark@omnifest.uwm.edu (Mark Hopkins) (1994-10-09)
| List of all articles for this month |
Newsgroups: comp.compilers
From: mickunas@mickunas.cs.uiuc.edu (Dennis Mickunas)
Keywords: parse, LALR
Organization: University of Illinois at Urbana
References: 94-08-137
Date: Sat, 10 Sep 1994 00:34:13 GMT

Mark_Tarrabain@mindlink.bc.ca (Mark Tarrabain) writes:


> Hi. I've been trying to figure out how to make a certain grammar
>lalr(1). Or at least create an equivalent lalr(1) grammar that parses the
>same language. I'm not having much luck with it, so I thought I'd see if
>anybody else could figure it out. Here's the grammar as it presently
>exists (21 rules):


> s -> c
> s -> s t c


> c -> V n p


> a -> A
> a -> C o


> o -> /* empty */
> o -> A


> t -> T
> t -> D
> t -> a u


> u -> /* empty */
> u -> T


> n -> /* empty */
> n -> L b
> n -> q


> q -> N
> q -> q a N


> b -> /* empty */
> b -> B q


> p -> /* empty */
> p -> P N




> Capital letters denote terminals, lower case letters nonterminals. The
>language and grammar are very evidently LR(2). If my understanding is
>correct, it should be possible to simplify it to LR(1), perhaps even
>LALR(1).


It's little wonder that you're having difficulty transforming the
grammar from LR(2) to LR(1), since it is not LR(2) to begin with. To
see this, note that the strings "V N C A V" and "V N C A N" have the
following parses:


^VNCAV shift
V^NCAV shift
VN^CAV reduce q->N
Vq^CAV reduce n->q
Vn^CAV reduce p->/* empty */
Vnp^CAV reduce c->Vnp
c^CAV reduce s->c
s^CAV shift
sC^AV shift
sCA^V reduce o->A
sCo^V reduce a->Co
sa^V reduce u->/* empty */
sau^V reduce t->au
st^V shift
stV^ reduce n->/* empty */
stVn^ reduce p->/* empty */
stVnp^ reduce c->Vnp
stc^ reduce s->stc
s^ accept


and


^VNCAN shift
V^NCAN shift
VN^CAN reduce q->N
Vq^CAN shift
VqC^AN shift
VqCA^N reduce o->A
VqCo^N reduce a->Co
Vqa^N shift
VqaN^ reduce q->qaN
Vq^ reduce n->q
Vn^ reduce p->/* empty */
Vnp^ reduce c->Vnp
c^ reduce s->c
s^ accept


Notice that with the sentential forms
      Vq^CAV (reduce n->q)
and
      Vq^CAN (shift)
it is not possible to resolve the shift-reduce conflict with merely 2
symbol look-ahead. Thus the grammar is at best LR(3).


I was able to discover this by first attempting (by hand) a mechanical
transformation from LR(k) to LR(k-1), using a hybrid algorithm derived
from the paper [5] that Michael Harrison mentioned in a previous
response, and a subsequent paper [6]. (The reason for the hybrid is
somewhat technical, having to do with whether the semantics of parses
involving /*empty*/ rules can be preserved.)


I believe that your intention was to preserve the structure imposed by
the grammar. (Otherwise, it's dead easy to construct a grammar that
works. Note that the language is regular. Simple back-substitution
and recursion-to-Kleene-closure provides a regular expression.) So I
wanted to avoid mangling your grammatical structure. Thus I was
attempting to obtain a "complete cover" so as to permit the semantics
to remain unchanged. Even with the hybrid technique (which uses as
many shortcuts as I can imagine), the resulting grammar contains 143
rules. Berkeley yacc subsequently shows a shift-reduce conflict,
indicating that the new grammar is not LR(1), and that the original
is therefore not LR(2). Analyzing the shift-reduce conflict led to
the non-LR(2) counterexample above. (If anyone is interested, I can
post the 143-rule grammar.)


I do believe from examining the conflicting structures that the
original grammar *is* LR(3). I could verify that by performing the
LR(k)-to-LR(k-1) transformation a second time; but that's something
I'm not prepared to do by hand in this lifetime, since the resulting
grammar would probably have about 1000 rules. It would be possible to
perform a hybrid transformation from LR(k)-to-LR(k-2) but I believe
that again a very large grammar would result. This is not due to
inherent inefficiency in the transformation, but rather is necessary
in order to preserve the structure imposed by the original grammar.


Judging by some responses concerning the equivalence of LR(k) and
LR(1), it is still apparently a well-kept secret that once you have
obtained an LR(k) grammar, G, it is possible to obtain *mechanically*
an LR(1) grammar, G' (or even an LR(0) grammar, provided that the
language is prefix-free). In fact, the resulting grammar "completely
covers" the original LR(k) grammar (meaning that using only table
look-up, a G'-parse may be transformed into the original G-parse).
Those who are interested should first understand various
characterizations of LR(0) languages (see for example, Michael
Harrison's text [1], Section 13.3; also [1] cites other relevant
papers, especially by Harrison & Havel, and Geller & Harrison). For a
treatment of cover grammars, see papers by James Gray & Michael
Harrison [2], or the work by Anton Nijholt [3,4]. The formal LR(k) to
LR(0) algorithms are presented in [5,6]. If you'd like just a brief
tutorial, look at the file /pub/mickunas/pascal.descr available via
anonymous ftp from ftp.cs.uiuc.edu, which describes how to obtain
both an LR(0) version of the conventional expression grammar, as well
as an LR(0) grammar for Pascal.


[1] Harrison, M. A., _Introduction to Formal Language Theory_,
        Addison-Wesley, Reading, MA. (1978).


[2] Gray, J.N., and M.A. Harrison, "On the covering and reduction
        problems for context-free grammars," _Journal of the ACM 19_,
        pp. 675-698 (1972).


[3] Nijholt, A., "On the Covering of Parsable Grammars," _Journal
        of Computer and System Sciences 15_, pp.99-110 (1977).


[4] Nijholt, A., "A Survey of Normal Form Covers for Context Free
        Grammars," Informatica Rapport 49, Vrije Universiteit (1979).


[5] Mickunas, M.D., R.L. Lancaster, and V.B. Schneider, "Transforming
        LR(k) Grammars to LR(1), SLR(1), and (1,1) Bounded Right-Context
        Grammars," _Journal of the ACM 23_, pp.511-533 (1976).


[6] Mickunas, M.D., "On the Complete Covering Problem for LR(k)
        Grammars," _Journal of the ACM 23_, pp.17-30 (1976).


---
M. Dennis Mickunas
Department of Computer Science 1304 W. Springfield Ave.
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.