Re: How to get this to an lalr(1) grammar?

jholland@world.std.com (Jerome T Holland)
Mon, 22 Aug 1994 20:18:52 GMT

          From comp.compilers

Related articles
How to get this to an lalr(1) grammar? Mark_Tarrabain@mindlink.bc.ca (1994-08-22)
Re: How to get this to an lalr(1) grammar? jholland@world.std.com (1994-08-22)
Re: How to get this to an lalr(1) grammar? pjj@cs.man.ac.uk (1994-08-28)
How to get this to an lalr(1) grammar? mph@zdenka.demon.co.uk (1994-08-28)
Re: How to get this to an lalr(1) grammar? mickunas@mickunas.cs.uiuc.edu (1994-09-10)
Re: How to get this to an lalr(1) grammar? dekker@dutiag.twi.tudelft.nl (1994-09-14)
Re: How to get this to an lalr(1) grammar? mark@omnifest.uwm.edu (Mark Hopkins) (1994-10-09)
| List of all articles for this month |
Newsgroups: comp.compilers
From: jholland@world.std.com (Jerome T Holland)
Keywords: parse, LALR
Organization: Parsifal Software
References: 94-08-137
Date: Mon, 22 Aug 1994 20:18:52 GMT

Mark Tarrabain <Mark_Tarrabain@mindlink.bc.ca> wrote:
> Hi. I've been trying to figure out how to make a certain grammar
>lalr(1). Or at least create an equivalent lalr(1) grammar that parses the
>same language. I'm not having much luck with it, so I thought I'd see if
>anybody else could figure it out. Here's the grammar as it presently
>exists (21 rules):
>
> s -> c
> s -> s t c
>
> c -> V n p
>
> a -> A
> a -> C o
>
> o -> /* empty */
> o -> A
>
> t -> T
> t -> D
> t -> a u
>
> u -> /* empty */
> u -> T
>
> n -> /* empty */
> n -> L b
> n -> q
>
> q -> N
> q -> q a N
>
> b -> /* empty */
> b -> B q
>
> p -> /* empty */
> p -> P N
>
>
> Capital letters denote terminals, lower case letters nonterminals. The
>language and grammar are very evidently LR(2). If my understanding is
>correct, it should be possible to simplify it to LR(1), perhaps even
>LALR(1). I'd appreciate whatever help anybody could give.
>


Here's a solution, although not a very elegant one. A quick pass through
a parser generator reveals that the LR2ness comes from the fact that
"a" is a component of both "c" and "t". To make the grammar LR1 you have
rewrite the grammar so that the "a" that occurs in "t" can follow "q"
immediately without first forcing a reduction. Here's one way to do the
job.


First, rewrite s so it is right recursive rather than left recursive.


      s -> c
      s -> c t s


Now expand trailing productions for c:


      c -> V n p


becomes


      c -> V n
      c -> V n P N


which in turn, expanding n, becomes


      c -> V
      c -> V L b
      c -> V q
      c -> V P N
      c -> V L b P N
      c -> V q P N


Now expand b:


      c -> V
      c -> V L
      c -> V L B q
      c -> V q
      c -> V P N
      c -> V L P N
      c -> V L B q P N
      c -> V q P N


Now, define d:


      d -> c t


Rewrite s once again:


      s -> c
      s -> d s


Now, rewrite d, expanding c


      d -> V t
      d -> V L t
      d -> V L B q t
      d -> V q t
      d -> V P N t
      d -> V L P N t
      d -> V L B q P N t
      d -> V q P N t


now define r:
      r -> q t


and rewrite d, replacing q t with r:


      d -> V t
      d -> V L t
      d -> V L B r
      d -> V r
      d -> V P N t
      d -> V L P N t
      d -> V L B q P N t
      d -> V q P N t


Now, expand t in the definition of r:


      r -> q T
      r -> q D
      r -> q a u


This final step has brought the a in t to a position where it follows
q directly.


Now, we can gather all the pieces together to write the equivalent
grammar:


      s -> c
      s -> d s


      c -> V
      c -> V L
      c -> V L B q
      c -> V q
      c -> V P N
      c -> V L P N
      c -> V L B q P N
      c -> V q P N


      d -> V t
      d -> V L t
      d -> V L B r
      d -> V r
      d -> V P N t
      d -> V L P N t
      d -> V L B q P N t
      d -> V q P N t


      r -> q T
      r -> q D
      r -> q a u


      a -> A
      a -> C o


      o -> /* empty */
      o -> A


      t -> T
      t -> D
      t -> a u


      u -> /* empty */
      u -> T


      q -> N
      q -> q a N


This grammar, although ungainly, is LALR(1). Some common terms could be
abstracted out of the definitions for "c" and "d" to make it appear simpler.
The "right" way to do this would probably depend on the semantic actions to
be performed.


Jerry Holland
--
Parsifal Software | Parsifal -> AnaGram | 800-879-2577
P.O. Box 219 | AnaGram -> parsers | Voice/Fax 508-358-2564
Wayland, MA 01778 | parsers -> results, fast | jholland@world.std.com
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.