Re: HELP needed with this s/r error

clark@quarry.zk3.dec.com (Chris Clark USG)
27 Feb 1997 00:55:35 -0500

          From comp.compilers

Related articles
HELP needed with this s/r error chseet@iti.gov.sg (Seet Chern Hway) (1997-02-16)
Re: HELP needed with this s/r error clark@quarry.zk3.dec.com (1997-02-27)
| List of all articles for this month |

From: clark@quarry.zk3.dec.com (Chris Clark USG)
Newsgroups: comp.compilers
Date: 27 Feb 1997 00:55:35 -0500
Organization: Digital Equipment Corporation - Marlboro, MA
References: 97-02-098
Keywords: yacc, parse, syntax

Seet Chern Hway asked:
> I hope somebody can help me overcome an s/r error in the grammar I have
> constructed to parse a language. The language has the following
> fragment:
>
> INDEX name [segment]
> [ , INDEX name [segment] ]...
>
> where segment refers to the fragment
> SEGMENT item [ , item ]...
...
To which our moderator correctly added:
>> [Looks to me like this is LR(2) -- when it sees a comma it has to reduce one
>> rule or the other but it can't because it needs to look ahead and see if
>> there is an INDEX following. -John]


And while I would normally just recommend an ELR parser generator
(such as Yacc++(r)), so that you could just enter the grammar using
regular expression notation. Now, if you want segment (or something
like it) to have its own rule, you need a generator which is also
LR(2) or better, or a generator which does not "expand" the iteration
into recursion.


However, there is also a variation on left-recursion removal [or is it
left-factoring, I can never remember the distinction between those
terms] which can be applied to make an LR(1) grammar out of this one.
I'll demonstrate using your grammar. Start with your regexp form.


list_index: INDEX name [segment] [, INDEX name [segment]]...
segment: SEGMENT item [, item]...


As the moderator stated, the problem lies in the , token which can
either be followed by an INDEX or item. Therefore, we must "move" the
comma down into the lower productions. Our goal is to have one set of
rules which ends in a trailing comma, and another set which does not.
The rules which end in the trailing comma will be the iterated parts
and the rules which terminate without a comma will be the final ends.


We start by rearranging where the iteration occurs. This is the
disadvantage of trying to make an LR(k) grammar LR(1) or an LR grammar
LL. You can't just put your iteration (or recursion) in where it seems
natural, you must bend to the parsing strategy you have at hand.


list_index: [INDEX name [segment] ,]... INDEX name [segment]
segment: SEGMENT [item ,]... item


If you look at the first part of list_index, you will set that after
expansion it ends with: SEGMENT [item ,]... item ,
However, we can rewrite this as: SEGMENT (item ,)...
This produces the following transformation:


list_index: [INDEX name (segment-trailing-comma | ,)]... INDEX name [segment]
segment: SEGMENT [item ,]... item
segment-trailing-comma: SEGMENT (item ,)...


Now, we'd like to merge the two lists which look like our comma
separated items. To do this, first, we need to pull the SEGMENT part
off the front.


list_index: [INDEX name (SEGMENT list-item-comma | ,)]...
                            INDEX name [SEGMENT list-item]
list-item: [item ,]... item
list-item-comma: (item ,)...


Now, we can make list-item use list-item-comma.


list_index: [INDEX name (SEGMENT list-item-comma | ,)]...
                            INDEX name [SEGMENT list-item]
list-item: [list-item-comma] item
list-item-comma: (item ,)...


Finally, we have something that can be written as yacc rules which
have no conflicts.


fragment
    : list_index
    ;


list_index /* [INDEX name (SEGMENT list-item-comma | ,)]...
                            INDEX name [SEGMENT list-item] */
    : final_index
    | loop_index final_index
    ;


final_index /* INDEX name [SEGMENT list-item] */
    : INDEX name
    | INDEX name SEGMENT list_item
    ;


loop_index /* [INDEX name (SEGMENT list-item-comma | ,)]... */
    : INDEX name SEGMENT list_item_comma
    | INDEX name ','
    ;


list_item /* [list-item-comma] item */
    : item
    | list_item_comma item
    ;


list_item_comma /* (item ,)... */
    : item ','
    | list_item_comma item ','
    ;


-Chris


For those of you who find the plain LR(1) solution still a little
messy and would prefer an ELR parser generator, I throw in the plug
for the company for whom I work:


Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. CompuServe : 74252,1375
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)
[Yuck. Lacking an ELR generatior, I'd use a lexical hack that peeks ahead
in the lexer when it sees a comma and returns a different token for a comma
followed by an INDEX keyword. -John]


--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.