Re: Parsing an If...End If construct

Chris F Clark <cfc@shell01.TheWorld.com>
20 Nov 2004 21:23:55 -0500

          From comp.compilers

Related articles
Parsing an If...End If construct mness1978@hotmail.com (2004-11-17)
Re: Parsing an If...End If construct snicol@apk.net (Scott Nicol) (2004-11-19)
Re: Parsing an If...End If construct vbdis@aol.com (2004-11-19)
Re: Parsing an If...End If construct cfc@shell01.TheWorld.com (Chris F Clark) (2004-11-20)
| List of all articles for this month |

From: Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups: comp.compilers
Date: 20 Nov 2004 21:23:55 -0500
Organization: The World Public Access UNIX, Brookline, MA
References: 04-11-047 04-11-067
Keywords: syntax, design
Posted-Date: 20 Nov 2004 21:23:55 EST

Michael wrote:
> The bison grammar I have so far is listed below. It is has several
> conflicts that I'm having trouble resolving. Can anyone give me some
> pointers on how to deconflict this grammar?
>[...]
> else_if_list : else_if_list else_if
> | else_if
> ;
>
> else_if : ELSE IF conditions statements
> | /* empty */
> ;


I just want to amplify one of the points made by Scott Nicol as it
points out a common error many grammar writers make when writing
recursive/optional productions. Scott wrote:


> Second problem is that you are burying the optional else_if_list too
> deep.


The two productions I clipped from your grammar are typical of the
problem, and it is a commen grammar writing mistake. The mistake is
making lists of "optional" things. Your else_if non-terminal
describes something optional, which means it has an empty clause
(indicating that the item is not present). Your else_if_list makes a
list of these things. However, that introduces the problem. How does
the parser know how many "empty" versions of the else_if to insert in
the list? Theoretically, it can insert an unbounded number of them.
that makes the grammar inherently ambiguous.


The solution to this kind of problem, is to raise the "empty" up in
the rule nesting. That is, don't make empty an alternative for
else_if, make the rules (in your case you have only 1) that use
else_if take care of its absence.


else_if : ELSE IF conditions statements; // else_if is not optional by itself


else_if_list : else_if_list else_if
| /* empty */ // an else_if_list can be empty
   ;


Now, if this isn't sufficient to resolve the problem, try moving the
empty up yet another level.


else_if_list : else_if_list else_if
| else_if // an else_if_list is not optional by itself
   ;


instruction: END
| GOSUB destination
| GOTO destination
| IF conditions THEN statements opt_else
| IF conditions THEN statements else_if_list opt_else ENDIF // non-empty else_fi_list case
| IF conditions THEN statements opt_else ENDIF // empty else_if_list case
| PRINT print_list
| RETURN
;


This now brings us to the next point. Two things which can go to
empty in a row may also cause conflicts. In your grammar, you have an
option else_if_list followed by an otional opt_else. That can be the
source of conflicts also, as sometimes the parser has a hard time
disambiuating whether one or both of the optional things are missing.
I don't think that's a problem with this grammar, but I haven't run
the grammar through a parser generator to be sure. However, the point
is worth covering, since one resolves it with the same empty-raising
technique.


Change:


opt_else: ELSE statements
        | /* empty */
        ;




else: ELSE statements;


instruction: END
| GOSUB destination
| GOTO destination
| IF conditions THEN statements else // else present
| IF conditions THEN statements // else missing
| IF conditions THEN statements else_if_list else ENDIF // non-empty else_fi_list case, else present
| IF conditions THEN statements else_if_list ENDIF // non-empty else_fi_list case, else missing
| IF conditions THEN statements else ENDIF // empty else_if_list case, else present
| IF conditions THEN statements ENDIF // empty else_if_list case, else missing
| PRINT print_list
| RETURN
;


However, now having expanded this grammar to this point, I see a
potential problem. Both the else and else_if_list rules begin with
ELSE as their first token and the else_if_list rule would like to loop
(recurse) on seeing an ELSE, and that will be a shift-reduce conflict
that really indicates the need for two tokens of lookahead. That kind
of problem requires a slightly different technique to correct. One
needs to "fold" the trailing else clase into the else_if_list. BTW,
to do that one must make the list right recursive rather than left,
because the special case occurs at the left end of the list and not
the right.


else_if_list : else_if else_if_list
| else_if // an else_if_list is not optional by itself
                | else // an else if list an also end with an else clause
   ;


instruction: END
| GOSUB destination
| GOTO destination
| IF conditions THEN statements else // else present
| IF conditions THEN statements // else missing
| IF conditions THEN statements else_if_list ENDIF // non-empty else_fi_list case
| IF conditions THEN statements ENDIF // empty else_if_list case
| PRINT print_list
| RETURN
;


Soapbox warning: the following comments are simply my personal opinion
on the matter, and some may find them to be "marketing propaganda".


I find the use of recursion and empty productions to be totally opaque
for this prupose. It is very hard to write grammars that use
recursion and empty productions and not to introduce conflicts. It is
simply too easy to introduce these naive mistakes.


There is a better solution, ELR (aka regular-right-part) grammars.
These are grammars where one uses the regular expression operators
*,+,? to write optional items and lists. If one uses these operators
in place of recurive rules with empty alternatives, then one naturally
writes grammars that have fewer conflict problems. Here is a Yacc++
fragment for the problem rules.


instruction: END
| GOSUB destination
| GOTO destination
| IF conditions THEN statements else?
| IF conditions THEN statements else_if* else? ENDIF
| PRINT print_list
| RETURN
;


And here is what I would have naively written (simply because I
wouldn't have started with an "else" or "else_if" non-terminal until I
saw they made the semantics easier):


| IF conditions THEN statements (ELSE statements)?
| IF conditions THEN statements (ELSE IF conditions statements)*
        (ELSE statements)? ENDIF


Hope this helps,
-Chris


*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.