Re: Error reporting in LR parsers

heirich@cs.ucsd.edu (Alan Heirich)
8 Aug 89 03:04:34 GMT

From comp.compilers

Related articles
Error reporting in LR parsers worley@compass.com (1989-08-02)
Re: Error reporting in LR parsers djones@megatest.uucp (1989-08-04)
Re: Error reporting in LR parsers lai@mips.com (1989-08-07)
*Re: Error reporting in LR parsers heirich@cs.ucsd.edu* (1989-08-08)**
Re: Error reporting in LR parsers heirich@cs.ucsd.edu (1989-08-08)
Re: Error reporting in LR parsers rusty@garnet.Berkeley.EDU (1989-08-10)
Re: Error reporting in LR parsers djones@megatest.com (1989-08-10)
Re: Error reporting in LR parsers djones@megatest.com (1989-08-10)
Re: Error reporting in LR parsers djones@megatest.com (1989-08-10)
Re: Error reporting in LR parsers markg@well.sf.ca.us (1989-08-15)
[1 later articles]

| List of all articles for this month |

From:	heirich@cs.ucsd.edu (Alan Heirich)
Newsgroups:	comp.compilers
Date:	8 Aug 89 03:04:34 GMT
References:	<1989Aug4.032053.753@esegue.uucp> <1989Aug6.024931.10014@esegue.uucp>
Organization:	EE/CS Dept. U.C. San Diego

This is the first of two postings. This posting describes a workaround for
the default reduction problem with yacc. The second posting describes code
modifications to DECUS yacc which provides meaningful diagnostic messages
both for describing a parse (debugging a grammar) and for an end-user at
run time.

In article <1989Aug6.024931.10014@esegue.uucp> djones@megatest.uucp (Dave Jones) writes:
>Mr. Worley has generously taken the time to post a long article about getting
>yacc to give up a list of token-categories which would be legal when a syntax
>error is found. The idea of grouping tokens into categories is a good one.
>But the technique in no way solves the "default reduction" problem, as
>he claims it does.

Mr. Jones is correct. However, the default reduction problem is not hard to
work around.

The problem: yacc violates correct shift-reduce parsing rules by assigning
"default reductions". For the following item:

      A --> B C .

the corresponding rule should be reduced only when the lookahead token is in
the follow set of A. A given automaton state might contain many such items,
and also many items on which a shift is anticipated, e.g.

      Q --> R S . T

(in which case a shift would be expected on when the lookahead token is in
the first set of T).

The yacc parser should report an error whenever the lookahead token is neither
in a follow set of a rule like "A" above, nor in a first set for a rule like
"Q" above. But yacc builds its automaton in such a way that this is not the
case. Instead, a rule like "A" would get reduced whenever the lookahead token
can't be handled in any other way. In other words, a reduction happens when
an error should be reported. These "default reductions" continue until the
parser is in a state with no reductions, at which point it reports an error.
Any diagnostic message based on the state where the error is reported are
thus potentially incorrect, since the lookahead set might be different from
the lookahead set of the state where the error was actually encountered.

Consequences of the problem: there are three types of states: shift only;
reduce only; and combined shift & reduce states. It turns out that this
"problem" in irrelevant to two of the kinds of states; and is easily worked
around for the case of combined shift & reduce states.

case 1: a state contains only shift items.
      Then there is no problem, because the error is reported instantly.

case 2: a state contains only reduce items.
      The default reductions will change the state before the error gets reported.
But this state will have the same lookahead set as the state where the error
was encountered. This is because the reduction will return the parser to a
state where the lookahead set contains the first set of the reduced
nonterminal. The reductions don't change the lookahead set.

case 3: combined shift & reduce items.
      This is the potentially problematic state, which Mr. Jones illustrated in
his posting. As he explained, the default reductions move the parser to a
state in which the lookahead set is not the same as when the error was
encountered. Specifically, the new lookahead set will contain the union of
the first sets of the reduced nonterminal(s), but will not contain the tokens
which could have been shifted in the state where the error was encountered.

      Case 3 has an easy fix. Modify the yacc parser to save the current state
*after* it obtains a new token from yylex. (This occurs near the beginning of
the parse code, right after the debug code that prints out the state number).
Then, when an error is encountered, yyerror traps the current state and
compares it to this saved state. The set of expected tokens is the union of
the lookahead sets of these two states. (In cases 1 and 2 above, these will
be the same sets of tokens).

The next posting describes a set of mods I have made to DECUS yacc to
implement automatic diagnostics based on this approach.

-------------------------
Alan Heirich Comp. Sci. & Eng., Cognitive Science
C-014 University of California, San Diego 92093

heirich@cs.ucsd.edu
aheirich@ucsd.bitnet

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Error reporting in LR parsers

heirich@cs.ucsd.edu (Alan Heirich)8 Aug 89 03:04:34 GMT

heirich@cs.ucsd.edu (Alan Heirich)
8 Aug 89 03:04:34 GMT