Re: Error reporting in LR parsers

djones@megatest.com (Dave Jones)
10 Aug 89 21:53:01 GMT

From comp.compilers

Related articles
[2 earlier articles]
Re: Error reporting in LR parsers lai@mips.com (1989-08-07)
Re: Error reporting in LR parsers heirich@cs.ucsd.edu (1989-08-08)
Re: Error reporting in LR parsers heirich@cs.ucsd.edu (1989-08-08)
Re: Error reporting in LR parsers rusty@garnet.Berkeley.EDU (1989-08-10)
Re: Error reporting in LR parsers djones@megatest.com (1989-08-10)
Re: Error reporting in LR parsers djones@megatest.com (1989-08-10)
*Re: Error reporting in LR parsers djones@megatest.com* (1989-08-10)**
Re: Error reporting in LR parsers markg@well.sf.ca.us (1989-08-15)
Re: Error reporting in LR parsers eachus@mbunix.mitre.org (1989-08-14)

| List of all articles for this month |

From:	djones@megatest.com (Dave Jones)
Newsgroups:	comp.compilers
Date:	10 Aug 89 21:53:01 GMT
References:	<1989Aug8.130922.1019@esegue.uucp)
Organization:	Megatest Corporation, San Jose, Ca

The errors in the following article all seem to be due to the mistaken
impression that tokens which can legally follow the production of
a non-terminal symbol are independant of the context of that non-terminal.

>From article <1989Aug8.130922.1019@esegue.uucp), by heirich@cs.ucsd.edu (Alan Heirich):
...
)
) The problem: yacc violates correct shift-reduce parsing rules by assigning
) "default reductions". For the following item:
)
) A --) B C .
)
) the corresponding rule should be reduced only when the lookahead token is in
) the follow set of A.

I've got a little trouble with the terminology. I'm going to assume

        "Follow set of A" means all the terminals which can follow A in any
        sentential form. (This is the definition is most applicable to the
        theory of SLR parsers, but I can't think of any alternative meaning.)

        "... should be reduced" is a little trickier. Except for an obscure
        bug related to the synthesized error token, yacc does what it "should do".
        I am assuming that you are talking about what a full LR(1) parser should
        do. But in that case you should say that it should reduce the rule
        A <= BC if and only if the LR-stack and lookahead togther form a valid
        prefix with this form: x B C t, and x A t is a valid prefix.
        What you actually describe is what an SLR parser "should do".

) A given automaton state might contain many such items,
) and also many items on which a shift is anticipated, e.g.
)
) Q --) R S . T
)
) (in which case a shift would be expected on when the lookahead token is in
) the first set of T).
)
) The yacc parser should report an error whenever the lookahead token is neither
) in a follow set of a rule like "A" above, nor in a first set for a rule like
) "Q" above.

The above, too, is a little vague. But consider this:

G : '[' A ']'
    | A ';'
    ;

A : BC
;

[Before puzzling over this too much, read on. A similar grammar comes into
  play later.]

) ...
)
) Consequences of the problem: there are three types of states: shift only;
) reduce only; and combined shift & reduce states. It turns out that this
) "problem" in irrelevant to two of the kinds of states; and is easily worked
) around for the case of combined shift & reduce states.
...
)
) case 2: a state contains only reduce items.
) The default reductions will change the state before the error gets reported.
) But this state will have the same lookahead set as the state where the error
) was encountered.

Counter-example:

        G : '(' A ')'
            | A ';'
            ;

        A : 'a'
            ;

Run yacc on this and you'll see this state:

        State 4:

A : a _ (3)

. reduce 3

In state 4, the only one which produces A, the only action is a default
reduction. The state has no follow-set coded for it by yacc, but in the
sense of the above comments, its follow set contains ')' and ';' because
it is used in both of these rules:

        G : '(' A ')'
            | A ';'
            ;

Using either sense of the term "follow set", we can, by means of the default
reduction, uncover a state with a different set of legal tokens.

Consider the input

          '(' A x

After doing the default reduction in state 4, we reach state 5:

          state 5
                G : ( A_)

                ) shift 7
. error

Its only legal next token is ')'.

)
) case 3: combined shift & reduce items.
) This is the potentially problematic state, which Mr. Jones illustrated in
) his posting. As he explained, the default reductions move the parser to a
) state in which the lookahead set is not the same as when the error was
) encountered. Specifically, the new lookahead set will contain the union of
) the first sets of the reduced nonterminal(s), but will not contain the tokens
) which could have been shifted in the state where the error was encountered.

Ahem.

G: '(' Q ')'
  ;

Q: P
  | P ';'
  ;

P : 'p'
    | 'p' '!'
    ;

Need I say more?

)
) Case 3 has an easy fix. Modify the yacc parser to save the current state
) *after* it obtains a new token from yylex. (This occurs near the beginning of
) the parse code, right after the debug code that prints out the state number).
) Then, when an error is encountered, yyerror traps the current state and
) compares it to this saved state. The set of expected tokens is the union of
) the lookahead sets of these two states.

Almost. Make the parser save the intire set of intermediate states popped
by default reductions, not just the first one, and use the union of all their
shiftable tokens and the shiftable tokens from the error state. This is,
effectively, the only workable algorithm I've seen so far, although I and
others have presented variations on it.
[From djones@megatest.com (Dave Jones)]

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Error reporting in LR parsers

djones@megatest.com (Dave Jones)10 Aug 89 21:53:01 GMT

djones@megatest.com (Dave Jones)
10 Aug 89 21:53:01 GMT