Error recovery and LR(1)/LALR(1)

"Chris F Clark" <cfc@world.std.com>
20 Jun 2002 21:48:56 -0400

          From comp.compilers

Related articles
Error recovery and LR(1)/LALR(1) heng@Ag.arizona.edu (Heng Yuan) (2002-06-17)
Re: Error recovery and LR(1)/LALR(1) gvcormac@uwaterloo.ca (Gordon V Cormack) (2002-06-20)
Re: Error recovery and LR(1)/LALR(1) kgw-news@stiscan.com (2002-06-20)
Error recovery and LR(1)/LALR(1) cfc@world.std.com (Chris F Clark) (2002-06-20)
Re: Error recovery and LR(1)/LALR(1) soenke.kannapinn@wincor-nixdorf.com (=?Windows-1252?Q?S=F6nke_Kannapinn?=) (2002-06-28)
| List of all articles for this month |

From: "Chris F Clark" <cfc@world.std.com>
Newsgroups: comp.compilers
Date: 20 Jun 2002 21:48:56 -0400
Organization: Compilers Central
References: 02-06-055
Keywords: parse, LALR
Posted-Date: 20 Jun 2002 21:48:56 EDT

> I am in the process of writing an LR(1)/LALR(1) parser generator
...
> I encountered problems dealing with error states.
> My concern is the loss of chance of early recovery.
>
> Situation 1:
> For example, a DFA state contains the following LR(1) items
> A -> alpha X . , lookahead = 'a'
> B -> beta X . gamma , lookahead = 'b'
> My question is really what to do if the lookahead is neither 'a'
> or 'b'? Should A be reduced?


This is a very insightful question to ask. There are parser
generators that do reduce A in the above situation. However, they
generally offer inferior error recovery.


The precise statement if you reduce A in this situation is that the
resulting parser will not *shift* any tokens before detecting an
error. However, it may execute one or more reductions.


If you are using the parser state to display the tokens that are
expected given a certain error (eg. saw a "c" was expecting an "a" or
a "b"--here I presume "b" is in the first set of gamma), the reduction
may cause you to miss printing out some expected tokens.


Upon encountering this problem, several authors have proposed keeping
track of a separate state indicator that does not track reductions,
only shifts. However, the run-time overhead of that is non-trivial.
(Of course, you might decide that the run-time overhead is not a
problem.)


In contrast, if you do not reduce in this situation, the resulting
parser may be able to make the claim that it never *shifts nor
reduces* upon encountering an error (see note below). This is a much
stronger claim, at least in terms of error recovery, because it means
you will always be exactly in the correct state with exactly the
correct lookahead information at the point of the error.


The drawback with not reducing in this situation is that it can make
table packing more difficult and less efficient. However, that is
dependent on the precise details of the packing algorithm used.
Moreover, for most grammars of a single programming language, the
tables are simply not large enough to warrant packing at all.


> Situation 2:
> For example, a DFA state contains the following LR(1) items
> A -> alpha X . , lookahead = 'a'
> B -> beta X . , lookahead = 'b'
> What if the lookahead is neither 'a' nor 'b'? Do I assume default
> transitions on A since the rule for A preceeds B?


Again, reducing neither is the "correct" choice for best error
recovery. If you choose to reduce, either choice is equally correct.
You can even choose to reduce To A in some portions of the table and B
in other portions without affecting the correctness of the generated
tables. If neither an "a" or a "b" is the input, if you reduce, you
will still never shift the token that caused the reduce, you will at
most loose the correct lookahead information by being in a different
state.


Note (from above): If your parser generator implements LALR tables,
i.e. states get merged, the resulting parser can still prematurely
reduce. This can happen when one path to the reducing state allows
some token as lookahead but another path to the reducing state does
not allow that token to be lookahead. In that situation, the token
can cause a reduction but the token may not be legal in the state
after the reduction has occurred (i.e. we may have come to the
reduction state via the path that does not allow the token in the
lookahead).


Therefore, if you have an "expected" token that is legal in a state,
but the action for that token is a reduction, you must follow the
reduction to the "goto" state and determine if the token is legal
after the reduction has occured before printing the token as a valid
"expected" token.


Drats, having written about how LALR state merging can cause problems
with spurious reductions, it seems like the separate trailing state
indicator that does not track reductions but only states reached by
shifting is a better solution. That state indicator would not get
confused by the spurious LALR merge induced reductions and leave you
in the initial pre-reduce state. Then, you can safely apply the
"expected" token rule that follows the reductions to determine which
tokens are actually expected.


BTW, the merging problem is not an issue for full-LR parser
generators. In addition, the LALR lookahead merging problem can be
detected (and fixed) at parser generation time.


I hope this response is not too confusing. This is actually part of
the problem with parser generation. The basic mechanics are simple
and thus most people can quickly get a parser generator written.
However, at the detail level, many decisions have implications that
most parser generator authors never even consider.


Hope this helps,
-Chris


*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.