Newsgroups: | comp.compilers |
From: | Terence Parr <parrt@everest.ee.umn.edu> |
Keywords: | errors, parse, tools |
Organization: | Compilers Central |
References: | 94-09-142 94-09-180 |
Date: | Fri, 30 Sep 1994 21:27:15 GMT |
Ron Guilmette asks:
> This seems to be the current `conventional wisdom' (among a lot of folks I
> know anyway), but is it really true? Where's the irrefutable evidence to
> support this view?
> ... These
> days however, everyone seems to be saying that recursive descent is better
> for error recovery, and that's rather disconcerting to me.
There are two main (polarized) approaches to error recovery as I see it:
[1] Manual. For each error condition you specify how to recover.
Typically, you'd be doing a hand-built parser. However, some folks
at Metaware have a slick LALR based system called TWS that allows
them to try several alternatives before recovering in some "optimal"
way. I presume that this takes a good amount of hand "tweaking" to
get right just like a recursive descent parser.
This tends to be a bit of a chore, but has the potential to
yield very good recovery.
[2] Automatic recovery (such as the single insertion/deletion correction
methods etc...). ANTLR has a simple "panic/glutton" method that
seems to work well for many situations. Most parser generators have
some automatic mechanism for reporting/recovering from errors--some
of them are pretty decent from what I understand.
This is clearly less work for the grammar developer, but doesn't
yield the best results.
People may be saying that RD parsers are better at recovery for several
reasons:
[1] You have lots of context information when you detect an error. For
example, if you die in an expression, you might know that it was the
conditional of an if-statement. In a bottom up parser, you will not
always know the context (you'd probably have to examine the symbol
stack to decide where you were, which amounts to writing a simple
parser by hand).
[2] RD parsers are associated with the "hand-built parsers", although
ANTLR and a few others generate RD parsers. Writing a RD parser by
hand gives you the ultimate freedom (with lots of effort) for
correcting errors since you're just coding in C or whatever.
I've been promising something slick for ANTLR called "parser exception
handling" (PEH) for awhile now; it's in a testing stage as we speak. For
years, I've been trying to dream up ways to give the programmer good control
of error recovery, without lots of hassle. After long discussions with many
folks, the simple (and now obvious) leap from C++ exception handling to
parser exception handling has been made.
My driving force is to provide the power/flexibility of a hand-built parser
with the convenience of an automatic parser generator. Because it would
automate much of the details, a given programming effort would provide
better results using PEH over a hand coded solution.
Let's consider a simple example. Assuming an input of
if 3+* then ...
for some rule:
stat: IF expr THEN stat
| ...
;
we would like to see a nice error message like:
file(line): error at "*": bad conditional of IF
I argue that, while an automatic mechanism might be able to recover from
this error, the error reporting facility could only generate something like
file(line): error at "*" in expression
The first error message requires human context information that an automatic
mechanism would be unable to discover.
ANTLR's parser exception handling mechanism works like this:
stat: IF e:expr THEN stat
exception[e]
catch MismatchedToken :
catch NoViableAlt :
<<
fprintf(stderr,"%s(%d): error at \"%s\": bad conditional of IF\n",
curfile, curline, LATEXT(1));
consumeUntilToken(THEN);
>>
| ...
;
where curfile and curline are some user defined variables; LATEXT(i) is the
text of the ith token of lookahead. The label "e:" on the reference to expr
allows me to catch errors encountered specifically during that reference.
The consumeUntilToken() routine simply scarfs tokens until it sees the
argument. The solution given here is not perfect, because the input may be
missing the THEN as well. This is not a great description, but you get
the idea.
Rules may have default handlers (so that handlers are not needed for each
alternative) and global default handlers may be specified that are included
in each rule handler list. ANTLR allows the use of its automatic mechanism
until the exception handling rules can be put inserted. As we learn more,
this mechanism will evolve to provide better and better support for
programmers.
This notion of parser exception handling might hold promise for bottom-up
parsers as well (my intuition says it would be much harder).
PCCTS version 1.30 (coming soon to a theatre near you) will incorporate
this mechanism.
Regards,
Terence Parr
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.