Related articles |
---|
"Near Miss" error handling? gwyn@thislove.dyndns.org (2001-03-27) |
Re: "Near Miss" error handling? eeide@cs.utah.edu (Eric Eide) (2001-03-31) |
Re: "Near Miss" error handling? joachim_d@gmx.de (Joachim Durchholz) (2001-04-04) |
[Followup] Re: "Near Miss" error handling? gwyn@thislove.dyndns.org (2001-04-10) |
Re: [Followup] Re: "Near Miss" error handling? Scott.Daniels@Acm.Org (Scottie) (2001-04-12) |
Re: [Followup] Re: "Near Miss" error handling? gwyn@thislove.dyndns.org (2001-04-14) |
Re: [Followup] Re: "Near Miss" error handling? rog@vitanuova.com (2001-04-14) |
From: | gwyn@thislove.dyndns.org (Gwyn Judd) |
Newsgroups: | comp.compilers |
Date: | 10 Apr 2001 01:16:54 -0400 |
Organization: | World Center for Flux Capacitor Research |
References: | 01-03-135 01-03-165 01-04-011 |
Keywords: | errors |
Posted-Date: | 10 Apr 2001 01:16:54 EDT |
This is a followup to all the people who have responded to my query.
It's all been tremendously helpful. I came up with a slightly new use
for the whole idea today. I'm using a method of error recovery I
borrowed from the book "Recursive descent Compiling" by A.J.T Davie and
R. Morrison (the book was in Algol but my compiler is in Visual Basic.
I'm not sure if it is an improvement :) Actually I never wrote anything
in Algol so the first thing I had to do was to teach myself to read it).
They suggest a method whereby when you come across an error, you print
out an error and then set a global variable called "recovering". When
recovering is set, no error messages can be printed out. Additionally,
the next time you get to a position where the input symbol has to be a
certain string, you skip input until the input symbol is that string.
This has the effect of initially assuming the thing that caused the
error was simply misspelled and continuing with the parse as such. When
you get to the next "known token" you have either recovered from the
error, or you skip input until the symbols match at which point you
unset recovering. I find this method works decently in the case where
the token was simply misspelled ("clas" instead of "class"). This is by way
of introduction, not the new bit.
If you imagine that we are at a point in the grammar where there are
several alternatives:
instruction: "if" "(" boolean ")" block
| "while" "(" boolean ")" block
| "loop" "(" number ")" block
Suppose the input as this point is "i(some_boolean_function())". This is
probably a misspelled "if" statement. The near-miss routine can take the
next token, as well as a list of possibles, and tell us which one it is
likely to be. Additionally, if none of them are "close enough", it
returns none of them so we don't get too confused.
But we can do more. Imagine our grammar is augmented with some syntax
for calling functions:
instruction: "if" "(" boolean ")" block
| "while" "(" boolean ")" block
| "loop" "(" number ")" block
| identifier "(" ")" ";"
| identifier "." identifier "(" ")" ";"
Where an identifier is defined as being basically anything like a C/C++
identifier. Then, in the example above, the token "i" looks like a
function name. What I do here is look ahead a couple of symbols. If the
one following the "(" is not a ")", then I know it isn't a function call
(this obviously wouldn't work too well for a more C-like language,
fortunately in this language you can't provide arguments to functions).
So if we look at our string "i(some_boolean_function())", realising it
isn't a function call, we call the near-miss function with a list of
possibles (if, while, loop). It tells us that it is probably an "if".
Since the parser at this point in the grammar is a set of chained if
statements (probably should be a select case (switch)), we then jump
backwards from the error handler (using "goto" of course :)) into the
grammar, just after where the "if" token is, after consuming the token
from the input, of course.
That is, the instruction routine would be something like (psuedocode):
if have "if" then
do_the_if:
do_if
elseif have "while" then
do_the_while:
do_while
elseif have "loop" then
do_the_loop:
do_loop
elseif have identifier then
if token = "." then
function_call
elseif token = "(" then
next = peek_ahead
if next = ")" then
function_call
else
' puts us into recovery mode
syntax "expected instruction but got " token " instead
probable = near_miss(token, "if", "while", "loop")
' consume
next_token
select case probable
case "if"
goto do_the_if
case "while"
goto do_the_while
case "loop"
goto do_the_loop
end select
end if
end if
end if
If we find that the token matches none of the possibles, the token is
simply consumed, as though the grammar matched at that point, and the
function returns without taking any other action. This can help with
extraneous words in the input.
I find the compiler used to get into situations where a misspelled word
would throw the compiler into a state where recovery was completely
impossible. It might try to parse a class declaration, thinking it was
the definition of a function. The end result was that many, many
spurious errors were ejected as the compiler went in and out of
recovery. By using this method, I find that quite often we can pick with
better accuracy the correct spot to jump back into the parse tree.
This all lacks a decent amount of testing as I wrote it all just today.
I have no idea if any of this makes any sense as it is very late here.
I have a feeling that this method can be very helpful, if used sparingly
and cautiously. I'm not sure if it would be easy to use this sort of
method with a table driven parser.
--
Gwyn Judd
Let your conscience be your guide.
-Pope
Return to the
comp.compilers page.
Search the
comp.compilers archives again.