Re: recovery from syntax errors, was "ignorant newbie" question

Jerry Leichter <leichter@smarts.com>
22 Feb 1997 23:03:08 -0500

          From comp.compilers


From: Jerry Leichter <leichter@smarts.com>
Newsgroups: comp.compilers
Date: 22 Feb 1997 23:03:08 -0500
Organization: System Management ARTS
References: 97-01-258 97-02-081 97-02-090 97-02-107
Keywords: errors, design

| >It seems obvious that you cannot produce a compiler that will always
| >give a correct second error message, because the compiler cannot know
| >what I actually intended in place of the first error.
|
| Um, it may seem obvious, but it's not. The algorithm I described
| ... can produce reliable error messages about other parts of your
| program, which don't depend on the first error.


That depends on a highly specific notion of "error" and "depends on".


Consider a typed language with required declarations. X is declared to
be an integer. It is used n > 1 times in the program. Let's look at
the following n+1 cases:


  0. All n uses are in locations where an integer is appropriate.
     OK, the program is correct.
  1. n-1 uses are in locations where an integer is appropriate;
     in the remaining one, it isn't.
  ...
  n. All n uses are in locations where an integer is *not*
     appropriate.


What error messages should we expect? Well, in Case 1, we'd like to
see an error message at the one "bad" location. In Case n, it's
"clearly" the *declaration* that's incorrect, and ideally we'd like to
see one error message, not n of them. Where should the transition
between the error message styles occur? It's not at all clear.


After we've seen the first incorrect use, we have two choices: Mark the
variable "tainted", or not. If we mark it "tainted", we won't see a
message for the second incorrect use. If we don't mark it,
subsequent uses will also produce messages. Both approaches are
defensible from a theoretical point of view. The first produces the
"right" result for Case n; the second, the "right" result for cases like
Case 2 or 3.
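
The two choices can be compared with a toy checker. This is only a
sketch under stated assumptions, not the algorithm under discussion:
the Checker class, its taint_on_error flag, and the idea that each use
simply names the type its context expects are all invented here for
illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Checker:
    """Hypothetical toy checker: each use names the type its context expects."""
    taint_on_error: bool
    declared: dict = field(default_factory=dict)   # name -> declared type
    tainted: set = field(default_factory=set)      # names already complained about
    messages: list = field(default_factory=list)

    def declare(self, name, type_name):
        self.declared[name] = type_name

    def use(self, name, expected, line):
        if name in self.tainted:
            return  # policy 1: stay silent about a tainted name
        actual = self.declared[name]
        if actual != expected:
            self.messages.append(
                f"line {line}: {name} is {actual}, context needs {expected}")
            if self.taint_on_error:
                self.tainted.add(name)


def run(taint_on_error, uses):
    """Declare X as integer, feed it (line, expected type) uses, return messages."""
    checker = Checker(taint_on_error)
    checker.declare("X", "integer")
    for line, expected in uses:
        checker.use("X", expected, line)
    return checker.messages


# Case n with n = 3: every use wants a string, so the declaration is suspect.
case_n = [(10, "string"), (20, "string"), (30, "string")]
# Case 2 with n = 3: two unrelated bad uses and one good one.
case_2 = [(10, "string"), (20, "integer"), (30, "real")]
```

With tainting, run(True, case_n) yields a single message, which is the
count we want for Case n, although it is still attached to the first
use rather than to the declaration. The same policy hides the second
bad use in case_2, where run(False, case_2) reports both.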


More generally, the algorithm assumes that there is some obvious notion
of "an" error, and further that the *first* (earliest in parse order)
occurrence of something that also occurs elsewhere is "correct". What's
really going on in my examples is that we have a number of correlated
occurrences, no one of which is necessarily in error - the error is in
their mutual incompatibility. When this occurs, you don't a priori have
any reason to choose one of the definitions as "correct". Any choice is
ultimately arbitrary. If what you want is "theoretical cleanliness",
any choice is probably as good as any other. If what you are trying to
achieve is the best practical error reporting - there's much more work
to be done.


As another example: Suppose you see what is clearly a declaration for
X, but the declaration is erroneous, so that you can't determine what
type X is supposed to have. The current algorithm would report that
error, then skip over any expression in which X appeared. It would be
just as correct - and probably more useful - to declare X internally as
"declared but type not yet known", then try to infer the type from
subsequent uses. What matters is that the rest of the uses be
consistent with *some* type. Even if you can't determine what this type
is - even if you just ignore the type, in effect declaring X to have a
fictional "universal" type, compatible with everything - you can at
least check the expressions involved for other errors. The same thing
goes if there is no declaration for X at all.
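
A minimal sketch of the "declared but type not yet known" idea, under
assumptions of my own: the SymbolTable class, the UNKNOWN marker, and
the rule that the first use fixes the inferred type are all
hypothetical. Taking the first use as the inferred type is itself one
of the arbitrary choices discussed above; it is used here only because
it is the simplest.

```python
UNKNOWN = "<unknown>"  # hypothetical marker: name exists, type not yet known


class SymbolTable:
    """Toy symbol table that keeps checking after a bad or missing declaration."""

    def __init__(self):
        self.types = {}      # name -> type name, or UNKNOWN
        self.messages = []

    def declare(self, name, type_name=None, line=0):
        # type_name=None stands for a declaration whose type could not be parsed.
        if type_name is None:
            self.messages.append(f"line {line}: bad declaration of {name}")
            self.types[name] = UNKNOWN
        else:
            self.types[name] = type_name

    def use(self, name, expected, line):
        if name not in self.types:
            # No declaration at all: report once, then treat like a bad one.
            self.messages.append(f"line {line}: {name} is not declared")
            self.types[name] = UNKNOWN
        actual = self.types[name]
        if actual == UNKNOWN:
            self.types[name] = expected  # infer the type from this use
        elif actual != expected:
            self.messages.append(
                f"line {line}: {name} used as {expected}, "
                f"but earlier treated as {actual}")
```

For example, after declare("X", None, line=1), two uses as "integer"
produce no further messages, while a later use as "string" is still
reported. The "universal" type variant would simply leave the entry as
UNKNOWN and never complain about X's type at all.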


-- Jerry