Related articles |
---|
"error handling and recovery" in compilers. rajaram@acmet.com (RERA) (2003-12-27) |
Re: "error handling and recovery" in compilers. nmm1@cus.cam.ac.uk (2004-01-02) |
Re: "error handling and recovery" in compilers. i.dittmer@fh-osnabrueck.de (Ingo Dittmer) (2004-01-02) |
Re: "error handling and recovery" in compilers. cfc@shell01.TheWorld.com (Chris F Clark) (2004-01-07) |
Re: "error handling and recovery" in compilers. i.dittmer@fh-osnabrueck.de (Ingo Dittmer) (2004-01-18) |
From: | Chris F Clark <cfc@shell01.TheWorld.com> |
Newsgroups: | comp.compilers |
Date: | 7 Jan 2004 01:01:05 -0500 |
Organization: | The World Public Access UNIX, Brookline, MA |
References: | 03-12-144 04-01-016 |
Keywords: | errors |
Posted-Date: | 07 Jan 2004 01:01:05 EST |
Nick Maclaren wrote:
> One indication of a well-engineered software product is that most of
> its errors are detected by the product, which produces a message like
> "Compiler error detected; evidence in .weeble; please contact authors."
This is an interesting aside worth commenting on. I guess I must be an
old-time author and so must most of my associates be, since we tend to
do this in software we write ";-)".
We call this code "internal consistency checking code". The idea is
that each relevant piece of code checks to make certain that all
of its assumptions are consistent with the actual data that is being
manipulated. If they are not, the idea is to print a message
as early as possible, in the hope of catching the problem as close to
the cause as possible.
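As a rough illustration of the idea (a minimal sketch, not code from
any shipping library; the names InternalError, check_consistent and
next_state are made up for this example), a routine can verify its
assumptions and record a problem instead of silently using bad data:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record of one failed internal consistency check.
struct InternalError {
    std::string where;  // routine that noticed the inconsistency
    std::string what;   // assumption that did not hold
};

// Hypothetical collector; a real library would hand this to its
// error reporting scheme rather than to a global vector.
static std::vector<InternalError> g_internal_errors;

// Record (rather than abort) when an assumption does not hold.
// Returns the condition so that callers can branch on it.
inline bool check_consistent(bool condition, const char* where, const char* what) {
    if (!condition) {
        g_internal_errors.push_back({where, what});
    }
    return condition;
}

// A table lookup that checks its assumptions before trusting the data.
// Returns -1 if the index or the stored entry is inconsistent.
int next_state(const std::vector<int>& table, std::size_t index, int state_count) {
    if (!check_consistent(index < table.size(), "next_state", "index within table")) {
        return -1;
    }
    int entry = table[index];
    if (!check_consistent(entry >= 0 && entry < state_count,
                          "next_state", "table entry is a valid state")) {
        return -1;
    }
    return entry;
}
```

The check sits right next to the use of the data, so a bad entry is
noticed at the first lookup rather than many steps later.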
One case where we have such code is in the library we ship with
Yacc++. Therein, the code is actually "problematic" for some users of
our library. It is problematic for several reasons.
1) Some users want to build "fail-proof" applications. These
applications are typically servers that need to stay up 24x7. Such
servers are never supposed to terminate, as doing so might cause
critical transactions to be lost. In some cases, the server has no
way to report an internal error; for example, the server has no
"i/o" connection except to the clients, who couldn't understand the
error nor fix it even if they could understand it. In many of these
cases the required C library calls, e.g. abort(), are not even
available to the application, and having calls to them in the code
simply causes the server application not to link.
This must be balanced against some of the internal consistency
checks we make. Say we called a memory allocation routine,
e.g. new, and got back a reply that indicated that the system had
no memory to fulfill the request. In that case, there is no place
to store the information that we are attempting to save, and the
application cannot continue reliably; it must lose something. A
worse situation occurs when we are looking at our internal
read-only tables, and find an entry that is outside the valid
values, meaning that someone has written on the read-only tables
(as the tables are initialized only with valid values). Again, the
application cannot reliably continue. In both cases, since the
system cannot reliably continue, it needs to do something. What
our library does is call the error reporting routine with an
indication that an internal error has occurred (and, of course,
exactly which internal error, so that the developer knows what to
fix).
Now, in the normal case, the application is running in single-user
mode, and the application can simply report the error to the user
running the program, who can then take an appropriate action, such
as correct the input data and re-execute the program.
However, in the client-server case where the required underlying
libraries are missing, the error reporting scheme has no way to
report the error to the appropriate operator, nor does it have the
ability to halt the server. I wish I had some positive feeling
that the resulting server application builders did some correctness
proofs to validate that the problems could never occur. However,
knowing the complexity of some of their applications, I know that
this is at best a vain wish. Moreover, I cannot hold myself
blameless in this regard either.
This brings us to problem 2.
2) Adding internal consistency checks to our library makes it more
complicated. A complicated library is not only harder to maintain;
at some level it is also harder to use. We have had several users who
have not taken full advantage of our library simply because it was
"too complicated", and that complexity was due to the fact that it
has an internal error reporting scheme and uses that in its
internal consistency checks.
Worse, this problem is self-reinforcing. Once the complexity of a
library approaches a certain level, it has a tendency to become
more complicated as one attempts to simplify it. The error
reporting scheme in our library is an example of that. It is
designed to be flexible and also to be "user replaceable". That is,
if an application designer has their own error reporting scheme,
the library is designed to let them use that scheme
in place of its own. The error reporting scheme also has several
parts that are themselves designed to be tailored or replaced. All of
that is abstractly a good thing, and in most cases, it is also
concretely a good thing, as we can help an application designer
replace only the parts of the error reporting scheme that they need
to replace, tailor the parts they need to tailor, etc. However, it
does add bulk to our library, and does contribute to the overall
complexity--and that is not good.
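To make "user replaceable" concrete, here is a minimal sketch of one
way such a hook could look. It is not the Yacc++ interface; the class
ErrorReporter, its replace() and report() members, and the numeric
error code are all assumptions made for illustration:

```cpp
#include <functional>
#include <iostream>
#include <string>
#include <utility>

// Hypothetical library-side error reporting hook.
class ErrorReporter {
public:
    using Handler = std::function<void(int code, const std::string& message)>;

    // Default behaviour: write the message to standard error.
    ErrorReporter()
        : handler_([](int code, const std::string& message) {
              std::cerr << "internal error " << code << ": " << message << '\n';
          }) {}

    // Let the application substitute its own scheme. Returns the
    // previous handler so that it can be restored or chained.
    Handler replace(Handler h) {
        std::swap(handler_, h);
        return h;
    }

    // Called by the library's consistency checks.
    void report(int code, const std::string& message) const {
        if (handler_) handler_(code, message);
    }

private:
    Handler handler_;
};
```

A server with no console could install a handler that appends to its
own log; an application that is happy with the default never touches
replace() at all. Even this small hook shows where the extra bulk
comes from: every replaceable part needs a default, a way to swap it,
and a defined behaviour when nothing is installed.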
The 3rd and final problem is not directly related to either of the
above, except in the sense that the complexity of the library makes it
opaque.
3) The internal consistency checks sometimes make users think
that the problems are in the library (or are in the wrong part of
the library) when the problems are elsewhere and the only fault of
the library is that the internal consistency check detected the
error there.
A typical problem is when users make mistakes in their own "action
code" (code attached to a grammar) and that code overwrites the
parser stack or writes out-of-bounds. The parser will often detect
that error a little bit later when it discovers that the stack is
not in a consistent state. However, the key to finding the problem
is discovering when the state was made inconsistent--and the code
which is reporting the error is almost always not at fault.
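One way to narrow down when the state became inconsistent is to check
it on both sides of each piece of user code. The sketch below is only
an illustration under stated assumptions: GuardedStack, its guard
words, and run_action_checked are hypothetical and do not describe how
any real parser lays out its stack. Because a genuine out-of-bounds
write is undefined behaviour in C++, the damage would have to be
simulated in a test by overwriting a guard word directly.

```cpp
#include <array>
#include <cstddef>

constexpr unsigned kGuard = 0xDEADBEEFu;  // arbitrary marker value
constexpr std::size_t kCapacity = 8;

// Hypothetical parser stack with a guard word on each side.
struct GuardedStack {
    unsigned front_guard = kGuard;
    std::array<int, kCapacity> slots{};
    std::size_t depth = 0;
    unsigned back_guard = kGuard;

    // True while both guards are intact and the depth is in range.
    bool consistent() const {
        return front_guard == kGuard && back_guard == kGuard && depth <= kCapacity;
    }
};

// Run one piece of user action code and check the stack before and after.
// Returns 0 if all is well, 1 if the stack was already damaged on entry,
// 2 if this action damaged it.
template <typename Action>
int run_action_checked(GuardedStack& stack, Action action) {
    if (!stack.consistent()) return 1;
    action(stack);
    return stack.consistent() ? 0 : 2;
}
```

A result of 2 points at the action that just ran, while a result of 1
says only that something earlier did the damage, which is exactly the
situation where the reporting code gets blamed for a problem it merely
noticed.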
BTW, years ago, I remember someone telling me that a good solution to
"internal errors" involves reporting the error only if no other user
error has appeared before. If user-induced errors were
detected before (and may have caused the internal error), the compiler
simply prints a message saying that "complications due to previous errors
prevent the compiler from continuing, fix the errors above."
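That policy is small enough to sketch. The Diagnostics class and its
member names below are hypothetical, and the message text is adapted
from the wording quoted above:

```cpp
#include <string>
#include <vector>

// Hypothetical diagnostics sink for a compiler front end.
class Diagnostics {
public:
    // An error in the user's input.
    void user_error(const std::string& message) {
        ++user_error_count_;
        messages_.push_back("error: " + message);
    }

    // Report an internal error in full only if no user error preceded it.
    void internal_error(const std::string& detail) {
        if (user_error_count_ == 0) {
            messages_.push_back("internal error: " + detail);
        } else {
            messages_.push_back(
                "complications due to previous errors prevent the compiler "
                "from continuing; fix the errors above");
        }
    }

    const std::vector<std::string>& messages() const { return messages_; }

private:
    int user_error_count_ = 0;
    std::vector<std::string> messages_;
};
```

The internal detail is dropped in the second case on the assumption
that the earlier user error is the likely cause; a developer build
might prefer to keep both.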
My experience suggests that terminating the application at the first
internal error is not always the correct solution either. In some
cases, after reporting an internal error, the code which detected the
error needs to gracefully do nothing and return to the caller (when
possible indicating to the caller that there was a problem and that it
should gracefully exit also).
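A minimal sketch of that "do nothing and tell the caller" pattern
follows. The Status enum and the reduce and parse_step functions are
invented for this example and stand in for whatever a real parser
would do:

```cpp
#include <cstddef>
#include <vector>

enum class Status { Ok, InternalError };

// Hypothetical reduce step: pops `count` entries and pushes `result`.
// On an inconsistency it leaves the stack untouched and tells the caller.
Status reduce(std::vector<int>& stack, std::size_t count, int result) {
    if (count > stack.size()) {
        // The internal error would be reported here, then we back out.
        return Status::InternalError;
    }
    stack.resize(stack.size() - count);
    stack.push_back(result);
    return Status::Ok;
}

// The caller passes the problem upward instead of terminating the process.
Status parse_step(std::vector<int>& stack) {
    Status s = reduce(stack, 2, 99);
    if (s != Status::Ok) {
        return s;  // unwind gracefully; our own caller decides what to do
    }
    return Status::Ok;
}
```

The important property is that the failing routine changes nothing, so
whoever finally handles the status sees the state as it was when the
inconsistency was detected.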
Hope this helps,
-Chris
*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
19 Bronte Way #33M voice : (508) 435-5016
Marlboro, MA 01752 USA fax : (508) 251-2347 (24 hours)