Re: "error handling and recovery" in compilers.

Chris F Clark <cfc@shell01.TheWorld.com>
7 Jan 2004 01:01:05 -0500

From comp.compilers

Related articles
"error handling and recovery" in compilers. rajaram@acmet.com (RERA) (2003-12-27)
Re: "error handling and recovery" in compilers. nmm1@cus.cam.ac.uk (2004-01-02)
Re: "error handling and recovery" in compilers. i.dittmer@fh-osnabrueck.de (Ingo Dittmer) (2004-01-02)
Re: "error handling and recovery" in compilers. cfc@shell01.TheWorld.com (Chris F Clark) (2004-01-07)
Re: "error handling and recovery" in compilers. i.dittmer@fh-osnabrueck.de (Ingo Dittmer) (2004-01-18)


From:	Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups:	comp.compilers
Date:	7 Jan 2004 01:01:05 -0500
Organization:	The World Public Access UNIX, Brookline, MA
References:	03-12-144 04-01-016
Keywords:	errors
Posted-Date:	07 Jan 2004 01:01:05 EST

Nick Maclaren wrote:
> One indication of a well-engineered software product is that most of
> its errors are detected by the product, which produces a message like
> "Compiler error detected; evidence in .weeble; please contact authors."

This is an interesting aside worth commenting on. I guess I must be an
old-time author and so must most of my associates be, since we tend to
do this in software we write ";-)".

We call this code "internal consistency checking code". The idea is
that each relevant piece of code checks that all of its assumptions
are consistent with the actual data being manipulated. If they are
not, the code prints a message as early as possible, in the hope of
catching the problem as close to its cause as possible.
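
As a rough illustration of such a check (a sketch only, not the actual
Yacc++ code; report_internal_error, CHECK_CONSISTENT and top_of_stack
are hypothetical names), a routine can test its assumptions and route
any violation through a reporting function instead of calling abort():

```cpp
#include <cstdio>

// Count of internal errors reported so far (illustrative only).
static int g_internal_errors = 0;

// Hypothetical reporting hook; the name and signature are assumptions.
inline void report_internal_error(const char* what, const char* file, int line) {
    ++g_internal_errors;
    std::fprintf(stderr,
                 "Internal error: %s (%s:%d); please contact the authors.\n",
                 what, file, line);
}

// Yields true when the assumption holds; otherwise reports it and yields false.
#define CHECK_CONSISTENT(cond) \
    ((cond) ? true : (report_internal_error(#cond, __FILE__, __LINE__), false))

// A routine that validates its own assumptions before touching the data.
inline int top_of_stack(const int* stack, int depth, int capacity) {
    if (!CHECK_CONSISTENT(stack != nullptr)) return -1;
    if (!CHECK_CONSISTENT(depth > 0 && depth <= capacity)) return -1;
    return stack[depth - 1];
}
```

The check sits right next to the code that depends on the assumption,
so the message appears as close to the point of use as the routine can
manage.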

One case where we have such code is in the library we ship with
Yacc++. Therein, the code is actually "problematic" for some users of
our library. It is problematic for several reasons.

1) Some users want to build "fail-proof" applications. These
      applications are typically servers that need to stay up 24x7. Such
      servers are never supposed to terminate, as doing so might cause
      critical transactions to be lost. In some cases, the server has no
      way to report an internal error; for example, the server has no
      "i/o" connection except to the clients, who could neither
      understand the error nor fix it even if they could. In many of these
      cases the required C language calls, e.g. abort(), are not even
      available to the application, and having calls to them in the code
      simply causes the server application not to link.

      This must be balanced against some of the internal consistency
      checks we make. Say we called a memory allocation routine,
      e.g. new, and got back a reply indicating that the system had
      no memory to fulfill the request. In that case, there is no place
      to store the information that we are attempting to save, and the
      application cannot continue reliably; it must lose something. A
      worse situation occurs when we are looking at our internal
      read-only tables and find an entry that is outside the valid
      values, meaning that someone has written on the read-only tables
      (as the tables are initialized only with valid values). Again, the
      application cannot reliably continue. In both cases, since the
      system cannot reliably continue, it needs to do something. What
      our library does is call the error reporting routine with an
      indication that an internal error has occurred (and, of course,
      exactly which internal error, so that the developer knows what to
      fix).

      Now, in the normal case, the application is running in single-user
      mode, and the application can simply report the error to the user
      running the program, who can then take an appropriate action, such
      as correct the input data and re-execute the program.

      However, in the client-server case where the required underlying
      libraries are missing, the error reporting scheme has no way to
      report the error to the appropriate operator, nor does it have the
      ability to halt the server. I wish I had some positive feeling
      that the resulting server application builders did some correctness
      proofs to validate that the problems could never occur. However,
      knowing the complexity of some of their applications, I know that
      this is at best a vain wish. Moreover, I cannot hold myself
      blameless in this regard either.
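
The two checks described in point 1 (a failed allocation and an
overwritten read-only table) might look roughly like the sketch below.
Everything here is an assumption made for illustration: InternalError,
ErrorLog, kGotoTable, next_state and allocate_values are invented
names, not part of the Yacc++ library.

```cpp
#include <cstddef>
#include <new>

// Identifies exactly which internal error was detected.
enum class InternalError { None, OutOfMemory, CorruptTable };

// Hypothetical stand-in for the library's error reporting routine.
struct ErrorLog {
    InternalError last = InternalError::None;
    void report(InternalError e) { last = e; }
};

const int kNumStates = 4;
// A read-only table initialized only with valid state numbers.
const int kGotoTable[kNumStates] = {1, 2, 3, 0};

// Returns the next state, or -1 after reporting that the table holds an
// entry outside the valid values (i.e. something wrote on it).
inline int next_state(const int* table, int state, ErrorLog& log) {
    int next = table[state];
    if (next < 0 || next >= kNumStates) {
        log.report(InternalError::CorruptTable);
        return -1;
    }
    return next;
}

// Returns a buffer of n ints, or nullptr after reporting the failure.
inline int* allocate_values(std::size_t n, ErrorLog& log) {
    int* p = new (std::nothrow) int[n];
    if (p == nullptr) log.report(InternalError::OutOfMemory);
    return p;
}
```

Neither function calls abort() or writes to a stream, so a sketch like
this would still link in an environment where those facilities are
missing; what the caller does with the reported error is left open.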

This brings us to problem 2.

2) Adding internal consistency checks to our library makes it more
      complicated. A complicated library is not only harder to maintain;
      at some level it is also harder to use. We have had several users
      who have not taken full advantage of our library simply because it
      was "too complicated", and that complexity was due to the fact that
      it has an internal error reporting scheme and uses that scheme in
      its internal consistency checks.

      Worse, this problem is self-reinforcing. Once the complexity of a
      library approaches a certain level, it has a tendency to become
      more complicated as one attempts to simplify it. The error
      reporting scheme in our library is an example of that. It is
      designed to be flexible and also to be "user replaceable". That is,
      if an application designer has their own error reporting scheme,
      the library is designed to allow the user to use the user's scheme
      in place of its own. The error reporting scheme also has several
      parts that are themselves designed to be tailored or replaced. All
      of that is abstractly a good thing, and in most cases, it is also
      concretely a good thing, as we can help an application designer
      replace only the parts of the error reporting scheme that they need
      to replace, tailor the parts they need to tailor, etc. However, it
      does add bulk to our library, and does contribute to the overall
      complexity--and that is not good.
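
One common way to make a reporting scheme user replaceable is a
settable handler, sketched below. This is not the Yacc++ interface;
ErrorHandler, set_error_handler, report_error and recording_handler are
hypothetical names chosen for the example.

```cpp
#include <cstdio>

// Hypothetical handler type: an error code plus a human-readable message.
using ErrorHandler = void (*)(int code, const char* message);

// The library's own scheme, used until the application installs another.
inline void default_handler(int code, const char* message) {
    std::fprintf(stderr, "internal error %d: %s\n", code, message);
}

inline ErrorHandler& current_handler() {
    static ErrorHandler handler = default_handler;
    return handler;
}

// Installs a replacement and returns the previous handler so that the
// caller can restore it. Passing nullptr reinstates the default.
inline ErrorHandler set_error_handler(ErrorHandler h) {
    ErrorHandler old = current_handler();
    current_handler() = (h != nullptr) ? h : default_handler;
    return old;
}

// What the library's consistency checks would call.
inline void report_error(int code, const char* message) {
    current_handler()(code, message);
}

// An application-supplied replacement that records instead of printing.
static int g_last_code = 0;
inline void recording_handler(int code, const char*) { g_last_code = code; }
```

Even this small version shows where the bulk comes from: a default, a
registration call, a way to restore the old handler, and a rule for
what a null handler means.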

The 3rd and final problem is not directly related to either of the
above, except in the sense that the complexity of the library makes it
opaque.

3) The internal consistency checks sometimes make users think
      that the problems are in the library (or are in the wrong part of
      the library) when the problems are elsewhere and the only fault of
      the library is that the internal consistency check detected the
      error there.

      A typical problem is when users make mistakes in their own "action
      code" (code attached to a grammar) and that code overwrites the
      parser stack or writes out-of-bounds. The parser will often detect
      that error a little bit later when it discovers that the stack is
      not in a consistent state. However, the key to finding the problem
      is discovering when the state was made inconsistent--and the code
      which is reporting the error is almost always not at fault.
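
To make point 3 concrete, here is a small sketch of a stack with guard
words on either side of its usable slots. It is an assumption about how
such a check could be written, not a description of the Yacc++ parser
stack, and GuardedStack is a made-up name.

```cpp
#include <array>
#include <cstddef>

// A stack of N usable slots with one guard word below and one above.
template <std::size_t N>
class GuardedStack {
public:
    GuardedStack() {
        storage_.fill(0);
        storage_[0] = kGuard;
        storage_[N + 1] = kGuard;
    }

    // Deliberately not checked against the logical bounds, like raw action
    // code. Valid logical indices are 0..N-1; -1 and N land on a guard word.
    int& slot(int i) { return storage_[static_cast<std::size_t>(i + 1)]; }

    // The consistency check the parser would run between actions.
    bool intact() const {
        return storage_[0] == kGuard && storage_[N + 1] == kGuard;
    }

private:
    static constexpr int kGuard = 0x5AFE5AFE;
    std::array<int, N + 2> storage_;
};
```

A write to slot(N) succeeds silently; only a later call to intact()
notices the damage. The code that reports the problem is therefore the
checker, while the faulty write happened earlier and somewhere else.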

BTW, years ago, I remember someone telling me that a good solution to
"internal errors" involves reporting the internal error only if no
user error has appeared before it. If user-induced errors were
detected earlier (and may have caused the internal error), the
compiler simply prints a message saying that "complications due to
previous errors prevent the compiler from continuing, fix the errors
above."
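
That policy is simple to sketch. The Diagnostics class below is
hypothetical and only keeps the last message, which is enough to show
the decision being made:

```cpp
#include <string>

// Chooses what to say about an internal error based on whether any user
// error was reported first.
class Diagnostics {
public:
    void user_error(const std::string& msg) {
        ++user_errors_;
        last_ = "error: " + msg;
    }

    void internal_error(const std::string& msg) {
        if (user_errors_ > 0) {
            // Earlier user errors may have caused this one; do not alarm.
            last_ = "complications due to previous errors prevent the "
                    "compiler from continuing, fix the errors above.";
        } else {
            last_ = "internal error: " + msg + "; please contact the authors.";
        }
    }

    const std::string& last() const { return last_; }

private:
    int user_errors_ = 0;
    std::string last_;
};
```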

My experience suggests that terminating the application at the first
internal error is not always the correct solution either. In some
cases, after reporting an internal error, the code which detected the
error needs to gracefully do nothing and return to the caller (when
possible, indicating to the caller that there was a problem and that it
should gracefully exit as well).

Hope this helps,
-Chris

*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
19 Bronte Way #33M voice : (508) 435-5016
Marlboro, MA 01752 USA fax : (508) 251-2347 (24 hours)
