SUMMARY: Static analyzers for detecting programming errors

cpdas@groper.jcu.edu.au (David Spuler)Thu, 5 Dec 91 20:04:26 +1100

From comp.compilers

Related articles
static analyzers for detecting program errors cpdas@marlin.jcu.edu.au (1991-11-21)
SUMMARY: Static analyzers for detecting programming errors cpdas@groper.jcu.edu.au (1991-12-05)
| List of all articles for this month |

 Newsgroups: comp.compilers From: cpdas@groper.jcu.edu.au (David Spuler) Keywords: errors, analysis, summary, bibliography Organization: Compilers Central References: 91-11-087 Date: Thu, 5 Dec 91 20:04:26 +1100

I received quite a large number of replies to my request for information

1) my work/ideas and

Hope it's useful.

David Spuler

MY OWN WORK
===========

Here is a summary of my own bug lists and references on the use of static
analyzers for detecting program errors. Myself, I implemented a full C
checker (better than Lint) as my 4th year Honours project - hence there is an
Honours thesis. Naturally, the bug lists and references have a C bias.

C Bugs
======

Expression Errors
-----------------
assignment instead of equality (if(x=y) versus if(x==y) )
confusing &, | and &&,|| operators
null effect statements (x<<1; instead of x<<=1)
order of evaluation errors (a[i++]=i)
side effects to 2nd/3rd operand of ||, && and ?:

Lexical Errors
==============
Multibyte character constants (novices use 'abc' instead of "abc")

Flow of Control Errors
======================
unreachable statements
use of uninitialized local variables
function return anomalies - return statements with and without expression
function has no return statement on some paths
constants in conditional tests (if(1>0)...)
accidental empty loops (bad semicolon) for(...); { }
missing break in switch statement

Library Usage Errors
====================
bad formats (and argument type) to printf/scanf/etc

Preprocessor Errors
===================
missing braces #define swap(i,j) temp =i; i=j; j=temp;
precedence - need brackets around whole expresion
precedence - need brackets around macro formal parameters
side effects in macro arguments

Declarations
=============
unused local variables
inner scope redeclarations hiding outer name
global problems - inconsistent use of fns etc - Lint does all this stuff

REFERENCES (papers from my Thesis bibliography)
(some aren't highly relevant to static checking, but the titles tell all)
=============================================================================
Spuler, D.A. "Check: A Better Checker for C", Honours Thesis, Dept of Computer
Science, James Cook University of North Queensland, 4811. AUSTRALIA

"Validation, Verification, and Testing of Computer Software",
ACM Computing Surveys, Vol. 14, No. 2, pp. 159-192.

R. E. Berry and B. A. E. Meekings (1985),
"A Style Analysis of C Programs",
Communications of the ACM, Vol. 28, No. 1, pp. 80-88.

R. E. Fairley (1978),
"Static Analysis and Dynamic Testing of Computer Software",
IEEE Computer, Vol. 11, No. 4, pp. 14-23.

L. D. Fosdick and L. J. Osterweil (1976),
"Data Flow Analysis In Software Reliability",
ACM Computing Surveys, Vol. 8, No. 3, pp. 305-330.

M. T. Harandi and J. Q. Ninq (1990), "Knowledge-Based Program Analysis",
IEEE Software, January 1990, pp. 74-81.

W. Harrison (1977),
"Compiler Analysis of Value Ranges for Variables",
IEEE Transactions on Software Engineering, Vol. 3, No. 3, pp. 243-250.

W. S. Hecht and J. D. Ullman (1975),
"A Simple Algorithm for Global Data Flow Analysis Problems",
SIAM Journal of Computing, Vol. 4, pp. 528-532.

T. R. Hopkins (1980), "PBASIC - A Verifier for BASIC",
Software Practice and Experience, Vol. 10, No. 3, pp. 175-181.

W. E. Howden (1982), "Validation of Scientific Programs",
ACM Computing Surveys, Vol. 14, No. 2, pp. 193-227.

W. E. Howden (1990), "Comments Analysis and Programming Errors",
IEEE Transactions on Software Engineering, Vol. 16, No. 1, pp. 72-81.

S. C. Johnson (1978), "Lint, a C Program Checker", Bell Laboratories,
Murray Hill, NJ, Computer Science Technical Report, July 1978 (also
appearing as part of the documentation for many versions/variants of the
UNIX operating system, as in: Ultrix Programmers Manual - Supplementary
Documents, 1983).

W. L. Johnson (1986), Intention-Based Diagnosis of Novice Programming
Errors, Pitman.

J. Katzenelson and A. Strominger (1987), "Debugging Programs that use
Macro-Oriented Data Abstractions", Software Practice and Experience, Vol.
17, No. 2, pp. 79-103.

J. C. King (1976), "Symbolic Execution and Program Testing",
Communications of the ACM, Vol. 19, No. 7, pp. 385-394.

D. E. Knuth (1971), "An Empirical Study of FORTRAN Programs",
Software Practice and Experience, Vol. 1, pp 105-133.

S. Lauesen (1979), "Debugging Techniques", Software Practice and Experience,
Vol. 9, pp. 51-63.

T. E. Lindquist and J. R. Jenkins (1988), "Test-Case Generation with IOGen",
IEEE Software, January 1988, pp. 72-79.

P. G. Moulton and M. E. Muller (1967), "DITRAN - A Compiler Emphasizing
Diagnostics", Communications of the ACM, Vol. 10, No. 1, pp. 45-52.

S. S. Muchnick and N. D. Jones (1981),
Program Flow Analysis: Theory and Applications, Prentice-Hall.

L. J. Osterweil and L. D. Fosdick (1976), "DAVE - A Validation, Error
Detection and Documentation System for Fortran Programs", Software
Practice and Experience, Vol. 6, No. 4, pp. 473-486.

K. A. Redish and W. F. Smyth (1986), "Program Style Analysis: A Natural
By-Product of Program Compilation", Communications of the ACM, Vol. 29,
No. 2, pp. 126-133.

B. G. Ryder (1974), "The PFORT Verifier",
Software Practice and Experience, Vol. 4, No. 4, pp. 359-377.

R. E. Seviora (1987), "Knowledge-Based Program Debugging Systems", IEEE
Software, Vol. 4, No. 3, pp. 20-32.

N. Suzuki and K. Ishihata (1977), "Implementation of an Array Bound
Checker", Fourth ACM Symposium on Principles of Programming Languages, pp.
132-143.

C. Wilson and L. J. Osterweil (1985),
"Omega - A Data Flow Analysis Tool for the C Programming Language",
IEEE Transactions on Software Engineering, Vol. 11, No. 9, pp. 832-838.

============================
RESPONSES FROM OTHERS
============================

>From @yonge.csri.toronto.edu:hugh@redvax Thu Dec 5 15:06:51 1991

I too am interested in static analysis tools. I have produced a C
compiler (intended to be commercial) that does extensive lint-like
checking.

Over a decade ago Fosdick and Osterweil (sp?) at Colorado produced a
program to detect "data-flow anomalies" in FORTRAN programs. I thought
that work was very interesting. At the time, I prototyped a similar
system (theirs was so slow that they recommended it be used only once in
the life of a program; mine was fast enough that it would not have

Do you know of the PFORT Verifier? I think that it is available for free
from Bell Labs. I think that it is sort of a Lint for FORTRAN,
emphasizing checking for FORTRAN 77 standard conformance.

Again, a decade ago, there was some work at eliminating runtime range
checks from Pascal programs. Clearly, this could be turned around into
compile-time warnings, perhaps without being annoying. I think Welsh of
Queens University of Belfast wrote one of the better papers (dim
recollection).

Anyway, I would be interested in your bug lists and references.

Hugh Redelmeier
{utcsri, yunexus, uunet!attcan, utzoo, scocan}!redvax!hugh
When all else fails: hugh@csri.toronto.edu
+1 416 482-8253

===================================================

>From jackson@larch.lcs.mit.edu Tue Nov 26 03:23:07 1991

David,

Raymie Stata, a fellow graduate in my research group passed me your
request from the net.

I am completing a thesis on a bug detection scheme I have invented called
Aspect. I am attaching some blurb from a paper I have just written. I'm
including the bibliography too, which may give you some useful references.

If you're interested, I'd be happy to tell you more. There is one paper
published that describes the state of Aspect about 1 year ago; it's
Aspect: an Economical Bug Detector', International Conf. On Software
Engineering, May 1991.

I'd also be very interested in any references or ideas that you have.

Regards,

--Daniel Jackson

Aspect is an annotation language for detecting bugs in imperative
programs. The programmer annotates a procedure with simple assertions
that relate abstract components (called aspects') of the pre- and
post-states. A checker has been implemented that can determine
efficiently whether the code satisfies an assertion. If it does not,
there is a bug in the code (or the assertion is wrong) and an error
message is displayed. Although not all bugs can be detected, no
spurious bugs are reported.

...

The purpose of a compiler is not just to make it easier to write good
programs but also to make it harder to write bad ones. Catching errors
during compilation saves testing. It also spares the greater cost of
discovering the error later when it is harder to fix.

Programming errors can be divided into two classes. {\em Anomalies} are
flaws that are apparent even to someone who has no idea what the
program is supposed to do: uninitialized variables, dead code,
infinite loops, etc. {\em Bugs}, on the other hand, are faults only with
respect to some intent. An anomaly detector can at best determine
that a program does something right; a bug detector is needed to tell
whether it does the right thing.

Aspect detects bugs with a novel kind of dataflow annotation.
Annotating the code is extra work for the programmer, but
it is mitigated by two factors. First, some sort of redundancy is
inevitable if bugs, rather than just anomalies, are to be caught.
Moreover, Aspect assertions may be useful documentation: they are
generally much shorter and more abstract than the code they accompany.
Second, no mimimal annotation is demanded; the programmer can choose
to annotate more or less according to the complexity of the code or
the importance of checking it.

A procedure's annotation relates abstract components of objects called
aspects'. The division of an object into aspects is a kind of data
abstraction; the aspects are not fixed by the object's representation
but are chosen by the programmer.

Each assertion of the annotation states that an aspect of the
post-state is obtained from some aspects of the pre-state. The
checker examines the code to see if such dependencies are plausible.
If there is no path in the code that could give the required
dependencies, there must be an error: the result aspect was computed
without adequate information. An error message is generated saying
which abstract dependency is missing.

...

Many compilers, of course, perform some kind of anomaly analysis, and
a variety of clever techniques have been invented (see, e.g.
\cite{carre}). Anomaly detection has the great advantage that it
comes free to the programmer. Aspect might enhance existing methods
with a more precise analysis that would catch more anomalies (using
annotations of the built-in procedures alone). But there will always
be a fundamental limitation: most errors are bugs and not anomalies.

The Cesar/Cecil system \cite{cesar}, like Aspect, uses annotations to
detect bugs. Its assertions are path expressions that constrain the
order of operations. Errors like failing to open a file before
reading it can be detected in this way. Flavor analysis \cite{flavor}
is a related technique whose assertions make claims about how an
object is used: that an integer is a sum in one place in the code and
a mean in another, for instance. Both techniques, however, report
spurious bugs in some cases: an error may be signalled for a path that
cannot occur. Aspect, on the other hand, is sound: if an error is
reported, there is a bug (or the assertion is wrong).

Type checking may also be viewed as bug detection when there is name
equality or data abstraction. Aspect is more powerful for two
reasons. First, since procedures with the same type signature usually
have different annotations, Aspect can often tell that the wrong
procedure has been called even when there is no type mismatch.
Second, type systems are usually immune to changes of state and so, in
particular, cannot catch errors of omission. Even models that
classify side-effects, such as FX \cite{FX}, do not constrain the
order of operations like Aspect.

The version of Aspect described here advances previous work
\cite{icse} by incorporating alias analysis. It can handle multi-level
pointers, whose precise analysis is known to be intractable
\cite{Landi}. The alias scheme adopted is most similar to
\cite{larus}, but it is less precise and cannot handle cyclic and
recursive structures.

...

\bibitem[BC85]{carre}
Jean-Francois Bergeretti \& Bernard A. Carre,
Information-Flow and Data-Flow Analysis of while-Programs',
ACM Trans. Programming Languages and Systems, Vol. 7, No. 1, January 1985.

\bibitem[CWZ90]{chase}
David R. Chase, Mark Wegman \& F. Kenneth Zadeck,
Analysis of Pointers and Structures',
Proc. SIGPLAN '88 Conf. Prog. Lang. Design \& Implementation, 1990.

\bibitem[GL88]{FX}
David K. Gifford \& John M. Lucassen,
Polymorphic Effect Systems',
ACM Conf. Principles of Programming Languages, 1988.

\bibitem[How89]{flavor}
William E. Howden,
Validating Programs Without Specifications',
Proc. 3d ACM Symposium on Software Testing, Analysis and Verification (TAV3),
Key West, Florida, Dec. 1989.

\bibitem[Jac91]{icse}
Daniel Jackson,
Aspect: An Economical Bug-Detector',
Proc 13th Int. Conf. on Software Engineering,
Austin, Texas, May 1991.

\bibitem[Lan91]{Landi}
William Landi and Barbara G. Ryder,
Pointer-induced Aliasing: A Problem Classification',
ACM Conf. Principles of Programming Languages, 1991.

\bibitem[LH88]{larus}
James R. Larus and Paul N. Hilfinger,
Detecting Conflicts Between Structure Accesses'
ACM Conf. Programming Language Design and Implementation,
June 1988.

\bibitem[OO89]{cesar}
Kurt M. Olender \& Leon J. Osterweil,
`Cesar: A Static Sequencing Constraint Analyzer',
Proc. 3d ACM Symposium on Software Testing, Analysis and Verification (TAV3),
Key West, Florida, Dec. 1989.

===================================================

>From adv5@shakti.ernet.in Tue Nov 26 03:17:02 1991

I am a member of a quality control team in Citicorp Overseas Software Ltd.

I do a lot of desk work, to test code validity.

Till now I have used manual methods for static analysis of code, using
tables of states of variables, and basically sweating it out. I wish to
know if you have more information on known bugs or pitfalls in various
constructs of a language.

I will dig out some information on static analysers, and mail them to you

===================================================
>From @bay.csri.toronto.edu:norvell@csri.toronto.edu Tue Nov 26 02:04:23 1991

I'd be very interested in your list of references. Unfortunately I can't
find my own list of references to give you in return. Perhaps I didn't
type it in yet. I _can_ send you an unpublished paper (complete except
for bibliography) on detecting dataflow anomalies in procedural languages.
There is also an experimental program that implements the ideas of the
paper. The paper can be sent in postscript or dvi and the program in in
Turing -- a Pascal type language.

Theo Norvell norvell@csri.toronto.edu
U of Toronto norvell@csri.utoronto.ca

===================================================
>From ae2@cunixa.cc.columbia.edu Sun Nov 24 21:40:04 1991

Greetings!

interest in this particular field as my final project has to do with the
implemetation of expetional handlin in "Small C", a compiler designed on
the IBM 8088 and it's sole interest is educational, something equivalent
to minix. I would greatly appreciate if you could help me in finding
sources that dwell on this subject, anything that would be related to
errors and how one might deal with them would be relavant.
--Amiran
ae2@cunixa.cc.columbia.edu
===================================================

>From paco@cs.rice.edu Sun Nov 24 04:42:03 1991

The Convex Application Compiler (TM?) apparently does a pretty good job.
Any interprocedural analyzer has to catch a lot of errors just to avoid
crashing. They also do pointer tracking, array section analysis, and
generally just all-out analysis and optimization. Bob Metzger
<metzger@convex.com> et al. have a paper on the pointer tracking in the
proceedings of Supercomputing '91, which was held last week.

It is alleged that Convex has sold machines, or at least copies of their
compiler, just for use in static error checking.

Paul Havlak

===================================================
>From jvitek@csr.UVic.CA Sat Nov 23 11:54:49 1991

You might be interested in looking at abstract interpretation. It is a
technique related to data-flow analysis (you can express data-flow anlyses
as abstract interpreation problems) and is quite popular in among the
functional programming crowd.

There is a number of paper (even books!) on the subject, if you are
interested I can provide references.

===================================================

>From twinsun!twinsun.com!eggert@CS.UCLA.EDU Fri Nov 22 07:17:14 1991

Here's a few references that I've written:

Paul Eggert,
Toward special-purpose program verification,
.i "Proc. ACM SIGSOFT International Workshop on Formal Methods
in Software Development",
Napa, CA (9-11 May 1990);
.i "Software engineering notes"
.b 15 ,
4 (September 1990), 25-29.

Paul Eggert,
An Example of Detecting Software Errors Before Execution,
.i "Proc. workshop on effectiveness of testing and proving methods,"
Avalon, CA
(11-13 May 1982), 1-8.

Paul Eggert,
Detecting Software Errors Before Execution,
UCLA Computer Science Department report CSD-810402 (April 1981).

The last reference contains an extensive survey of the pre-1980 literature.
I'd be interested in seeing your list of references as well,
when you have the time.

===================================================
>From beck@cs.cornell.edu Fri Nov 22 05:31:04 1991

I'm interested in hearing your list of bugs. I have given some thought to
the detection and propagation of error information in a program using
dependence information. Propagation of errors uses the idea that any
computation on a path which leads only to an error condition is dead
unless a side-effect intervenes, and any code after the error is
unreachable. Thus one can actually integrate program errors into control
flow information as an unstructured jump to the end of the program. In an
optimizing compiler, one might be tempted to say that any statement which
can be scheduled after the last side effect before the error is dead.
Thus, in this fragment:

1: x := 2
2: print "hi there"
3: y := z/0

statement 1 is actually dead, but statement 2 is not.

Micah Beck
--

Post a followup to this message