Re: Semantic Checking - C

torbenm@diku.dk (Torben Ægidius Mogensen)
3 Feb 2005 22:41:01 -0500

          From comp.compilers


From: torbenm@diku.dk (Torben Ægidius Mogensen)
Newsgroups: comp.compilers
Date: 3 Feb 2005 22:41:01 -0500
Organization: Department of Computer Science, University of Copenhagen
References: 05-01-098
Keywords: C, debug, semantics
Posted-Date: 03 Feb 2005 22:41:01 EST

"johnvoltaire" <johnvoltaire@gmail.com> writes:




> 1. We are trying to improve the semantic analysis of a particular
> compiler, and we need to identify the errors that compilers usually
> can or cannot detect. Please take time to review the following list of
> errors we've gathered and check if any or all of them belongs to the
> semantic checking routine during compilation/runtime. If you know a
> specific semantic error that we missed, we'll appreciate it if you
> could add the details.
>
> 2. The process we are planning to do is to make a static semantic
> checking of C programs so that these kind of semantic errors would not
> occur upon execution the program. This particular process is somewhat
> like we can call as a diagnostic procedure that will walkthrough the
> source code (without the necessity of running it) and see if any of
> the statements will violate any semantic rules.
>
> From what we've understand, static semantic checking refers to the
> analysis of expected program meaning or flow before
> compilation/execution, while dynamic semantic checking refers to the
> analysis during execution. Could anyone affirm on these?


More or less. Static checking does not have to be before compilation
(as it can be done on the compiled code), but it is certainly before
execution. Dynamic checking is, indeed, at runtime.


Note that static checking can never be precise, as it is not
computable which parts of a program will ever be executed (so even
dead code analysis is approximate) or what values a variable can
have. So you must decide whether you want to err on the side of safety
(if the analysis is unsure, issue a warning or error message) or only
report errors that are certain to occur. A compromise may be to
report errors that will occur if the relevant bit of code is ever
executed.


> Common semantic errors in C language:
>
> 1. Use of function without function prototypes


It is statically checkable given the assumption that the code is
reachable.


> 2. Code with no effect (dead code)


There are two types of dead code: Unreachable code and code with no
visible net effect. Unreachable code can be approximated such that
you will find some definitely unreachable code, but you can not find
all unreachable code. Code with no visible effect can also be found
statically. Typically, it reduces to assignments to dead variables.
You must be careful that you don't remove assignments such as x=a[i],
as even if x isn't used, the lookup might trigger an error. And
silently removing erroneous code is bad, as the same program may later
fail if compiled with a compiler that does not remove this code.
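
To make the two kinds concrete, here is a small sketch. The function
name sum_prefix and the variable names are made up for the example and
do not come from any particular compiler or code base.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums the first n elements of a. */
int sum_prefix(const int *a, size_t n)
{
    int total = 0;
    int last = 0;               /* dead variable: written below, never read */

    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        total += a[i];
        last = a[i];            /* no visible net effect, but the read of a[i]
                                   is still a memory access that could fault
                                   if i were ever out of range */
    }
    return total;

    total = 0;                  /* unreachable: follows an unconditional return */
}
```

The assignment after the return is the easy case. The assignment to
last is the one to be careful with: the store is dead, but the load
from a[i] is not obviously harmless.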


> 3. Division by zero


You will rarely be able to find this statically. You may be able to
issue a warning that the division _may_ be by zero, but you will get
many false positives. A static analysis might, however, find cases
where it can see that the divisor is non-zero, so that a dynamic check
can be removed. With C, though, you might not want dynamic checks at all.
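
As a rough illustration of the two situations (average and scale are
invented names, and the assert stands in for whatever dynamic check a
checker might insert):

```c
#include <assert.h>

/* The divisor comes from the caller. Looking at this function alone,
   an analysis can only say the division *may* be by zero. */
int average(int sum, int count)
{
    assert(count != 0);         /* dynamic check */
    return sum / count;
}

/* Here the divisor is computed locally and is always in 1..4, so an
   analysis that tracks value ranges can prove it non-zero and no
   dynamic check is needed. */
int scale(int value, unsigned level)
{
    int divisor = (int)(level % 4u) + 1;
    return value / divisor;
}
```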


> 4. Use of functions and variables which are defined but not used


Also statically checkable, though again, you can not be precise.


> 5. Use of functions and variables with defined arguments that are never
> used


Same as above.


> 6. Use of functions and variables that return either with or without
> any assigned value


Many compilers warn about this. It reduces to checking if there is a
return statement on every path from the entry point to the exit point.
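
A sketch of what such a path check sees (the enum and the function
apply are hypothetical, chosen only for the example):

```c
#include <assert.h>

enum op { OP_ADD, OP_SUB, OP_MUL };

int apply(enum op o, int a, int b)
{
    switch (o) {
    case OP_ADD: return a + b;
    case OP_SUB: return a - b;
    case OP_MUL: return a * b;
    }
    /* Every enumerator returns above, but o may hold some other value,
       so there is still a path from the entry point to this spot.
       Without the return below, that path would reach the exit point
       without a return statement. */
    assert(0 && "unexpected op");
    return 0;
}
```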


> 7. Use of functions and variables that return values that are never
> used


It is visible from the type of a function whether it returns a value,
so this is easy. However, it is common in C to ignore returned values
from functions. Many standard functions return values that are
ignored more often than not.


> 8. Subscript out of bounds


As with division by zero, you can find cases where you can see that
this isn't possible and so avoid a dynamic check. However, there will
be many cases where no out-of-bounds can occur but the static analysis
will fail to recognize this, so writing error messages for unsure
cases will generate a lot of false positives. The problem with many
false positives is that programmers will tend to ignore these
warnings, even when they are not false positives (a case of "the boy
who cried wolf").
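
For example (the table and both functions are invented for
illustration, and the assert plays the role of the dynamic check):

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 8

static const int table[TABLE_SIZE] = {1, 2, 4, 8, 16, 32, 64, 128};

/* The loop condition bounds i, so every access is provably in range
   and no dynamic check is needed. */
int table_sum(void)
{
    int total = 0;
    for (size_t i = 0; i < TABLE_SIZE; i++)
        total += table[i];
    return total;
}

/* Whether i is in range depends on every caller. An analysis that sees
   only this function must either keep a dynamic check or issue a
   warning that may well be a false positive. */
int table_get(size_t i)
{
    assert(i < TABLE_SIZE);     /* dynamic check */
    return table[i];
}
```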


> 9. Booleans that always evaluate true or evaluate false


According to the C specification, they always will, as any integer has
an interpretation as a boolean (0 == false, nonzero == true). You can
analyse whether you get only 0 or 1 as results, but you will find that
a lot of programmers use "if (x)" to mean "if (x!=0)".
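
A small sketch of both points, with made-up function names:

```c
#include <assert.h>

/* Relational operators yield exactly 0 or 1, whatever the operands. */
int to_flag(int x)
{
    return x != 0;
}

/* Counts the nonzero entries of values[0..n-1]. */
int count_set(const int *values, int n)
{
    int count = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (values[i])          /* shorthand for values[i] != 0 */
            count++;
    }
    return count;
}
```

A check that insists on conditions being only 0 or 1 would complain
about the test in count_set, although it is perfectly ordinary C.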


> 10. Checking of infinite loops as well as loops that cannot be exited
> or entered


The first part is the halting problem and hence undecidable. Also,
some programs deliberately have infinite loops that are meant to be
broken by interrupts. The second part is a reachability question,
similar to the problems of unreachable code and of whether there is a
return on all paths to the exit point of a function.
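
Two loops that a checker cannot sensibly classify, as a sketch. The
names are invented, and the Collatz iteration is my choice of example,
not something from the original question:

```c
#include <assert.h>
#include <signal.h>

/* Counts iterations of the Collatz step until n reaches 1. Whether
   this loop terminates for every positive n (ignoring wrap-around of
   unsigned long) is an open problem, so no checker can be expected
   to decide it. */
unsigned collatz_steps(unsigned long n)
{
    unsigned steps = 0;

    assert(n > 0);              /* n == 0 would loop forever */
    while (n != 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        steps++;
    }
    return steps;
}

/* A deliberately "infinite" loop: nothing in the loop body changes the
   flag. It is meant to be set from outside, e.g. by a signal handler. */
static volatile sig_atomic_t stop_requested = 0;

void wait_for_stop(void)
{
    while (!stop_requested) {
        /* spin until interrupted */
    }
}
```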


> 11. Statements that cannot be reached during execution


See above.


> 12. No identifiers or variables are used twice in the same block or
> scope


I assume you mean "declared" rather than "used". This is easily
checked.


> 13. The number and types of arguments in a function call must be the
> same as the number and types of the prototypes


A type issue, again easily checked. Beware of varargs, though.
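
To see why varargs need care, consider this sketch (sum_ints is a
hypothetical function written for the example):

```c
#include <assert.h>
#include <stdarg.h>

/* The prototype fixes only the first parameter. The number and types
   of the remaining arguments are a convention between caller and
   callee that a prototype check cannot see. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* trusts the caller to pass ints */
    va_end(ap);
    return total;
}
```

A call such as sum_ints(2, 1.5, 2) matches the prototype as far as the
type checker is concerned, yet its behaviour is undefined because the
callee reads a double as an int. Catching that needs extra knowledge
about the function, like the format-string checks some compilers do
for printf.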


> 14. A return statement must not have a return value unless it appears
> in the function prototype that is declared to return a value


Also a type issue.


> 15. Break statements appear outside enclosing constructs where a break
> statement may appear


Easy.


> 16. Elements of enumerated types are repeated


Ditto.


> 17. Variable names appear in the same lexical scope


Ditto.


> 18. Labels are repeated


Ditto.


Most compilers will warn about a lot of the above, in particular the
last few cases.


                Torben

