Re: Bounds checking, Optimization techniques and undefined behavior

Christopher F Clark <christopher.f.clark@compiler-resources.com>
Mon, 6 May 2019 05:05:52 -0400

          From comp.compilers

Related articles
[13 earlier articles]
Re: Bounds checking, Optimization techniques and undefined behavior bc@freeuk.com (Bart) (2019-05-05)
Re: Bounds checking, Optimization techniques and undefined behavior DrDiettrich1@netscape.net (Hans-Peter Diettrich) (2019-05-05)
Re: Bounds checking, Optimization techniques and undefined behavior gneuner2@comcast.net (George Neuner) (2019-05-05)
Re: Bounds checking, Optimization techniques and undefined behavior gneuner2@comcast.net (George Neuner) (2019-05-05)
Re: Bounds checking, Optimization techniques and undefined behavior anw@cuboid.co.uk (Andy Walker) (2019-05-06)
Re: Bounds checking, Optimization techniques and undefined behavior DrDiettrich1@netscape.net (Hans-Peter Diettrich) (2019-05-06)
Re: Bounds checking, Optimization techniques and undefined behavior christopher.f.clark@compiler-resources.com (Christopher F Clark) (2019-05-06)
Re: Bounds checking, Optimization techniques and undefined behavior bc@freeuk.com (Bart) (2019-05-06)
Re: Bounds checking, Optimization techniques and undefined behavior 0xe2.0x9a.0x9b@gmail.com (Jan Ziak) (2019-05-06)
Re: Bounds checking, Optimization techniques and undefined behavior anw@cuboid.co.uk (Andy Walker) (2019-05-06)
Re: Bounds checking, Optimization techniques and undefined behavior david.brown@hesbynett.no (David Brown) (2019-05-06)
Re: Bounds checking, Optimization techniques and undefined behavior david.brown@hesbynett.no (David Brown) (2019-05-07)
Re: Bounds checking, Optimization techniques and undefined behavior david.brown@hesbynett.no (David Brown) (2019-05-07)
[15 later articles]
| List of all articles for this month |

From: Christopher F Clark <christopher.f.clark@compiler-resources.com>
Newsgroups: comp.compilers
Date: Mon, 6 May 2019 05:05:52 -0400
Organization: Compilers Central
References: 19-04-021 19-04-023 19-04-037 19-04-039 19-04-042 19-04-044 19-04-047 19-05-004 19-05-006 19-05-016 19-05-020 19-05-024 19-05-025 19-05-028 19-05-029
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="35143"; mail-complaints-to="abuse@iecc.com"
Keywords: standards, errors, C++
Posted-Date: 06 May 2019 10:43:15 EDT

While I join this conversation with trepidation as it is generating
more heat than light, I need to make a few points.


        1) “Built in” Array bounds checking can be done in C. It actually
can be done in assembly language. And, by built in, I mean when the
system enforces it even when the user doesn’t explicitly write it.
(More on this below.)


        2) C++ with vectors supports exactly the kind of for-loops
suggested. In fact, it seems to be the preferred idiom these days.


I know item 1 for a fact, because while I was at DEC circa 1995, one
of my tasks was working on “Third degree”. Third degree was a tool
written by DEC Labs for the Alpha computers to replace “Purify” (aka
“Saber C”) which Rational Software wanted $10 million to port and
$100k/per instance. A similar tool was (is?) available for Intel PCs
called “Bounds Checker”. Valgrind does roughly the same stuff if I
understand it right. And those aren’t the only tools to do so.
Malloc like libraries that do some of the checking are also quite
common. So, don’t blame C/C++ or the relevant compilers that you
aren’t getting your array bounds checked, or that you have dangling
pointers or garbage that isn’t deallocated. It can be done.


That includes if you want slices. I don’t know any tool that does
that explicitly, but it wouldn’t be hard to write.


Anyway, the point I wanted to make is that Third degree didn’t even
need the original source code. Third degree worked from the object
files (or from an executable file). It decompiled them, instrumented
them, and checked for all the common errors. The basic ideas that it
was based upon were all developed in the 1960s (or perhaps before).


It isn’t necessarily the declaration in source code that gives an
object its bounds. An object naturally has bounds. You just have to
understand the underlying semantic model. Again, this was realized in
the 1960s if not before.


If your model says that something handed back by the allocator is an
object and that you cannot reference beyond it. You can create the
requisite fat pointer. You can find all the places in the code, where
the fat pointer is used. You can instrument them with whatever checks
you need. You can even apply optimization checks (and Third degree
did) that eliminate the need for checks that are redundant and cannot
fail (or vice versa, that will always fail).


You can do that for arithmetic too. You don’t need support from the
compiler. You just need to understand the semantics you want.


Now, that doesn’t mean you can solve the Halting problem. Code that
generates code and jumps to it is significantly harder to make work.
But, even that can be solved if you know what semantics you want to
support. Debuggers have long inserted “trap instructions” into code
that replace bits of code to be checked, so that if the code is
executed, it goes and does some other code sequence that has the same
semantics but with additional checks. Think aspect oriented
programming at the machine code level.


You can even get that if your model of generated code is C. You can
generate C that applies the checks you want (and only those you want).
There is nothing in the language that prevents you from doing that.
In fact, the nature of C is to make that possible. That’s why C is
the modern portable assembly language.


Undefined behavior is just there because not every underlying machine
supports exactly the same semantics and if you want a universal
assembler, you have to deal with the fact that the minimal coding
sequence may have different results given the different underlying
semantics. Even if you stay in the x86 family. The best code for an
8086 is not the best code for an amd64 machine and the semantic models
are different despite there being a portable subset that would work on
both.


So, the hard part is defining a balance that one presents to the
users. The balance C struck is that the code will be simple “assembly
like” and you will get something as close to what you would have hand
written in assembler (without checks) as the compiler can give you.
That includes assuming that the rules of arithmetic apply and that
x*4/4 (so macros where the two constant 4s may come from different
places, but you still want them to cancel work) is still x despite
overflows, but if the compiler cannot detect the overflow and it
happens you get whatever the machine gives you.


That’s exactly the code I want. Because, I know that there will be
places in my code where something goes wrong no matter how carefully I
(and my team mates) craft it, we are human and make mistakes. Thus,
we know how to practice defensive programming. Programming that
checks for a pointer that in theory should never be null is actually
not null and those checks still catch things, things like the ECC code
on the chip missing a double-bit error caused by stray radiation or a
transistor that switched too slowly.


The code was correct, but quantum mechanics got in the way. And
that’s reality. Mathematically correct doesn’t catch the fact that
the under carefully controlled conditions of heat and pressure the
system does what it pleases. And, yes, I have fixed bugs like that,
where the source code was correct, but the resulting system didn’t
work for reasons outside the programmers control. But, of course,
many more bugs where the source code simply wasn’t correct or bugs
because the spec changed and the code hadn’t been updated yet.


Now, most people can’t deal with that. They want to imagine that
their world is safe and that they can write simple code that will
“just work”. My stint at Google showed me that point of view in
spades. So, you can turn on -Wall, which many people do, and many
teams require. Then you find, you cannot write the code (at least not
and have it pass by *all* compilers with no warnings/errors).


int32_t sum, increment, max;


sum = 0; max = something_far_less_than_maxint6;


for something {
      increment = some_small_value; // such that sum + increment will not
overflow, not even an int16
      if (max <= increment + sum) break;
      sum += increment; // with -Wall, you get a warning about possible
overflow
      }


The compile time checks are simply not clever enough to realize that
this cannot overflow. You cannot even fix it with:


Int64_t added;


            added = sum + increment;
            if (max <= added) break;
            sum = added; // -Wall still warns here about potential overflow
(loss of data)


There are no casts you can do. No coding tricks, short of applying a
compiler specific pragma to suppress the warning.


Until we have much smarter compilers, this is our fate. Kvetching
about C is not going to fix that. Of fast, easy to write, and always
correct you can pick 2. It’s not the fault of C, that you cannot get
all 3.


--
*****************************************************************************
Chris Clark email:
christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris
-----------------------------------------------------------------------------


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.