Re: Optimization techniques and undefined behavior

David Brown <david.brown@hesbynett.no>
Mon, 6 May 2019 08:14:51 +0200

          From comp.compilers


From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.compilers
Date: Mon, 6 May 2019 08:14:51 +0200
Organization: A noiseless patient Spider
References: 19-04-021 19-04-023 19-04-037 19-04-039 19-04-042 19-04-044 19-04-047 19-05-004 19-05-006 19-05-016 19-05-017
Keywords: standards, debug
Posted-Date: 06 May 2019 10:36:49 EDT
Content-Language: en-GB

On 03/05/2019 11:52, Martin Ward wrote:
> On 03/05/19 00:48, Bart wrote:
>> And I think that if a program can
>> go seriously wrong through unchecked input, then that's a failure in
>> proper validation. It's rather sloppy to rely on a runtime check put
>> there by a compiler.
>
> The car analogy for C is that C is a car with no seatbelts, crumple
> zones, roll bars, airbags etc.  The car manual explicitly states that
> nudging the kerb with any tyre is "undefined behaviour" and could
> cause the car to explode in a fireball, killing all the passengers.


That is not /quite/ right. The C standards are like laws saying that a
car has to have an engine and a number of wheels (though it is up to the
manufacturer to decide how many, as long as they tell you). The
regulations don't require seatbelts, but you can always fit your own -
and a manufacturer is free to add any extras they like, as long as they
don't hinder the basic function of the car. The manufacturer can also
change details of how the car works depending on who is driving it,
trimming the engine or adding ABS brakes according to preference.




>
> On 2019-05-01, David Brown <david.brown@hesbynett.no> wrote:
>> Detecting signed overflow at run-time can be a significant cost.
>
> Firstly: the cost is not as high as the cost of security breaches due
> to buffer overflows.


Apples and oranges.


The cost of adding run-time checks can often be significant, and in
extreme cases it simply means the code can't be written in that language
- C is an appropriate choice of programming language precisely when you
want maximal efficiency. And this cost applies even when the programmer
knows full well that the operations cannot overflow.


The consequences of bugs can be severe - of that there is no doubt. But
run-time overflow checking does not stop security bugs - it is unlikely
to make a significant dent in them.


It is clearly absurd to take an example of a security bug (like the
libpng one), blame it all on a minor part (the fault was failing to
check unknown data, nothing to do with overflows), and then take that as
"proof" that run-time overflow checking will solve security problems.


Security problems are caused by /bugs/. That can be bugs in the
specification of the program, bugs in the design, bugs in the
management, just as much as bugs in the coding. Bugs in the coding can
come from all sorts of sources. Overflows (whether they are arithmetic
overflows, or buffer overflows) are just one of many types of potential
bugs.


Now, as I have said before, it makes a lot of sense to enable run-time
checking while testing and debugging, to find and eliminate as many bugs
as possible. That does not mean you want the checks left in after you
have tested the code - if your testing is good, your development process
is good, and your code review process is good, then it is highly
unlikely that there are such overflows left in the code. Keeping the
run-time checks means bigger and slower code, and lots of code paths
that never get tested - which is a really bad idea.
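As a concrete sketch of that workflow with GCC or Clang (the file name
demo.c is just a placeholder): the UndefinedBehaviorSanitizer catches
signed overflow at run time in development builds, and is simply left
out of the release build.

```shell
# Development build: UndefinedBehaviorSanitizer traps signed integer
# overflow (and other undefined behaviour) at run time, with a
# readable diagnostic at the point of the fault.
gcc -g -O1 -fsanitize=undefined demo.c -o demo-debug

# Release build: the same source, with no checking overhead.
gcc -O2 demo.c -o demo
```

The same pair of builds works with clang in place of gcc.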


And often it would simply be better to use a different language, rather
than C.




> Secondly: if many popular languages specified
> suitable handling for signed overflow, buffer overruns and so on, then
> CPUs hardware would be developed which makes these tests efficient:
> because compiled code in these popular languages would run faster on
> such CPUs.


Signed overflow and buffer overflow have little in common, except
coincidentally similar names. You need to consider them separately,
unless you are just talking about bugs in general.


The inefficiency of signed overflow detection is not a matter of cpu
instructions. Quite a few cpus have operations with "trap on overflow"
behaviour, or at least a "trap if overflow flag set" instruction. The
major cost, in many cases, is in how it hinders re-arrangement of
expressions, common subexpression elimination, simplifications, etc. A
second cost is that it severely limits the use of SIMD or vector
operations.


Another point is that it rarely matches reality. What is so special
about 2147483647 that you want to make sure "a + b" does not exceed it?
It is more realistic that the limit is something entirely different.
Maybe you are talking about percentages, and the limit is 100. When you
have realistic requirements based on the code and the problem, rather
than arbitrary limits based on some implementation type, you can quickly
see that this would be hard to handle in cpu instructions.


For buffer overflows, it can be both easier and harder. It is possible
to have hardware that attaches a size to a pointer, and does some
checking. It would be quite simple to check addressing modes of "base
register + index" style, but a lot harder when addresses are
pre-computed in other registers. You can do it if you ensure that the
logical addresses of data objects are in distinct areas, but that puts a
lot of overhead on the OS and memory allocators, and accesses involve a
great deal more virtual page lookups. This approach is used by tools
such as valgrind or the address sanitizers - tools you want during
development and testing to find the bugs, but not to have in released
code.




There have been many attempts through the ages to add special features
in processors to aid security, run-time error checking, supporting
checked languages (like Java), etc. Most of them have been failures.
You are not the first person to think of this.


>
>> I was talking about a /dimension/ of 2 billion - that is, a width or
>> height of 2 billion.
>
> If you are reading from an unknown file (eg an image on a web page)
> then it would be foolish to assume that no dimension is bigger that 2
> billion: security breaches due to carefully constructed image files
> have occurred in the past.


You missed my point completely. When I say that the image file should
not have a dimension of such ridiculous sizes, I mean you should check
the dimensions given by the file, and reject the file as broken if its
dimensions exceed such sizes. I repeatedly said you must check the
unknown data for sanity.


> Also, the netpbm library can be used for
> files containing data which is *not* image data: for example, as
> generic utilities for processing huge bit strings.  These bit strings
> might well contain more than 2 billion bits (250 MB of data).
>


It does not matter - the principle is the same. Check the unknown data
for sanity. The sizes of the data will not be so big that you will be
overflowing properly chosen types (in C or whatever language you want).
You can do these checks safely and without overflows.


> Back in the early days of Unix there were many utilities for
> processing text files.  It was discovered that many of these would
> crash or hang when fed random binary data:
>
> https://www.fuzzingbook.org/html/Fuzzer.html
> ftp://ftp.cs.wisc.edu/paradyn/technical_papers/fuzz-revisited.ps
>
> This is a problem because (1) a text utility can be used as a
> general-purpose data manipulation program which is fed binary data (2)
> more importantly: each crash is a potential security hole.
>


People used to be a lot more trusting of unknown data. It is not a good
idea, and fortunately many programmers have learnt that. Unfortunately,
not /all/ programmers have learnt it. (And even those who have, are
still fallible humans.)

