Re: Undefined Behavior Optimizations in C

David Brown <david.brown@hesbynett.no>
Tue, 10 Jan 2023 17:32:28 +0100

From comp.compilers

Related articles
[5 earlier articles]
Re: Undefined Behavior Optimizations in C david.brown@hesbynett.no (David Brown) (2023-01-06)
Re: Undefined Behavior Optimizations in C gah4@u.washington.edu (gah4) (2023-01-06)
Re: Undefined Behavior Optimizations in C gah4@u.washington.edu (gah4) (2023-01-06)
Re: Undefined Behavior Optimizations in C spibou@gmail.com (Spiros Bousbouras) (2023-01-07)
Re: Undefined Behavior Optimizations in C 864-117-4973@kylheku.com (Kaz Kylheku) (2023-01-09)
Re: Undefined Behavior Optimizations in C 864-117-4973@kylheku.com (Kaz Kylheku) (2023-01-09)
*Re: Undefined Behavior Optimizations in C david.brown@hesbynett.no (David Brown)* (2023-01-10)**
Re: Undefined Behavior Optimizations in C gah4@u.washington.edu (gah4) (2023-01-10)
Re: Undefined Behavior Optimizations in C tkoenig@netcologne.de (Thomas Koenig) (2023-01-11)
Re: Undefined Behavior Optimizations in C david.brown@hesbynett.no (David Brown) (2023-01-11)
Re: Undefined Behavior Optimizations in C david.brown@hesbynett.no (David Brown) (2023-01-11)
Re: Undefined Behavior Optimizations in C gah4@u.washington.edu (gah4) (2023-01-11)
Re: Undefined Behavior Optimizations in C 864-117-4973@kylheku.com (Kaz Kylheku) (2023-01-12)
[17 later articles]

| List of all articles for this month |

From:	David Brown <david.brown@hesbynett.no>
Newsgroups:	comp.compilers
Date:	Tue, 10 Jan 2023 17:32:28 +0100
Organization:	A noiseless patient Spider
References:	23-01-009 23-01-011 23-01-012 23-01-017 23-01-027
Injection-Info:	gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="79420"; mail-complaints-to="abuse@iecc.com"
Keywords:	C, optimize
Posted-Date:	10 Jan 2023 17:00:54 EST

On 09/01/2023 11:14, Kaz Kylheku wrote:
> On 2023-01-06, David Brown <david.brown@hesbynett.no> wrote:
>> On 06/01/2023 01:22, gah4 wrote:

>>> Most important when debugging, is that you can trust the compiler to
>>> do what you said. That they don't, has always been part of
>>> optimization, but these UB make it worse.
>>
>> The trouble with undefined behaviour is that, in general, you cannot
>> trust the compiler to "do what you say" because it cannot know what you
>> have said.
>
> That's correcst; however, what we should be able to trust is for the
> compiler not to make it worse.
>

Let's be clear here. At the theoretical level, when your code executes
undefined behaviour at run time, you have said "I don't care what
happens now". The compiler is allowed to do /anything/. There is no
such thing as "making it worse" - you have said you will accept
absolutely /anything/ from the compiler. No restrictions, no
limitations - /anything/.

That is what "undefined behaviour" means. There are /no/ definitions of
the behaviour, and no limits, expectations, "worse" choices, "better"
choices.

At a /practical/ level, compilers can try to be more helpful. They can
define behaviour for things that are not defined in the language
specifications. They can give you warnings or error messages (at
compile time or run time, such as when using sanitizers). They can
offer optional extra semantics (like gcc's "-fwrapv" flag). They can
certainly avoid going out of their way to avoid being specially awkward
- normally they are simply trying to make the code more efficient when
it is running correctly.

But you cannot be sure things won't go unexpectedly bad. You can never
have guarantees about that. Guarantees require definitions, and you
don't have that with undefined behaviour.

Consider this example source code:

void print_message(bool x) {
if (x) {
printf("Great!\n");
} else {
printf("Boo!\n");
}
}

void format_harddisk(int disk_number) {
...
}

This is fine code, with no UB in sight.

The compiler could this to the pseudo-code :

print_message:
if ($GPR1 == 0) :
$GPR1 = "Boo!"
jump puts

if ($GPR1 == 1) :
$GPR1 = "Great!"
jump puts

format_harddisk:
...

It knows that on entry, the parameter in GPR1 is either 0 or 1, because
it is type "bool".

What happens if you call the function from a different translation unit
like this :

extern void print_message(int x);
print_message(2);

?

This is clearly wrong - clearly undefined behaviour. And the result
would be a formatted disk. But there is nothing wrong with the
compiler's generated code. I have seen other occasions when compiler's
have made code with booleans that appear to be both true and false, or
neither true nor false, as a result of undefined behaviour setting the
underlying memory to something other than 0 or 1, simply because that
was the result of the most efficient code.

There are all sorts of other ways a compiler can take advantage of
knowing a boolean is either 0 or 1, and all sorts of things that can go
wrong if you've messed up your code and the boolean is /not/ 0 or 1. It
can use it as an index into an array, or to multiply a value
(translating "x = f ? a : b;" into "x = f * (a - b) + b;" ), or using it
as a bit mask, or subtracting 1 and using /that/ as a mask. And once
you have started getting the wrong values, there is no theoretical limit
to how bad things can get - it is not the compiler "making things
worse", it is bugs in the code being outside the specifications for what
is intended for the program.

Of course the compiler should not be messing around checking that the
value you pass is actually 0 or 1. That would be inefficient. And what
would it do if it was a bad value? Treat it as "false" ? What about
calling "missile_control(bool cancel_launch)" with a bad value? Should
it treat it as "true" ? What about calling "missile_control(bool
confirm_launch)" with a bad value? Jump to an error handler and stop
the program with a message? How does that work for your flight controller?

If you want a hand-holding check everything language, there are plenty
of them to go around. They're great, and far more appropriate than C
for many purposes. C, on the other hand, trusts the programmer and
gives you the most efficient object code it can from the source you give
it, based on the rules of the language (plus any extra features defined
by the compiler).

> The compiler makes it worse when it assumes that the programmer is
> infallible, and thus makes logical inferences predicated on some
> construct never having undefined behavior, using those to guide the
> translation of other constructs.

It is the programmers' /job/ to write correct code. It is up to them to
use any and all tools available to do that - static error checking,
run-time sanitizers, code reviews, unit tests, system tests, etc. That
also includes using the write language for the task in hand. If the
code is critical, but too complex to be written bug-free in C, then
maybe a different language would be better - or maybe a different
programmer.

I am not suggesting that I always write bug-free code. But the bugs I
put in my code are /my/ bugs - /my/ responsibility. And I don't blame
the compiler for the effects of these bugs.

>
> This is particularly harmful when the undefined behavior is *de facto*
> defined: like that if the undefinedness aspect is ignored, and the
> obvious code is emitted, that code will do something characteristic
> of the machine.

It is particularly harmful when programmers think there is such a thing
as "de facto defined". That's an oxymoron. If the behaviour is
defined, it is defined. If it is not defined, it is not defined. If it
is not defined and a programmer makes unwarranted and incorrect
assumptions about what they think it means, then the programmer needs to
update his or her understanding of the language. They don't get to
blame the compiler or the compiler writer for not making the same
unfounded assumptions that they did.

>
> In C99, the original definition of undefined behavior is this:
>
> behavior, upon use of a nonportable or erroneous program construct or
> of erroneous data, for which this International Standard imposes no
> requirements.
>
> NOTE: Possible undefined behavior ranges from ignoring the situation
> completely with unpredictable results, to behaving during translation
> or program execution in a documented manner characteristic of the
> environment (with or without the issuance of a diagnostic message), to
> terminating a translation or execution (with the issuance of a
> diagnostic message).
>
> There is an obvious intepretation of this text which rules out
> the too-clever optimizations.
>

I presume you mean "a documented manner characteristic of the
environment". That would be something like gcc's "-fwrapv" flag. It
takes something that is undefined in the C standards - signed integer
overflow - and gives it a /documented/ behaviour. The key here is
"documented". Now the behaviour is /defined/ - it is UB as far as the C
standards are concerned, but /defined/ behaviour as far as the
implementation is concerned. And you can rely on it, as long as you use
a compiler with such a flag and documented behaviour.

The more general and obvious interpretation of the definition is
"imposes no requirements" - you have no right to have any expectations
about the results of the undefined behaviour. In particular, while you
might have some guesses as to what will happen just at the time, you
have no idea about knock-on effects.

> If the compiler optimizes construct Y based on some other construct
> X being free of undefined behavior, and that assumption turns
> out to be false, that compiler has not conformed to the "NOTE"
> part of the treatment of undefined behavior.
>
> Firstly, it has not "ignor[ed] the situation completely". You cannot
> assume that X is well-defined, in order to treat Y in some way, and yet
> say that you're completely ignoring the situaton of X. If the UB
> problem in X causes a secondary problem with the way Y was translated,
> that is a problem with the assumption that was made about X. That
> assumption was made because it's possible for X to be undefined, and
> that possiblity was expliclty disregarded, which is not the same as
> completely ignored.
>
> The situation also isn't a documented manner characteristic of
> the environment.
>
> It also isn't a termination of translation or execution with or without
> a diagnostic message.
>
> The situation doesn't conform to the NOTE.
>
> C compiler writers should take the NOTE more seriously.

The "note" is just a note, not part of the language definitions, and it
gives some example treatments of undefined behaviour, not an exclusive
list of options. No, compiler writers do /not/ have to take it
seriously. If the C standards committee had wanted to impose
restrictions on what can happen during undefined behaviour, they would
have given them instead of saying "no requirements". (And the committee
are smart enough to know that imposing restrictions to UB in general is
not possible.)

Of course it would be possible to impose restrictions in particular
cases. It would be fine to say, for example, that signed integer
overflow gives an unspecified integer value as a result, rather than
undefined behaviour. Or that attempting to dereference an invalid
pointer will result in the memory being written or read regardless. But
that would be giving definitions to behaviour that is currently
undefined - you cannot impose meaning on undefined behaviour itself.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Undefined Behavior Optimizations in C

David Brown <david.brown@hesbynett.no>Tue, 10 Jan 2023 17:32:28 +0100

David Brown <david.brown@hesbynett.no>
Tue, 10 Jan 2023 17:32:28 +0100