Re: Undefined Behavior Optimizations in C (Anton Ertl)
Fri, 06 Jan 2023 08:41:49 GMT

          From comp.compilers

Related articles
Undefined Behavior Optimizations in C (Lucian Popescu) (2023-01-05)
Re: Undefined Behavior Optimizations in C (Spiros Bousbouras) (2023-01-05)
Re: Undefined Behavior Optimizations in C (gah4) (2023-01-05)
Re: Undefined Behavior Optimizations in C (2023-01-06)
Re: Undefined Behavior Optimizations in C (2023-01-06)
Re: Undefined Behavior Optimizations in C (David Brown) (2023-01-06)
Re: Undefined Behavior Optimizations in C (gah4) (2023-01-06)
Re: Undefined Behavior Optimizations in C (gah4) (2023-01-06)
Re: Undefined Behavior Optimizations in C (Spiros Bousbouras) (2023-01-07)
Re: Undefined Behavior Optimizations in C (Kaz Kylheku) (2023-01-09)
Re: Undefined Behavior Optimizations in C (Kaz Kylheku) (2023-01-09)
[24 later articles]

From: (Anton Ertl)
Newsgroups: comp.compilers
Date: Fri, 06 Jan 2023 08:41:49 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
References: 23-01-009 23-01-011 23-01-012
Keywords: optimize, semantics
Posted-Date: 06 Jan 2023 12:00:03 EST

gah4 <> writes:
>On Thursday, January 5, 2023 at 10:13:08 AM UTC-8, Spiros Bousbouras wrote:
>> On 5 Jan 2023 10:05:49 +0000
>> "Lucian Popescu" <> wrote:
>> > I'm currently working on an academic project that analyzes the speedup gain of
>> > Undefined Behavior Optimizations in C.
>> > To test the theory that the UB Optimizations introduce more risks than
>> > speedup gains,
>> Isn't this comparing apples and oranges ?

You can compare apples and oranges: How many can you buy for, say, EUR
5? What's their nutritional value? etc.

In a similar way, we can compare the advantages and costs of "UB
Optimization".

The advantages claimed by the faithful are speedups (usually
unquantified, to let the audience imagine high speedups). Wang et
al. have done some quantification work, as have I; the results were
that the speedups were small, and sometimes there were even slowdowns.

Among the costs is that programmers have to change their programs to
eliminate standard-undefined behaviour.

First you have to find that behaviour. The "UB Optimization"
compilers have added "sanitizers" to help you find it; however,
sanitizers work at run time, so they will not find cases where, e.g.,
the "UB Optimization" "optimized" away a bounds check preventing a
buffer overflow, unless there is a test case that actually triggers
this buffer overflow.

And once you have found the behaviour, you have to change it. This
not only costs developers time, it can also cost performance. E.g.,
in Section 4.3 of
I estimate that converting Gforth to C without undefined behaviour
would cause a slowdown by a factor >3; Gforth is currently compiled
with as few "UB Optimizations" as possible (using -fwrapv and
various -fno-... flags). I expect that removing these flags will
give a speedup by approximately a factor of 1. For those who believe
differently, I have repeatedly posted a challenge

|Write a proof-of-concept Forth interpreter in the language you
|advocate that runs at least one of bubble-sort, matrix-mult or sieve
|from bench/forth in

We can then check whether the language you advocate can compete with a
low-level language by benchmarking your interpreter against Gforth.

We can compare the advantages to the costs of "UB Optimization" in the
currency of development time:

How much development time does it cost to achieve as much speedup as
the UB optimizations give?

How much development time does it cost to add test cases for
sanitizing, to run all the different sanitizers, to remove the
undefined behaviour that you find, and to add manual optimizations
that compensate for any performance regressions that turn up in that
process?
My expectation is that the latter development cost would often be far
greater (and, in the case of the Forth interpreter, I expect that the
performance regression compensation is not possible), but it would be
nice if we had some empirical results for that.

>Most important when debugging, is that you can trust the compiler to
>do what you said. That they don't, has always been part of

If a compiler produces something that behaves differently, it's not an
optimization. For that purpose one usually considers the I/O (but
not the timing) behaviour of the program. When using a
single-step debugger, you see execution at a more fine-grained level
than the optimization was designed for. One approach is to debug the
unoptimized program, and enable optimization afterwards. If it's a
proper optimization, the I/O behaviour will be unchanged.

>but these UB make it worse.

"UB optimizations" assume that programs don't perform undefined
behaviour, and they are transformations that are proper optimizations
for such programs. The problem is that this assumption is usually

>My thought would be all of SPEC

SPEC CPU programs are not representative in many respects:

First, they are chosen to be standard-compliant; it's not clear how
SPEC tries to ensure this, but as the 464.h264ref function SATD shows,
the process is not perfect. Still, a program that uses one of the
flags for disabling "UB optimization" is going to be rejected by SPEC,
so SPEC programs cannot be used for evaluating all the costs of "UB
Optimization".

Moreover, the "UB optimization" compilers are tuned for SPEC CPU: They
compile the undefined behaviour in SPEC CPU programs (e.g., SATD) as
intended, while they miscompile a non-SPEC CPU program with the same
undefined behaviour (see gcc bug 66875).

And given that the compiler maintainers (and others) evaluate the
performance of the compilers using SPEC CPU, the compiler maintainers
are motivated to add optimizations that speed up a SPEC CPU program;
e.g., if an "UB optimization idea" breaks a test case on one of the
tested programs (for gcc those reportedly nowadays include all
automatically testable Debian programs, which is good), but benefits a
SPEC CPU program, the compiler maintainer will be motivated to refine
the criteria for applying the optimizations until the breakage
disappears while still applying to the SPEC CPU program; if the "UB
optimization" benefits only programs outside the relevant benchmarks,
the compiler maintainer will be less likely to spend as much time on
it. As a result, I expect to see more benefits from "UB
Optimizations" for SPEC CPU than for, say, code that was written after
the compiler was released (and was not tuned for the compiler).

>I haven't thought about this so recently. Do compilers give warnings
>when they remove such UB code?

No. The faithful claim that it is impossible, but I don't think so:

The compiler could internally compile the program twice: Once with the
flags as given, and once with all the flags that the compiler has for
defining undefined behaviour (-fwrapv and the various
-fno-... options). If there is a difference in the resulting code,
produce a warning. That warning could be for something that is a
proper optimization for this particular program (i.e., a false
positive), or it could be for miscompilation (a true positive); that's
up to the developer to sort out.

Once we have that framework in place, we can look at the false
positives and refine it. E.g., there is a claim that many false
positives would come from macro expansions that can be evaluated at
compile time. In that case, one could add something to the source
code that
suppresses the warning (just like there is "__attribute__((unused))"
for suppressing warnings about unused variables).

- anton
M. Anton Ertl
