Re: Undefined Behavior Optimizations in C

anton@mips.complang.tuwien.ac.at (Anton Ertl)Fri, 06 Jan 2023 07:47:55 GMT

From comp.compilers

Related articles
Undefined Behavior Optimizations in C lucic71@ctrl-c.club (Lucian Popescu) (2023-01-05)
Re: Undefined Behavior Optimizations in C spibou@gmail.com (Spiros Bousbouras) (2023-01-05)
Re: Undefined Behavior Optimizations in C gah4@u.washington.edu (gah4) (2023-01-05)
Re: Undefined Behavior Optimizations in C anton@mips.complang.tuwien.ac.at (2023-01-06)
Re: Undefined Behavior Optimizations in C anton@mips.complang.tuwien.ac.at (2023-01-06)
Re: Undefined Behavior Optimizations in C david.brown@hesbynett.no (David Brown) (2023-01-06)
Re: Undefined Behavior Optimizations in C gah4@u.washington.edu (gah4) (2023-01-06)
Re: Undefined Behavior Optimizations in C gah4@u.washington.edu (gah4) (2023-01-06)
Re: Undefined Behavior Optimizations in C spibou@gmail.com (Spiros Bousbouras) (2023-01-07)
Re: Undefined Behavior Optimizations in C 864-117-4973@kylheku.com (Kaz Kylheku) (2023-01-09)
[25 later articles]
| List of all articles for this month |

 From: anton@mips.complang.tuwien.ac.at (Anton Ertl) Newsgroups: comp.compilers Date: Fri, 06 Jan 2023 07:47:55 GMT Organization: Institut fuer Computersprachen, Technische Universitaet Wien References: 23-01-009 Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="29210"; mail-complaints-to="abuse@iecc.com" Keywords: semantics, optimize Posted-Date: 06 Jan 2023 11:57:45 EST

"Lucian Popescu" <lucic71@ctrl-c.club> writes:
>I'm currently working on an academic project that analyzes the speedup gain of
>Undefined Behavior Optimizations in C.

Some earlier work:

@InProceedings{wang+12,
author = {Xi Wang and Haogang Chen and Alvin Cheung and Zhihao Jia and Nickolai Zeldovich and M. Frans Kaashoek},
title = {Undefined Behavior: What Happened to My Code?},
booktitle = {Asia-Pacific Workshop on Systems (APSYS'12)},
OPTpages = {},
year = {2012},
url1 = {http://homes.cs.washington.edu/~akcheung/getFile.php?file=apsys12.pdf},
url2 = {http://people.csail.mit.edu/nickolai/papers/wang-undef-2012-08-21.pdf},
OPTannote = {}
}

The most interesting part wrt. topic is section 3.3: They compiled
SPECint 2006 with gcc and clang, with either the default options, or
with defining the behaviour with the options -fno-strict-overflow (or
-fwrapv) -fno-delete-null-pointer-checks -fno-strict-aliasing, which
disables most of what you call "UB Optimizations" in the compiler
versions they used (and probably still disables most "UB
Optimizations"; there have been new -fno-* flags added, corresponding
to some of the new "UB Optimizations", but I expect that the early "UB
low-impact "optimizations").

The results were that in 10 of 12 programs, there was no significant
performance difference. In 2 of the 12 programs, there were speed
differences by factors 1.072, 1.09, 1.063 and 1.118 (depending on
program and compiler used).

However, they also found that these speed differences are due to a
single place in the source code of each of the two programs, and that
a small change to the source code of each program could eliminate the
speed difference.

@InProceedings{ertl15kps,
author = {M. Anton Ertl},
title = {What every compiler writer should know about programmers},
crossref = {kps15},
pages = {112--133},
url = {http://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_2015_submission_29.pdf},
abstract = {In recent years C compiler writers have taken the
attitude that tested and working production (i.e.,
\emph{conforming} according to the C standard) C
programs are buggy if they contain undefined
behavior, and they feel free to compile these
programs (except benchmarks) in a way that they no
longer work. The justification for this attitude is
that it allows C compilers to optimize better. But
while these optimizations'' provide a factor 1.017
speedup with Clang-3.1 on SPECint 2006, for
non-benchmarks it also has a cost: if we go by the
wishes of the compiler maintainers, we have to
fix'' our working, conforming C programs; this
requires a substantial effort, and can result in
bigger and slower code. The effort could otherwise
be invested in source-level optimizations by
programmers, which can have a much bigger effect
(e.g., a factor $>2.7$ for Jon Bentley's traveling
salesman program). Therefore, optimizations that
assume that undefined behavior does not exist are a
bad idea not just for security, but also for the
performance of non-benchmark programs.}
}

@Proceedings{kps15,
title = {18. Kolloquium Programmiersprachen und Grundlagen der Programmierung (KPS'15)},
booktitle = {18. Kolloquium Programmiersprachen und Grundlagen der Programmierung (KPS'15)},
year = {2015},
key = {KPS '15},
editor = {Jens Knoop and M. Anton Ertl},
url = {http://www.complang.tuwien.ac.at/kps2015/proceedings/},
url-pdf = {http://www.complang.tuwien.ac.at/kps2015/proceedings/kps15.pdf}
}

In section 4.2 I use the same way to determine the effectiveness of
"UB Optimizations" (called "optimizations" (with quotes) in the
paper), but with a longer list of flags for gcc-5.2.0:

-fno-aggressive-loop-optimizations -fno-unsafe-loop-optimizations
-fno-delete-null-pointer-checks -fno-strict-aliasing
-fno-strict-overflow -fno-isolate-erroneous-paths-dereference
-fwrapv

I used 8 different versions of Jon Bentley's Traveling Salesman
program translated to C (from Pascal). I found that for clang-3.5 the
flags made no difference (same binary) for any of the programs, while
for gcc-5.2.0 they made no difference in the binary for three of the
programs and no significant speed difference for two more programs.
For the remaining programs, the "speedups" from "UB Optimizations"
were by a factor of 1.04, 0.95, and 1.02.

It contrasts this with the speedup seen from source-level (manual)
optimizations: The 8 programs were successive steps in manual
optimization. The speedup seen here was a factor 2.7 when using
gcc-5.2.0 -O3 and more for other compilers.

>I argue that UB Optimizations introduce
>more risks than actual speedup gains.

It's hard to quantify the risks.

My argument has been that it is cheaper and much more effective to
perform manual optimization than to ensure that a C program contains
no undefined behaviour.

>Don't get me wrong, I'm not saying that the compiler should magically
>understand situations where the programmer makes UB mistakes and then
>repair them. What I'm saying is that the compiler should compile the code
>"as is" without making smart optimization transformations that break the
>intention of the programmer.

To which the advocates of "UB Optimizations" react by claiming that
they don't know the intention. So I have written a followup paper
that discusses this aspect in more depth:

@InProceedings{ertl17kps,
author = {M. Anton Ertl},
title = {The Intended Meaning of \emph{Undefined Behaviour}
in {C} Programs},
crossref = {kps17},
pages = {20--28},
url = {http://www.complang.tuwien.ac.at/papers/ertl17kps.pdf},
url2 = {http://www.kps2017.uni-jena.de/proceedings/kps2017_submission_5.pdf},
abstract = {All significant C programs contain undefined
behaviour. There are conflicting positions about how
to deal with that: One position is that all these
programs are broken and may be compiled to arbitrary
code. Another position is that tested and working
programs should continue to work as intended by the
programmer with future versions of the same C
compiler. In that context the advocates of the first
position often claim that they do not know the
intended meaning of a program with undefined
behaviour. This paper explores this topic in greater
depth. The goal is to preserve the behaviour of
existing, tested programs. It is achieved by letting
the compiler define a consistent mapping of C
operations to machine code; and the compiler then
has to stick to this behaviour during optimizations
and in future releases.}
}

@Proceedings{kps17,
title = {19. Kolloquium Programmiersprachen und Grundlagen der Programmierung (KPS'17)},
booktitle = {19. Kolloquium Programmiersprachen und Grundlagen der Programmierung (KPS'17)},
year = {2017},
key = {KPS '17},
editor = {Wolfram Amme and Thomas Heinze},
url = {http://www.kps2017.uni-jena.de/kps2017_proceedings.html},
url-pdf = {http://www.kps2017.uni-jena.de/proceedings/kps2017.pdf}
}

>My current progress is here [2]. I did not start the technical work, ATM I only
>have the research proposal. I reached you to see if you have any feedback
>on this proposal.
[[2] https://tildegit.org/lucic71/dissertation/src/branch/master/TSW/tsw.pdf]

I have read Section III and IV of this paper.

Section III: You cite my 2015 paper above. About SPECInt 2006, I
cited only Wang et al.'s paper (and compute the overall SPECInt
difference based on their numbers). For the traveling salesman
program, the speedup factor of 2.7 (or more, depending on compiler and
optimization level) is for source-level (i.e., manual) optimization,
not "UB Optimization". The numbers for "UB Optimization" are, as
mentioned above, usually a factor of 1, and in the three cases where
there is a difference, it's a factor 1.04, 0.95, and 1.02, as
mentioned above.

Section IV: You have the beginnings of a plan to evaluate the
performance; you leave the question of benchmarks open until you know
what changes in the resulting code are due to "UB Optimization". It's
not clear what the point of such a benchmark selection would be. It
certainly would not make the selection more representative, and would
have the smell of cherry-picking. Also, "UB Optimization" may cause
changes in thousands of places in the code of OpenBSD, but only few of
those places will be in hot code and be relevant for performance.

What I miss is a plan to quantify the risks. Wang et al. made a
pretty good case (as far as I am concerned, but the "UB Optimization"
faithful remain unconvinced) in one of their papers when they analysed
all the Debian packages.

One question is: Given Wang et al.'s earlier work, what is the
original contribution of your intended research?

I discuss some alternative research topics in Section 5.4 of my 2015
paper.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

Post a followup to this message