Re: Undefined Behavior Optimizations in C

Kaz Kylheku <864-117-4973@kylheku.com>
Mon, 9 Jan 2023 09:10:24 -0000 (UTC)

          From comp.compilers

Related articles
[3 earlier articles]
Re: Undefined Behavior Optimizations in C anton@mips.complang.tuwien.ac.at (2023-01-06)
Re: Undefined Behavior Optimizations in C anton@mips.complang.tuwien.ac.at (2023-01-06)
Re: Undefined Behavior Optimizations in C david.brown@hesbynett.no (David Brown) (2023-01-06)
Re: Undefined Behavior Optimizations in C gah4@u.washington.edu (gah4) (2023-01-06)
Re: Undefined Behavior Optimizations in C gah4@u.washington.edu (gah4) (2023-01-06)
Re: Undefined Behavior Optimizations in C spibou@gmail.com (Spiros Bousbouras) (2023-01-07)
Re: Undefined Behavior Optimizations in C 864-117-4973@kylheku.com (Kaz Kylheku) (2023-01-09)
Re: Undefined Behavior Optimizations in C 864-117-4973@kylheku.com (Kaz Kylheku) (2023-01-09)
Re: Undefined Behavior Optimizations in C david.brown@hesbynett.no (David Brown) (2023-01-10)
Re: Undefined Behavior Optimizations in C gah4@u.washington.edu (gah4) (2023-01-10)
Re: Undefined Behavior Optimizations in C tkoenig@netcologne.de (Thomas Koenig) (2023-01-11)
Re: Undefined Behavior Optimizations in C david.brown@hesbynett.no (David Brown) (2023-01-11)
Re: Undefined Behavior Optimizations in C david.brown@hesbynett.no (David Brown) (2023-01-11)
[19 later articles]
| List of all articles for this month |

From: Kaz Kylheku <864-117-4973@kylheku.com>
Newsgroups: comp.compilers
Date: Mon, 9 Jan 2023 09:10:24 -0000 (UTC)
Organization: A noiseless patient Spider
References: 23-01-009
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="23342"; mail-complaints-to="abuse@iecc.com"
Keywords: optimize, semantics
Posted-Date: 09 Jan 2023 11:41:15 EST

On 2023-01-05, Lucian Popescu <lucic71@ctrl-c.club> wrote:
> Hi,
>
> I'm currently working on an academic project that analyzes the speedup gain of
> Undefined Behavior Optimizations in C. I argue that UB Optimizations introduce
> more risks than actual speedup gains. This happens because most of the time
> the compiler does not have the context to understand the intention of the
> programmer in code that contains UB. For example this fast inverse square root
> function [1] triggers UB at this line: i = * ( long * ) &y;. A "smart"
> compiler could delete this line as an optimization because it contains UB.


"(Lack of) UB optimizations" are based on these ideas:


1. The programmer is infallible, and therefore
2. Any interpretation of the code's meaning which entails undefined
      behavior must be incorrect and unintended; and also
3. If some construct has no interpretation that is not UB, the
      programmer must be saying that that code is unreachable.


They are pretty terrible engineering.


Assuming a lack of undefined behavior in C is simply foolish; it's
extremely unlikely to be correct for any nontrivial program. A
compiler processing just tens of thousands of lines of C is likely to
be incorrect a few times in making a no-UB assumption.


In regard to 3, there are actually some extensions being proposed to C
whereby the programer can insert an expression which means "this code
is not reached", and the only semantics of that expression is that it
has undefined behavior. Since the programmer would never intend
undefined behavior, by (3) above it implies that the construct is not
reached. The compiler can replace it by an infinite loop, abortive
exit, reformatting of the disk, or whatever. Or just generate no
instructions at all.


I simply cannot get behind the idea of a deliberately introduced
language construct that has nothing but undefined behavior. I
understand that it's efficient: the compiler can reason about that not
being reachable, and not expend any instructions for it that would add
to the code size. I would rather have spend the code size to have it
stop. Just stick a breakpoint trap instruction in there or whatever.


> Don't get me wrong, I'm not saying that the compiler should magically
> understand situations where the programmer makes UB mistakes and then
> repair them. What I'm saying is that the compiler should compile the code
> "as is" without making smart optimization transformations that break the
> intention of the programmer.


I echo that serntiment. I don't require optimizations of the form
"assuming expression X is always well-defined, thus can translate some
other expression Y as follows".


In C, the programmer is supposed to do a lot of the optimizing.


In spite of these UB-related shennanigans, new developments in compilers
like GCC are not improving this basic fact.


You still have to know the tight coding techniques from 1990 in order
to get good code out of the latest GCC.


In the 1990's, I compared code for deleting from a linked list:


      node->next->prev = node->prev;
      node->prev->next = node->next;


versus:


      node_t *next = node->next;
      node_t *prev = node->prev;


      next->prev = prev;
      prev->next = next;


I get the same results today as I did in 1990-something: a shorter
instruction sequence for the code in which I cached the pointers in
local variables! This is because the compiler suspects that
the first assignment "node->next->prev = node->prev"
might or might not have affected the value of "node->prev";
so that has to be loaded again; it cannot be subject to CSE. Strict
aliasing reasoning doesn't help because everything has the same node_t *
type.


(But, note! If the assignment node->next->prev = node->prev
affects node->prev, that could only be becauase node->next == node.
And in that case, the assignment is a no-op: the cached node->prev
value could be used to avoid accesing that location again! GCC is not
yet smart enough to figure this out: and nobody needs it to be.)


A C compiler just needs to obey the memory accesses the programmer
expressed, figure out which local variables to keep in registers,
and do some basic things with loops, inlining and whatnot; and
have good basic optimizations like control flow stuff (jump thraeding
and whatnot), peephole optimizations, CSE, ...


Approaches like "oh this loop must be infinite because it nothing but
increments the dummy variable and tests for it going negative" just have
no place in engineering; that's just someone dicking around to get their
name in the git history of the compiler, and stuff their resume.


If you want an optimal instruction sequence for something, there is
always assembly language.


--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.