Re: Undefined Behavior Optimizations in C

David Brown <david.brown@hesbynett.no>
Wed, 18 Jan 2023 21:14:44 +0100

From comp.compilers

From:	David Brown <david.brown@hesbynett.no>
Newsgroups:	comp.compilers
Date:	Wed, 18 Jan 2023 21:14:44 +0100
Organization:	A noiseless patient Spider
References:	23-01-027 <sympa.1673343321.1624.383@lists.iecc.com> 23-01-031 23-01-041 23-01-062
Injection-Info:	gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="83512"; mail-complaints-to="abuse@iecc.com"
Keywords:	C, optimize
Posted-Date:	18 Jan 2023 18:55:44 EST
In-Reply-To:	23-01-062
Content-Language:	en-GB

On 18/01/2023 14:14, Spiros Bousbouras wrote:
> On Wed, 11 Jan 2023 14:20:49 +0100
> David Brown <david.brown@hesbynett.no> wrote:
>> C was designed from day one to be a high-level language, not an
>> assembler of any sort. Limitations of weaker earlier compilers do
>> not mean the language was supposed to work that way.
>
> For those who want an abstract or portable assembler, there exists
> c9x.me/compile/. I've never used it but at least it aims to be that,
> unlike C. I would be curious to know of other analogous projects. I
> guess the "register transfer language" of GCC is somewhat analogous.

I haven't looked at that project - but as a general point, I am
sceptical of any claims about "portable assembler". If there is
translation and it is not one-to-one (or very close to that), then you
don't really have "assembler" even if you have a rather low-level
language. (And gcc's RTL is an internal format - usually there are
several optimisation passes done at the RTL level.)

>
>> I first used a C compiler that optimised on the assumption that UB
>> didn't happen some 25 years ago. (In particular, it assumed signed
>> integer arithmetic never overflowed.)
>
> I have encountered several times the claim that compilers assume that UB does
> not happen and I don't understand it. Let's consider 2 examples:
>
> x + 1 > x
>
> in C where x is a signed integer. Compilers will often treat this as
> always true with the following reasoning :
>
> - if x does not have the maximum value which fits in its type then the
> meaning of the C expressions is the same as their mathematical meaning
> so the expression evaluates to true.
>
> - if x has the maximum value which fits in its type then x + 1 is not
> defined so any translation (including treating the whole expression as
> true) is valid.
>
> There's no assumption that UB (undefined behaviour) will not happen, both
> possibilities are accounted for.
>

I think I see what you are saying, but I don't make a big distinction
between "assumes UB does not happen", "assumes you don't care about
results if UB /does/ happen" and "can make any transformations if UB
happens".

One thing that you might view as a distinction is that compilers can
use their knowledge of UB to affect surrounding code.

So if you have :

    int x, y;

    if (x + 1 > x) y++;         // (a)
    if (x == INT_MAX) y = 10;   // (b)

From your example above, we can see that the compiler can transform (a)
into "y++;" - there is no need for the conditional. But the compiler
can /also/ transform (b) into ";" - it is allowed to reason that if x
/were/ equal to INT_MAX, statement (a) would be undefined behaviour
(even though it was transformed away) and there is no value for x which
would result in "y = 10" being executed without also executing UB.

(A quick check on <https://godbolt.org> shows that gcc does the first
transformation, but not the second one.)
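
As a minimal sketch, statements (a) and (b) can be wrapped in small functions and pasted into a compiler explorer to see which transformations a given compiler applies. The function names are hypothetical and not from the thread, and the last two functions are additions for contrast: one asks the same question as (a) without any overflow, the other states the precondition as a run-time check.

```c
#include <assert.h>
#include <limits.h>

/* Statement (a) as a function (hypothetical name). For x == INT_MAX the
   addition overflows, which is undefined behaviour, so a compiler may
   fold the whole body to "return 1". */
int next_is_greater(int x)
{
    return x + 1 > x;
}

/* Statements (a) and (b) together (hypothetical name). A compiler is
   permitted, but not required, to drop the INT_MAX test: reaching it
   with x == INT_MAX means (a) has already overflowed. */
int both_statements(int x, int y)
{
    if (x + 1 > x) y++;          /* (a) */
    if (x == INT_MAX) y = 10;    /* (b) */
    return y;
}

/* The same question as (a) asked without overflow (hypothetical name):
   defined for every int, and false only for INT_MAX. */
int next_is_greater_defined(int x)
{
    return x < INT_MAX;
}

/* An increment with its precondition checked at run time (hypothetical
   name). The assert fires, unless NDEBUG is defined, before the
   addition could overflow. */
int increment_checked(int x)
{
    assert(x < INT_MAX);
    return x + 1;
}
```

Calling next_is_greater() or both_statements() with INT_MAX is exactly the undefined case under discussion, so nothing can be said about the result; the other two functions are defined for every argument that passes the check.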

> Another example is
>
> ... *some_pointer_object ...
> [ some_pointer_object does not get modified in this part of the code and
> has not been declared as volatile ]
> if (some_pointer_object == NULL) ...
>
> If some_pointer_object is not NULL then the test can be omitted; if it is
> NULL then the earlier dereference is UB so any translation is valid including
> omitting the test.
>
> Again, there's no assumption that UB will not happen.

I think that is one way to look at it, but really it comes down to the
same thing.

One thing that is worth noting in this context is that compilers like
gcc and clang translate known undefined behaviour into a special marker.
You can imagine it as translating "*p = ..." into :

    if (!p) undefined_behaviour();
    *p = ...

And the builtin function __builtin_unreachable() is translated into
exactly the same internal marker or tree node type. These compilers do
not distinguish between "undefined behaviour" and "code flow cannot
get here".
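
A small sketch of how this looks in source code, assuming gcc or clang (__builtin_unreachable() is an extension, not ISO C). The helper names are hypothetical, and the third function is an addition for contrast: it checks the pointer at run time instead of handing the compiler an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the dereference comes first, so the later NULL
   test can only be true on a path that has already hit undefined
   behaviour. A compiler may therefore remove the test. */
int read_then_check(const int *p)
{
    int value = *p;
    if (p == NULL)
        return -1;
    return value;
}

/* Hypothetical helper: the same promise written out explicitly.
   Calling this with a null pointer is undefined behaviour. */
int read_promised_nonnull(const int *p)
{
    if (p == NULL)
        __builtin_unreachable();
    return *p;
}

/* Hypothetical helper: a checked variant. assert() tests the pointer at
   run time (unless NDEBUG is defined) and aborts on failure. */
int read_checked(const int *p)
{
    assert(p != NULL);
    return *p;
}
```

All three return the pointed-to value for a valid pointer; they differ only in what the compiler is told, or allowed to conclude, about the null case.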

>
> So the request that C compilers should stop assuming that UB will not
> happen seems to me completely misguided. I think what is really meant
> is that, in reasoning what a valid translation is, C compilers (or
> the authors of the compilers) should not employ the notion of UB. But
> then how should UB be translated? Again there exists the assumption
> or claim that there is some intuitively obvious translation and
> compilers should go for that. First, I'm not sure that there exists
> such a common intuition even among humans and second, even if it does,
> how does one go from an intuition to an algorithm C compilers can
> use to do translation? Lots of things are intuitively obvious but
> creating an algorithm to duplicate the human intuition is a hard
> problem, one which has not been solved in many cases and perhaps even
> one which is unsolvable in some cases.
>

I agree entirely with your assessment, with the exception that
"compilers can and do assume UB doesn't happen" is a valid way to view
things.

(I'm snipping the rest, because I fully agree - and it is so well
written that I've nothing to add!)
