Re: Optimization techniques and undefined behavior

David Brown <david.brown@hesbynett.no>
Tue, 30 Apr 2019 15:48:31 +0200

From comp.compilers

Related articles
[3 earlier articles]
Re: Optimization techniques and undefined behavior david.brown@hesbynett.no (David Brown) (2019-04-29)
Re: Optimization techniques and undefined behavior bc@freeuk.com (Bart) (2019-04-29)
Re: Optimization techniques and undefined behavior david.brown@hesbynett.no (David Brown) (2019-04-29)
Re: Optimization techniques and undefined behavior auriocus@gmx.de (Christian Gollwitzer) (2019-04-29)
Re: Optimization techniques and undefined behavior bc@freeuk.com (Bart) (2019-04-29)
Re: Optimization techniques and undefined behavior david.brown@hesbynett.no (David Brown) (2019-04-30)
*Re: Optimization techniques and undefined behavior david.brown@hesbynett.no (David Brown)* (2019-04-30)**
Re: Optimization techniques and undefined behavior bc@freeuk.com (Bart) (2019-05-01)
Re: Optimization techniques and undefined behavior bc@freeuk.com (Bart) (2019-05-01)
Re: Optimization techniques and undefined behavior anw@cuboid.co.uk (Andy Walker) (2019-05-02)
Re: Optimization techniques and undefined behavior martin@gkc.org.uk (Martin Ward) (2019-05-02)
Re: Optimization techniques and undefined behavior david.brown@hesbynett.no (David Brown) (2019-05-02)
Re: Optimization techniques and undefined behavior david.brown@hesbynett.no (David Brown) (2019-05-02)
[24 later articles]

| List of all articles for this month |

From:	David Brown <david.brown@hesbynett.no>
Newsgroups:	comp.compilers
Date:	Tue, 30 Apr 2019 15:48:31 +0200
Organization:	A noiseless patient Spider
References:	<72d208c9-169f-155c-5e73-9ca74f78e390@gkc.org.uk> 19-04-021 19-04-023 19-04-037 19-04-039 19-04-042 19-04-045
Injection-Info:	gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="92079"; mail-complaints-to="abuse@iecc.com"
Keywords:	optimize, design
Posted-Date:	30 Apr 2019 22:20:40 EDT

On 29/04/2019 19:15, Bart wrote:
> On 29/04/2019 16:08, David Brown wrote:
>> On 29/04/2019 01:31, Bart wrote:
>
>>> then you don't want the compiler being clever about overflow.
>>
>> I /do/ want a result consistent with a single expression, or splitting
>> up the expression.
>
> Then the choice is between both ways giving you 1500000000, or both
> giving you -647483648.

Let me repeat - I do not care what the results are here. I don't care
if they are consistent with each other. I don't care if they change
between runs of the compiler. I don't care if the result is a pink
umbrella.

I care what I get from "x * 2 / 2" when I give a sensible, valid value
of x. The range of valid values depends on the type of x, of course.

> The former is going to be difficult, since the intermediate 32-bit value
> has lost some information. The latter is very easy, and involves dumping
> the UB nonsense.

Giving "don't care" results is even easier.

>> Questions about what the compiler will do with overflows, like how
>> consistent it will be, are as sensible as asking how many miles per
>> gallon you get from your car when it has no tires. You would not drive
>> your car without tires - that would be a mistake, a bug in your driving
>> procedure. I don't write signed integer expressions that overflow -
>> barring bugs in my coding. And thus I don't care what the compiler does
>> about them, and have no interest in their consistency.
>
> If the gcc people designed cars, either the car wouldn't have an engine
> because, since you're always going to end up at your start point,
> there's no point in driving it; or it wouldn't have any brakes since you
> are never going to have an accident.

No, if the gcc people designed cars they would work according to
specification. So if they designed a car for a maximum of 5 people,
they would only include 5 seat belts. Anyone exceeding those
specifications does so on their own responsibility - the car
manufacturer says nothing about what will happen if you try to squeeze
in a sixth person. Typical results will be "best effort", but it could
also mean the car won't move, or that it will be more dangerous in an
accident.

But if the "signed integers should wrap" crowd built a car for five
people, it would come with a guarantee that squeezing in an extra person
would leave the car with 6 people-shaped holes. But hey, it's
guaranteed behaviour, and that's clearly more important than common sense.

>
>> I want the compiler to give me the right answer to valid questions - I
>> don't expect it to give me any consistent answer to invalid questions.
>
> What is the question? Hint: it's not the result of 1500000000*2/2, it's
> the result of 1500000000*2/2 when the 1500000000 is represented as a
> 32-bit twos complement binary value, and intermediate calculations are
> done to the same precision.
>

That is not the question - not in C, nor in normal mathematics, nor
common simple arithmetic, nor in modelling almost any situation in the
real world that you might be handling in code. If that /were/ the
question, then it would not be meaningless and it would have a single
correct answer. But it is not the question.

>>> but I've just
>>> tried 20 or so combinations of compilers and optimise flags, all give a
>>> result of -647483648 - except gcc which gave 1500000000. And even gcc
>>> gave -647483648 with some versions and some optimisation levels.
>>>
>>
>> Do you understand what "optimising compiler" means? It means the
>> compiler should try to give you code that runs as efficiently as
>> possible given valid inputs. C does not impose any requirements on code
>> efficiency, but compiler users certainly do - so a C compiler is not
>> going to go out of its way to give poorer quality code. So given "x * 2
>> / 2;", a compiler will do one of two things - return "x" unchanged, or
>> carry out the operations using the most obvious assembly instructions.
>> A good compiler will thus give you 1500000000 in this case, as that is
>> the most efficient implementation consistent with the source code.
>
> And so it will be inconsistent with (in my tests) most other compilers.
> My tests were done both with optimisation and without. clang-O3,
> gcc81-O3, gcc81-03, and gcc51-O3 gave 1500000000.
>
> All other compilers I tried, including VC, clang-O0 and gcc51-O0, gave
> the -647483648 figure. As would my own compilers for other languages (if
> using int32, but they now use int64 and the same behaviour is observed
> when x is 6000000000000000000).
>

I have used a number of other compilers that don't consider signed
overflow to be defined behaviour. And I have used many more that
/sometimes/ do optimisations that depend on it being undefined. This
is, of course, perfectly reasonable - since it is undefined, there is
nothing hindering it from being wrapping on occasion. This makes a
mockery of your tests here - just because some compilers sometimes give
a result that implies they have done two's complement wrapping, does not
mean they /always/ do so - unless they specifically say so in their
documentation.

As for your optimisation flags - here is a hint about C programming. If
optimisation flags affect the results of your code, rather than just the
timing, your code is wrong.

>> It is not a correct answer for standard C signed arithmetic, because
>> there is no correct answer.
>
> This is the nub of the issue: *C* has decided that such arithmetic is
> undefined.

I'm sorry, I thought this discussion was about C. If the C standards
say this is how integer arithmetic works in C, then that is how signed
integer arithmetic works in C. It doesn't matter how it works in D, E
or F - those don't affect how it works in C.

> But this is exactly the same 32-bit operation that can be
> done in a dozen other languages, probably on most machines that support
> 32-bit multiply, and most do not make it undefined.
>
> So it is largely a peculiarity of C.
>

Every language has its peculiarities. That might make them different,
it does not make them wrong.

For many languages, overflowing integers throws some kind of error or
exception - it is /not/ defined as modulo arithmetic. This is much like
C, except that in C you are expected to make checks for errors yourself,
rather than rely on the language to do it - this principle is a major
reason why C is used when top efficiency is required. It is also a
reason why C is not a good choice of language for the careless or lazy
programmer, the amateur programmer, the wilfully ignorant programmer who
refuses to learn the language, or for the programmer who wants to get a
lot done with a little development work. (I have always held that most
C programmers should not be programming in C - and most programs written
in C should not be written in C.)

>
> It is not a correct answer in normal
>> mathematics, or almost any real-world problem you might want to model.
>> It is, however, correct if you have defined your signed arithmetic to be
>> wrapping. It is fine - but IMHO almost entirely useless and
>> counter-productive - for a programming language to define signed
>> arithmetic in that way. C does not define it that way, but other
>> languages (and particular C compilers) can do so.
>
> This is contradictory - so a C compiler can choose to make something
> that C as deemed undefined behaviour, defined?
>

Yes. How long do you claim to have been working with C ?

When you compile a C program, the behaviour for any part of the code can
be defined by the C standards (whichever one you use), the
implementation, and additional standards or information for the system,
OS, ABI, libraries, APIs, etc., in use. The fact that the C standards
leave signed integer overflow undefined behaviour means that an
implementation can treat it as undefined - and assume it won't happen,
or that the programmer doesn't care what happens if the UB /does/ occur.
But it does not in any way limit the C compiler from giving its own
definition - whether that be to wrap results, throw up an error message,
or launch nasal demons.

>>> It is certainly what you might expect on such hardware.
>>>
>>>> Why do you think a guaranteed wrong and meaningless answer is
>>>> better than undefined behaviour?
>>>
>>> Is it really meaningless? Try the above using x=x*2. It will still
>>> overflow and produce a result of -1294967296, apparently incorrect. But
>>> print the same bit-pattern using "%u" format, and you get 3000000000 -
>>> the right answer. You can predict what's going to happen, /if/ you can
>>> predict what a compiler is going to do. Unfortunately with ones like
>>> gcc, you can't.
>>
>> Again, it does not matter if you can predict what value you get if
>> something is nonsensical. It matters that you can predict what values
>> are valid inputs, of course, but not what the outputs are when the
>> inputs are invalid. When garbage goes in, I don't care what colour of
>> garbage comes out - if I care what is going to come out, I am careful
>> about what I put in.
>
> Sorry, but to me a result of 1500000000 would be garbage, as it is
> highly misleading. If I didn't intend 1500000000*2/2 to overflow, but
> the result was a perfect 1500000000, how would I know there was a bug?

Sure, 1500000000 is garbage - you've put garbage in, and get garbage
out. But /any/ result would be garbage. I'd prefer the compiler to
give the implementation that gave me correct outputs for correct inputs,
rather than wasting cycles to give me rainbow coloured garbage for
inputs I would never use because they are invalid.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Optimization techniques and undefined behavior

David Brown <david.brown@hesbynett.no>Tue, 30 Apr 2019 15:48:31 +0200

David Brown <david.brown@hesbynett.no>
Tue, 30 Apr 2019 15:48:31 +0200