Re: Undefined Behavior Optimizations in C

David Brown <david.brown@hesbynett.no>
Wed, 11 Jan 2023 14:40:30 +0100

          From comp.compilers


From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.compilers
Date: Wed, 11 Jan 2023 14:40:30 +0100
Organization: A noiseless patient Spider
References: 23-01-009 23-01-011 23-01-012 23-01-017 23-01-027 23-01-032 23-01-035
Keywords: C, standards
Posted-Date: 11 Jan 2023 18:13:05 EST
In-Reply-To: 23-01-035
Content-Language: en-GB

On 11/01/2023 00:57, gah4 wrote:
> On Tuesday, January 10, 2023 at 2:00:57 PM UTC-8, David Brown wrote:
>
> (big snip)
>
>> It is particularly harmful when programmers think there is such a thing
>> as "de facto defined". That's an oxymoron. If the behaviour is
>> defined, it is defined. If it is not defined, it is not defined. If it
>> is not defined and a programmer makes unwarranted and incorrect
>> assumptions about what they think it means, then the programmer needs to
>> update his or her understanding of the language. They don't get to
>> blame the compiler or the compiler writer for not making the same
>> unfounded assumptions that they did.
>
> A lot of C code assumes two's complement. I believe Unisys still sells
> ones' complement hardware, and so that code might have UB, but often
> people know that it will only run on two's complement machines.
>
> Much also assumes ASCII code. Don't tell IBM about that.


There is usually no requirement for a given piece of C code to be fully
portable to all conforming compilers. Most C code is quite limited in
the scope of its use, and it is quite reasonable to assume two's
complement representation (though /not/ two's complement wrapping on
overflow), 8-bit char, ASCII characters, etc. It may be fine to assume
32-bit int, and perhaps little-endian ordering. These are /warranted/
assumptions, not unwarranted ones - they are typically reasonable to
make, and if you want to you can sometimes add compile-time checks so
that if someone does try to use it out of context, they get an error
message.
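
As a rough sketch of such compile-time checks, assuming a C11 or later
compiler: the byte-order test relies on the GCC/Clang predefined macro
__BYTE_ORDER__, which is not standard C, and ascii_to_lower() is just a
hypothetical helper that depends on the ASCII assumption.

```c
#include <assert.h>   /* static_assert (C11 and later) */
#include <limits.h>   /* CHAR_BIT, INT_MAX */

/* Each check stops the build with the given message if the
 * assumption does not hold for this toolchain and target. */
static_assert(CHAR_BIT == 8, "this code assumes 8-bit char");
static_assert(INT_MAX == 2147483647, "this code assumes 32-bit int");
static_assert((-1 & 3) == 3, "this code assumes two's complement int");
static_assert('A' == 65 && 'a' == 97 && '0' == 48,
              "this code assumes ASCII characters");

/* __BYTE_ORDER__ is a GCC/Clang extension, so the check is only
 * made where the macro exists. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
static_assert(__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__,
              "this code assumes little-endian byte order");
#endif

/* Hypothetical helper that is only correct under the ASCII
 * assumption checked above. */
int ascii_to_lower(int c)
{
    return (c >= 'A' && c <= 'Z') ? (c | 0x20) : c;
}
```

The checks cost nothing at run time, and they document the assumptions
right next to the code that depends on them.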


And these are all points that the C standards call
implementation-defined behaviour. These may vary between compilers or
targets, but for a given toolchain and target the behaviour is clear and
defined. That is quite different from undefined behaviour.
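
To illustrate the difference with a small sketch: the range of int is
fixed for a given toolchain and can be relied on, but signed overflow is
undefined whatever the representation, so the test has to be made before
the addition. The name checked_add() is hypothetical, not something from
the standard library.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Adds a and b without ever evaluating an overflowing signed
 * addition.  Returns true and stores the result in *sum on
 * success, or returns false and leaves *sum untouched. */
bool checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b)) {
        return false;   /* a + b would overflow */
    }
    *sum = a + b;
    return true;
}
```

Testing "a + b < a" after the fact is not a substitute. The addition has
already overflowed by then, and the compiler is entitled to assume that
never happens.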


>
> If C compilers had a fatal compilation error when they found
> suspicious UB, I could live with that. Maybe it means doing
> something to convince the compiler, even if it really is still UB.
>


C compilers often do their best here, as long as you enable warnings and
optimisation (needed for the code analysis). I'd prefer it if tools
like gcc had a lot more warnings by default. However, it is rarely
possible to find run-time UB at compile time. If a function takes a
pointer parameter and dereferences it, it is only UB if the function is
called with an invalid pointer. The compiler will assume that UB does
not occur, and compile the function optimised with that assumption. If
the compiler were to warn you about the potential UB in the function,
your builds would be swamped by false positives.
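
A minimal sketch of that situation, with hypothetical function names.
The first function is perfectly well defined for every valid pointer, so
there is nothing sensible for the compiler to warn about. If the caller
cannot guarantee validity, the check belongs before the dereference.

```c
#include <assert.h>
#include <stddef.h>

/* Defined for every valid pointer.  The behaviour is undefined
 * only if a caller passes a null or dangling pointer, and the
 * compiler may optimise on the assumption that no caller does. */
int read_value(const int *p)
{
    return *p;
}

/* Variant for callers that may legitimately pass NULL.  The test
 * comes before the dereference; a test placed after it could be
 * removed, since the dereference already implies p is not null. */
int read_value_or(const int *p, int fallback)
{
    if (p == NULL) {
        return fallback;
    }
    return *p;
}
```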


You can go a long way by using sanitizers when testing code - then
your code is augmented by extra checks, and the run-time will stop with
a fatal error when there are problems. I think such tools are often
under-used by developers (as are static warnings). Of course no testing
can prove code is correct, but it is certainly a help.
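
As a sketch, both gcc and clang accept -fsanitize=address,undefined. The
file name and the two functions below are made up for illustration. Both
functions are correct for valid arguments; in a sanitized test build, a
bad argument is reported at run time instead of silently misbehaving.

```c
/* Build for testing with, for example:
 *
 *     cc -g -O1 -fsanitize=address,undefined example.c
 *
 * (example.c is a hypothetical file name.) */
#include <assert.h>
#include <stddef.h>

static const int squares[4] = { 0, 1, 4, 9 };

/* Valid only for i < 4.  An out-of-range index is the kind of
 * error the sanitizers are designed to report at run time. */
int square_of(size_t i)
{
    return squares[i];
}

/* Valid only while the product fits in an int.  An overflowing
 * multiplication is reported by the undefined behaviour sanitizer. */
int scale(int value, int factor)
{
    return value * factor;
}
```

The checks only fire on paths the tests actually exercise, which is why
they complement rather than replace static warnings.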


> Remember, much of the data breaches we hear about
> (and also the ones we don't) are due to buffer overflow
> in C programs. Make it easier, instead of harder, for programmers
> to avoid buffer problems.
>


Sure. The prime solution here is to stop using C - use a language that
gives you more help in such cases. That could be a high level language
such as Python, a managed language like C# or Java, or a smarter
language like Rust or C++.


If you have to use C, then write /better/ C and do better testing. Stop
passing raw pointers around independently of the size of the data. Stop
using fixed size "magic numbers" for the size of buffers when the size
needed might change later. Refactor your data accesses into small
inline functions, where you can easily add tests and limits during
debugging.
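
One possible shape for this, as a sketch only. The names int_span,
span_get, span_set, span_sum and ARRAY_LEN are all hypothetical. The
pointer travels together with its length, the length is derived from the
array rather than typed in as a magic number, and every access goes
through a small function where a check can live.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a real array (not a pointer), so the size is
 * never repeated as a magic number. */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

/* A pointer that always travels with the size of its data. */
struct int_span {
    int   *data;
    size_t len;
};

/* All reads and writes go through these two functions.  The
 * assert() checks are active in debug builds and compiled out
 * when NDEBUG is defined. */
static inline int span_get(struct int_span s, size_t i)
{
    assert(i < s.len);
    return s.data[i];
}

static inline void span_set(struct int_span s, size_t i, int value)
{
    assert(i < s.len);
    s.data[i] = value;
}

/* Example user: the loop bound comes from the span itself. */
long span_sum(struct int_span s)
{
    long total = 0;
    for (size_t i = 0; i < s.len; i++) {
        total += span_get(s, i);
    }
    return total;
}
```

A caller would build the span with something like
"struct int_span s = { buf, ARRAY_LEN(buf) };" so that changing the
declaration of buf is the only edit needed when the size changes.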


Yes, C makes it easy to write incorrect code - but it is quite possible
to use it for writing correct code. There are other languages that make
it easier to write correct code and harder to write incorrect code -
it's too late for C to change.

