Re: Undefined Behavior Optimizations in C

Spiros Bousbouras <spibou@gmail.com>
Wed, 18 Jan 2023 13:14:35 -0000 (UTC)

          From comp.compilers


From: Spiros Bousbouras <spibou@gmail.com>
Newsgroups: comp.compilers
Date: Wed, 18 Jan 2023 13:14:35 -0000 (UTC)
Organization: Aioe.org NNTP Server
References: 23-01-027 <sympa.1673343321.1624.383@lists.iecc.com> 23-01-031 23-01-041
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="85677"; mail-complaints-to="abuse@iecc.com"
Keywords: C, optimize
Posted-Date: 18 Jan 2023 11:35:37 EST

On Wed, 11 Jan 2023 14:20:49 +0100
David Brown <david.brown@hesbynett.no> wrote:
> C was designed from day one to be a high-level language, not an
> assembler of any sort. Limitations of weaker earlier compilers does
> not mean the language was supposed to work that way.


For those who want an abstract or portable assembler, there exists
c9x.me/compile/. I've never used it, but at least it aims to be that,
unlike C. I would be curious to know of other analogous projects. I
guess the "register transfer language" of GCC is somewhat analogous.


> I first used a C compiler that optimised on the assumption that UB
> didn't happen some 25 years ago. (In particular, it assumed signed
> integer arithmetic never overflowed.)


I have several times encountered the claim that compilers assume that UB
does not happen, and I don't understand it. Let's consider two examples:


        x + 1 > x


in C, where x is a signed integer. Compilers will often treat this as
always true, with the following reasoning:


- if x does not have the maximum value which fits in its type then the
    meaning of the C expressions is the same as their mathematical meaning
    so the expression evaluates to true.


- if x has the maximum value which fits in its type then x + 1 is not
    defined so any translation (including treating the whole expression as
    true) is valid.


There's no assumption that UB (undefined behaviour) will not happen; both
possibilities are accounted for.
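A minimal sketch of this first example follows. The function name is
hypothetical and chosen only for illustration, and whether a given compiler
actually folds the body to a constant depends on the compiler and its
optimisation settings.

```c
#include <stdbool.h>

/* Hypothetical helper wrapping the expression from the example.
 *
 * Case 1: x is not the maximum int value. x + 1 has its mathematical
 *         meaning, so the comparison is true.
 * Case 2: x is the maximum int value. x + 1 overflows, the behaviour is
 *         undefined, and any translation is permitted, including one
 *         that yields true.
 *
 * Both cases allow the translation "return true", so a compiler may
 * emit exactly that without reasoning about which case occurs. */
bool next_is_greater(int x)
{
    return x + 1 > x;
}
```

For every argument other than the maximum int value the function has defined
behaviour and returns true, whichever translation the compiler picks.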


Another example is


      ... *some_pointer_object ...
      [ some_pointer_object does not get modified in this part of the code and
          has not been declared as volatile ]
      if (some_pointer_object == NULL) ...


If some_pointer_object is not NULL then the test can be omitted; if it is
NULL then the earlier dereference is UB, so any translation is valid,
including omitting the test.


Again, there's no assumption that UB will not happen.
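The pointer example can be sketched as follows. Both function names are
hypothetical; the second function is not from the discussion and is shown
only as the check-before-use form, which has defined behaviour for every
argument.

```c
#include <stddef.h>

/* Dereference first, test afterwards (the pattern from the example).
 * If p is not NULL the test is false; if p is NULL the dereference is
 * already UB. Either way a compiler may drop the test and the branch. */
int read_then_check(const int *p)
{
    int value = *p;      /* UB when p is NULL */
    if (p == NULL)       /* may be omitted by the compiler */
        return -1;
    return value;
}

/* Test first, dereference afterwards. Nothing is dereferenced when p is
 * NULL, so the test cannot be removed on the grounds given above. */
int check_then_read(const int *p)
{
    if (p == NULL)
        return -1;
    return *p;
}
```

With a valid pointer both functions return the pointed-to value; only
check_then_read may be called with NULL.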


So the request that C compilers should stop assuming that UB will not
happen seems to me completely misguided. I think what is really meant
is that, in reasoning about what a valid translation is, C compilers (or
the authors of the compilers) should not employ the notion of UB. But
then how should UB be translated? Again there exists the assumption
or claim that there is some intuitively obvious translation and
compilers should go for that. First, I'm not sure that there exists
such a common intuition even among humans and second, even if it does,
how does one go from an intuition to an algorithm C compilers can
use to do translation? Lots of things are intuitively obvious but
creating an algorithm to duplicate the human intuition is a hard
problem, one which has not been solved in many cases and perhaps even
one which is unsolvable in some cases.


I've seen the suggestion that compilers should describe their behaviour in
terms of the assembly generated (possibly some kind of abstract assembly) as
opposed to higher-level terms. I'm not sure if this is possible and, even if
it is, I would not find it useful. I tend to think of what I want my code to
do in higher-level terms and then bring it down to the level of the language
with successive refinements. If parts of C were described in assembly terms
then it would potentially force me to do at least one more refinement step
with no benefit.


A more productive avenue is for people to give definitions, as precise as
possible, to the kinds of UB which have caused them problems and then try to
convince compiler writers to implement such extensions if they don't do so
already. In this area even compiler documentation should perhaps improve. For
example, from the GCC man page


      -fdelete-null-pointer-checks
              Use global dataflow analysis to identify and eliminate useless
              checks for null pointers. The compiler assumes that
              dereferencing a null pointer would have halted the program. If
              a pointer is checked after it has already been dereferenced, it
              cannot be null.


              In some environments, this assumption is not true, and programs
              can safely dereference null pointers. Use
              -fno-delete-null-pointer-checks to disable this optimization for
              programs which depend on that behavior.


The above still doesn't tell me what is supposed to happen when a NULL pointer
is dereferenced, even with the -fno-delete-null-pointer-checks flag. I'm
guessing it's impossible to give a general definition. One can on specific
systems but not in general, so perhaps the above description does the best
possible.


Another example


      -fstrict-overflow
              Allow the compiler to assume strict signed overflow rules,
              depending on the language being compiled. For C (and C++) this
              means that overflow when doing arithmetic with signed numbers is
              undefined, which means that the compiler may assume that it will
              not happen.


This is poor phrasing; in particular the part "which means that the
compiler may assume that it will not happen" is redundant. There is no
reason for the compiler to assume anything about which execution paths will
be taken at run time in order to conclude, for example, that x + 1 > x can
be translated as true. The above quote gives unnecessarily circuitous
reasoning as to why the expression can be translated as true. I give a more
direct reasoning above.
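For code that needs a definite answer when x holds the maximum value, one
option is to test for overflow before adding, so that no execution path
reaches the undefined addition. The following is only a sketch; both function
names are hypothetical and it does not rely on any compiler flag.

```c
#include <limits.h>
#include <stdbool.h>

/* Reports whether a + b would overflow int, without performing the
 * addition. The subtractions below cannot overflow themselves because
 * of the sign checks on b. */
bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    if (b < 0)
        return a < INT_MIN - b;
    return false;
}

/* Defined for every x: the addition is evaluated only when it cannot
 * overflow, thanks to short-circuit evaluation of &&. */
bool successor_is_greater(int x)
{
    return !add_would_overflow(x, 1) && x + 1 > x;
}
```

Here successor_is_greater(INT_MAX) returns false on every conforming
implementation, since the overflowing addition is never evaluated.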


> It annoys /me/ intensely that people complain about this sort of thing,
> and yet apparently haven't bothered to read the compiler manuals to see
> how to get the effects they want. Compile with "-fno-strict-aliasing",
> or (better, IMHO) add this to your code:
>
> #pragma GCC optimize ("-fno-strict-aliasing")
>
> Now, if you want to complain that the gcc documentation is not great,


Yeah, it would be good if there were a more precise specification as to what
additional guarantees beyond the C standard this gives. For translating other
languages into C, this seems to be important for achieving object allocation
and garbage collection, since relying on the native malloc() and related
functions is generally not adequate, at least not if your garbage collector
is allowed to move objects.
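To make the aliasing issue concrete, here is a small sketch of reinterpreting
the bytes of an object without relying on any extra guarantee from a compiler
flag. The function names are hypothetical, and the code assumes that float
and uint32_t have the same size, which is checked at compile time.

```c
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: float occupies 32 bits. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "sketch assumes a 32-bit float");

/* Reading a float through a uint32_t lvalue, as in *(uint32_t *)&f,
 * is not permitted by the aliasing rules of the C standard. Copying
 * the bytes with memcpy is defined by the standard itself. */
uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* The reverse direction, again by copying the object representation. */
float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

A round trip through float_to_bits and bits_to_float gives back the original
value. This covers only reinterpretation of a single object; it says nothing
about what a flag such as -fno-strict-aliasing guarantees for an allocator or
a moving garbage collector.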

