Re: behavior-preserving optimization in C, was compiler bugs

David Thompson <>
Mon, 15 Jun 2009 06:51:48 GMT

          From comp.compilers


From: David Thompson <>
Newsgroups: comp.compilers
Date: Mon, 15 Jun 2009 06:51:48 GMT
Organization: Poor
References: 09-04-072 09-04-086 09-05-010 09-05-022 09-05-028 09-05-038 09-05-039 09-05-050 09-05-055 09-05-065 09-05-069 09-05-073 09-05-087 09-05-110
Keywords: linker, C
Posted-Date: 15 Jun 2009 10:27:51 EDT

On Sun, 24 May 2009 04:55:55 -0400, George Neuner
<> wrote:

> On Tue, 19 May 2009 14:42:40 -0400, George Neuner
> <> wrote:
> >On Sat, 16 May 2009 08:39:28 +0100, Nathaniel McIntosh
> ><> wrote:
> >
> >>
> >> foo.c bar.c
> >> ----- -----
> >> ... ...
> >> double x; int x = 0;
> >> ... int garbage;
> >>
> >>Most C compilers will compile these two modules without complaint;
> >>when you link foo.o and bar.o into an executable, the "strong"
> >>definition of "x" from bar.o is favored over the "weak" definition in
> >>foo.o, and you wind up with a final "x" entity that is 4 bytes in size
> >>(assuming an ILP32 compilation model), not 8 bytes.
> >>
> >>In spite of the fact that foo.c contains functions that store 8-byte
> >>values to "x", the program works without optimization because the
> >>variable "garbage" (unused as it turned out) happens to be allocated
> >>just after "x" in memory. If the optimizer plays with the storage
> >>allocation such that some other critical variable appears just after
> >>"x", then the application fails.
> >
> >No. Your example will work regardless. Assuming those definitions
> >were global in the source files, the result will be 2 different
> >variables named 'x' - an integer in bar.o and a double real in foo.o.
> >

Not so. There are indeed two different definitions of objects named x,
both with external linkage, which means they are supposed to denote
the same object. The C standard does not specify what happens in that
case, and I don't know of any implementation that allocates two
objects (although that wouldn't violate the non-specification). I've
seen implementations that give an error, some that take the first
definition encountered, some that take the 'strong' (initialized) one
as described upthread, and some that take the nonzero-initialized one
if any (there is none in this example).

If both (or at least one) declaration had 'static', then they would
(definitively) be separate, and accesses to each correct; see below.

> >If, OTOH, the code was:
> >
> > foo.c bar.c
> > ----- -----
> > ... ...
> > extern double x; int x = 0;
> > ... int garbage;
> > ...
> >
> >then the compiler/linker would create only the integer "x" and any
> >functions in foo that accessed "x" as a double real would do so in
> >error. ...

> I tested them by compiling the following code (in 32-bits) in both
> debug and optimized versions.
> ==== test.c ====
> extern void DoStuffWithIntX( void );
> extern void DoStuffWithDblX( void );
> int main(int argc, char* argv[])
> {
> DoStuffWithDblX( );
> DoStuffWithIntX( );
> return 0;
> }
> ==== foo.c ====
> #include <stdio.h>
> double x;
> void DoStuffWithDblX( void )
> {
> printf( "double X is at %p\n", &x );
> }
> ==== bar.c ====
> #include <stdio.h>
> int x;
> int garbage;
> void DoStuffWithIntX( void )
> {
> printf( "integer X is at %p\n", &x );
> }

Nit: %p is specified to work only for void*. The C standard allows
different pointer types to have different representations, including
different sizes, in which case the argument needs a cast to void*.
There have been such machines in the past, but nowadays practically
all machines -- at least ones with a hosted C implementation and thus
stdio -- are byte-addressed, all object (data) pointers have the same
representation, and this 'happens' to work.

> ...
> C does not allow either duplicate definitions or multiple definitions
> of the same named object in any name overloading class (of which the
> "top level", ie. global names, is one). You can have duplicated
> _declarations_, but not duplicated _definitions_.

You can't have multiple definitions in the same scope. You can have
multiple declarations of static-duration objects, which includes all
objects at file scope, a.k.a. top level. ('Global' is ambiguous;
people sometimes use it to mean visible throughout a translation
unit, sometimes throughout the whole program.) Any declaration of an
automatic variable is a definition (so there can be only one). Any
declaration of a typedef is like a definition (although the standard
doesn't call it that), so ditto in C; but in C++ you can 'benignly'
redeclare a typedef. You can have only one declaration of the
contents of a given struct or union type.

And discussion can be confused by the fact that the _syntactic_
construct 'declaration' is used for both declarations and definitions
of objects, declarations of types (there are no definitions), and
declarations of functions (definitions use a different syntax).
(Definitions and declarations of _objects of_ struct or union type
follow the rules for objects in general; this constrains whether you
can syntactically combine the declarations for types and objects.)

Remaining examples assume file-scope; block-scope is mostly different.

> "extern double x;" is a declaration. No storage is allocated
> for the variable, its name is a reference
> to a definition elsewhere.


> "double x = 0.0;" is a definition. This allocates storage
> for the variable. Because it is initialized,
> this is considered a "strong" definition.

Right, except that there is no 'strong' or 'weak' in the standard.
Some implementations distinguish them, and may further distinguish
whether the initializer is explicit or not (see below), or nonzero or
zero. And note that 'extern double x = 0.0;' is also a definition.

> "double x;" is a definition if there is a preceding
> declaration, else it is both. Because it
> is _not_ initialized, this is considered
> a "weak" definition.

Not really. This is a 'tentative definition': if and only if there is
no with-initializer definition elsewhere in the translation unit
(roughly, the source file), the compiler synthesizes one with an
initializer of zero. (Which in C means integer zero, floating zero,
or null pointer, as applicable.) That synthesized definition may well
be 'weak'.

> "static double x;" is a declaration if there is a subsequent
> strong definition, else it is both. This
> allocates storage for the variable.

This is also a tentative definition, but in addition it gives the name
'internal linkage'; that means the name is not accessible to other
translation units. If another t.u. contains a declaration
'extern double x;', that reference is NOT resolved to this x. (The
object itself, however, is accessible if this t.u. returns or
otherwise makes available a pointer to it.) In implementation terms,
this allocates space with no linker name, usually within a single
anonymous block for the whole t.u. It may have an info/debugging-only
name, something like module$x or #nomatch.x.

Again a tentative definition turns into a definition with value zero
if there is not a with-initializer one in the same t.u.

Or to lay out the cases:

    static double x = 0;

is a definition with internal linkage; there can be no other
definition of x at file scope in this t.u., but there can be a
*distinct* internal ('static') x in every other t.u., and an
external ('extern', i.e. global) one in at most one other t.u.

    static double x;

is a tentative definition; if there is no definition with an
initializer elsewhere in this t.u., one is implicitly created with
value zero. All else is as just above.

> A problem occurs when there are multiple definitions in separate
> files. Technically, by the standards, there can be only 1 definition
> of a name at top level ... all other references to the name, including
> any forward references to the name in the same file, *must* be
> "extern" declarations.

Only one _with external linkage_. Which is what people often, but
definitely not always, mean when they say 'global'.

> However, the requirement for specifying "extern" is routinely relaxed
> for forward references to functions and to recursive structures. For
> such declarations, compilers implicitly assume "extern". It appears
> that some compilers also assume "extern" in the case of weak variable
> definitions.

Functions use a syntactic distinction.

    /*extern*/ int foo (whatever) ;  /* no initializer possible */

is a declaration and references a definition elsewhere.

    /*extern*/ int foo (whatever) { body }

is a definition that can be referenced. 'extern' is the default and
therefore optional in both.

static int foo (whatever) ; and
static int foo (whatever) { body }
have internal linkage; this foo is not accessible by name elsewhere.
(Again a pointer could be provided by other, explicit means.)

Types don't exist at runtime at all. You can write the same typedef
and struct, union and enum declarations in multiple translation units
(typically by #include'ing the same .h file) but you formally get
separate _compatible_ types, not the same types.

> Now the problem with assuming "extern" for weak definitions is that
> it is in conflict with the assumption of file scope visibility.
> Judging by
> the C standard, a top level variable definition (strong or weak) is
> not meant to name an external object, but rather to introduce a
> variable with file scope ... a separate "extern" declaration is needed
> to "import" the name into a different scope. In keeping with the
> principle of least surprise, it would be better to assume a weak
> variable definition is "conditionally static" rather than "extern",
> conditionally allocate storage for it and have the linker resolve the
> references. Some of the compilers actually did something like this
> and created 2 separate variables.

I'm not sure quite what you wanted/preferred here, but the irregular
and implementation-dependent specification in standard C is because
there were already variations across implementations before C89 that
had to be accommodated or there would have been no standard at all,
and now we're basically stuck with their designs. Quite a few of them
were originally designed to coexist with or even share FORTRAN COMMON,
which allowed _zero or_ one definition. C++ eliminated the 'tentative
definition' crock, but kept (all?) the rest, and added namespaces.
(And overloading, but that just gives finer granularity as to which
functions are 'the same' and doesn't affect objects.)

Hint: if your compiler/linker docs mention 'common', it's likely that
multiple t.u.s with tentative (hence synthesized) definitions, and
possibly ones explicitly initialized to zero, will 'merge'. For gcc
in particular, on at least some versions and targets, -fno-common
makes a big difference.
