Re: How do linkers deal with C++ duplicate code?

mrs@kithrup.com (Mike Stump)
22 Aug 1998 23:35:48 -0400

          From comp.compilers


From: mrs@kithrup.com (Mike Stump)
Newsgroups: comp.compilers
Date: 22 Aug 1998 23:35:48 -0400
Organization: Kithrup Enterprises, Ltd.
References: 98-08-147
Keywords: linker, question

John R Levine <johnl@iecc.com> wrote:
>I'm trying to figure out how linkers deal with the unique problems of
>C++.


Ah, fun stuff! I've done quite a bit of work on g++, so I'll give you
more of a brief summary/survey on what we've done with g++ to date.


>-- Templates and extern inline. This I understand the least. The
>problem is that with separate compilation, multiple modules can
>contain identical (or at least equivalent) copies of expanded
>templated routines and extern inlines. One approach is to pretend
>they're all static


Yes, people do this (g++ does this on systems that can't otherwise
cope). It is a strategy of last resort, or else the obvious first
choice for a first implementation. But the 1980s are over; time
to move on to better things.


>But some systems actually identify and discard the duplicates.


Yup, we (g++) do that too.


>What do they do [...]?


In g++, we have three approaches. The first is static duplicates,
with wrong semantics for static data in extern inline functions (you
get duplicate data, which violates the standard). For extern inlines
we have a #pragma (#pragma interface <opt file name>) for tagging a
header, and another (#pragma implementation <opt file name>) for
tagging the single .cc file that corresponds to that header, if the
user wants to eliminate some of the duplicates. The compiler is
then free to suppress duplicates: anything from the interface header
file that would otherwise be duplicated is put out once, in the
translation unit tagged as the corresponding implementation. This
works for vtables and debugging information as well.
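
To make the "wrong semantics" concrete, here is a minimal sketch of the
kind of code that is affected. The names call_count, used_from_a and
used_from_b are made up for illustration; in a real program the inline
function would sit in a header and the two callers in different .cc
files. The standard requires a single count object for the whole
program, so the total is 2. With static duplicates, each translation
unit would get its own private count and each would see 1.

```cpp
#include <cassert>

// Hypothetical header-style code: imagine this in a header that
// several .cc files include.
inline int& call_count() {
    static int count = 0;   // must be ONE object for the whole program
    return count;
}

// Stand-ins for code that would live in two different translation units.
inline void used_from_a() { ++call_count(); }
inline void used_from_b() { ++call_count(); }

// Resets the counter, calls it once from each "unit", returns the total.
inline int total_after_both() {
    call_count() = 0;
    used_from_a();
    used_from_b();
    // Every call must hand back a reference to the same object.
    assert(&call_count() == &call_count());
    return call_count();
}
```

Within one file this trivially holds; the point is that the same
result has to hold when the callers are compiled separately.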


Now, in preference to that trick, we also put the implementation (the
vtable and so on) in the file that contains the class's first
out-of-body virtual function, as the C++ language mandates that you
define it someplace, if such a function exists.
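
A small sketch of what "first out-of-body virtual function" means in
source terms. Shape and Square are hypothetical classes; the comments
say which file each piece would normally live in. The comments about
where the vtable lands describe the rule stated above, not something
this snippet can itself observe.

```cpp
#include <cassert>

// As it might appear in a header, shape.h (hypothetical).
struct Shape {
    virtual ~Shape() {}                   // defined in the class body
    virtual double area() const;          // first virtual function NOT defined in the body
    virtual const char* name() const { return "shape"; }
};

struct Square : Shape {
    explicit Square(double s) : side(s) {}
    double area() const;                  // Square's first out-of-body virtual function
    double side;
};

// As it might appear in shape.cc. Because this is the one place where
// Shape::area is defined, it is also the natural single home for
// Shape's vtable.
double Shape::area() const { return 0.0; }

// As it might appear in square.cc; likewise the home for Square's vtable.
double Square::area() const { return side * side; }

// Virtual dispatch goes through the vtable, wherever it was emitted.
double area_via_base(const Shape& s) {
    assert(s.name() != 0);
    return s.area();
}
```

A class whose virtual functions are all defined in the class body has
no such function, which is the "if such a function exists" caveat.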


Now, the second trick is to have a database that records the places
where things that would otherwise be duplicates can be put out, and
whether they are actually written out in that translation unit. There
is a prelinker (collect2 in our case) that manages instantiations and
so forth. It works by having the compiler default to not generating
anything that can be a duplicate, but the compiler records the fact
that it can generate something. It does this only for things that it
not only can generate, but that it also needs (or might need). Each
entry is the mangled name of the entity. Also in the database is
how to compile/recompile the source. When linking, we do a trial full
link and collect all the linker error messages about undefined
symbols. We then go through them, and for any that we find in one of
the per-.o databases, we change the entry so that it instructs the
compiler _to_ generate the symbol as external. We do this for all
undefined symbols. After this, we recompile the affected files and
run the linker again, and again, and again, until nothing changes (or
we hit the limit of 17 relinks, overridable on the command line), and
then we do a normal link and let the user see the error messages (if
any). This model is used when the -frepo option is given. It has some
drawbacks with library building (if you're careful, it can be made to
work in that situation, but it does take some care and thought).
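
The relink loop is easier to see as code. The following is a toy model
only, not collect2: ObjectFile, trial_link and prelink are invented
names, symbols are plain strings rather than real mangled names, and
"recompiling" is reduced to flipping a repository entry on. It assumes
that emitting one entity can expose further undefined symbols, which
is why more than one round may be needed.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one .o file together with its repository.
struct ObjectFile {
    std::set<std::string> defined;   // symbols always emitted
    std::set<std::string> needed;    // symbols always referenced
    // Repository: name of an entity this file CAN emit, mapped to the
    // symbols that become referenced once it is emitted.
    std::map<std::string, std::set<std::string> > can_emit;
    std::set<std::string> emitting;  // repository entries switched on
};

// Stand-in for a trial link: every symbol referenced but not defined.
std::set<std::string> trial_link(const std::vector<ObjectFile>& objs) {
    std::set<std::string> defs, refs;
    for (const ObjectFile& o : objs) {
        defs.insert(o.defined.begin(), o.defined.end());
        refs.insert(o.needed.begin(), o.needed.end());
        for (const std::string& name : o.emitting) {
            defs.insert(name);
            const std::set<std::string>& extra = o.can_emit.at(name);
            refs.insert(extra.begin(), extra.end());
        }
    }
    std::set<std::string> undefined;
    for (const std::string& r : refs)
        if (!defs.count(r)) undefined.insert(r);
    return undefined;
}

// Repeats trial links until nothing changes or the limit is reached.
// Returns the number of rounds that changed at least one entry.
int prelink(std::vector<ObjectFile>& objs, int max_relinks = 17) {
    assert(max_relinks >= 0);
    int relinks = 0;
    for (; relinks < max_relinks; ++relinks) {
        bool changed = false;
        for (const std::string& sym : trial_link(objs)) {
            for (ObjectFile& o : objs) {
                if (o.can_emit.count(sym)) {
                    o.emitting.insert(sym);  // "recompile" with the symbol emitted
                    changed = true;
                    break;                   // one definition is enough
                }
            }
        }
        if (!changed) break;
    }
    return relinks;
}
```

Whatever trial_link still reports after prelink returns corresponds to
the error messages the user gets to see from the final normal link.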


The third trick is real live linker support. We call this model the
Borland model (as that was the first compiler that we were aware of
that did it, or at least the most popular one at the time). We
generate external duplicates in all modules that need them, but they
are marked as `link once' sections. The linker discards all but one
of them.


Here is what it looks like in a .s file:


.section .gnu.linkonce.t._._1B,"ax",@progbits
                .align 4
                .weak _._1B
                .type _._1B,@function
_._1B:
.LFB5:
                pushl %ebp


...


But this requires GNU ld and a file format like ELF or PE/COFF (sorry,
no a.out). It is keyed off the name of the section, which is derived
from the symbol name, as you can see.
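
As a rough model of the linker's side of this, here is a sketch that
discards duplicate link-once sections by name. It is not GNU ld code:
Section, is_link_once and discard_duplicates are invented for
illustration, and keeping the first copy seen is an assumption (the
only requirement stated above is that all but one are discarded).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, stripped-down view of an input section.
struct Section {
    std::string name;   // e.g. ".gnu.linkonce.t._._1B"
    std::string from;   // object file that contributed it
};

// True if the section name carries the link-once prefix.
bool is_link_once(const std::string& name) {
    const std::string prefix = ".gnu.linkonce.";
    return name.compare(0, prefix.size(), prefix) == 0;
}

// Keeps every ordinary section, but only the first link-once section
// of any given name (assumption: first one wins).
std::vector<Section> discard_duplicates(const std::vector<Section>& input) {
    std::set<std::string> seen;
    std::vector<Section> kept;
    for (const Section& s : input) {
        if (is_link_once(s.name) && !seen.insert(s.name).second)
            continue;   // a section with this name was already kept
        kept.push_back(s);
    }
    assert(kept.size() <= input.size());
    return kept;
}
```

Since the section name embeds the symbol name, two sections with the
same name are taken to hold the same function, whichever module they
came from.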




Now, for the last trick of the evening: it's a bonus trick, as we don't
have it implemented yet for g++, though I think the linker now has all
the smarts in it to pull it off.


In C++, in a good implementation, a diagnostic should be produced if
the template definitions differ between translation units. The
standard defines exactly (maybe plus or minus a bug or two) when two
definitions match and when they don't. This can be accomplished by
having another section whose name is derived from the symbol name as
above, and putting a nonloaded data value in that section. This value
is a hash value (cryptographic checksum) computed from the source of
the template. The linker then checks that all the sections with the
same name have the same value, and otherwise gives a diagnostic.
Presto, same hash, same source, no diagnostic; different hash,
different `source', diagnostic. And it doesn't matter if you use -O0
or -O9, things still work. Further, one can put ABI compilation
options (what do you mean I can't link little endian code with big
endian code?) into such a hash value, and ensure that the linked
objects are otherwise compatible (if the file format doesn't otherwise
help you with this).
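
A sketch of how such a check could look, under several assumptions:
the ".check." section prefix, CheckSection, make_check and
find_mismatches are all invented names, and FNV-1a stands in for the
cryptographic checksum purely to keep the example short (a real
implementation would want a proper cryptographic hash). The ABI
options are simply appended to the hashed text.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// FNV-1a, 64-bit. A stand-in only; NOT a cryptographic checksum.
std::uint64_t fnv1a(const std::string& text) {
    std::uint64_t h = 0xcbf29ce484222325ULL;
    for (unsigned char c : text) {
        h ^= c;
        h *= 0x100000001b3ULL;
    }
    return h;
}

// Hypothetical nonloaded section carrying only a hash value.
struct CheckSection {
    std::string name;    // derived from the symbol name
    std::uint64_t hash;  // hash of template source plus ABI options
};

// Compiler side: build the check section for one template instance.
CheckSection make_check(const std::string& symbol,
                        const std::string& source,
                        const std::string& abi_options) {
    return CheckSection{".check." + symbol,
                        fnv1a(source + '\n' + abi_options)};
}

// Linker side: names of sections whose copies do not all agree with
// the first copy seen (a name may be listed more than once).
std::vector<std::string> find_mismatches(const std::vector<CheckSection>& all) {
    std::map<std::string, std::uint64_t> first;
    std::vector<std::string> bad;
    for (const CheckSection& s : all) {
        std::map<std::string, std::uint64_t>::iterator it = first.find(s.name);
        if (it == first.end())
            first[s.name] = s.hash;
        else if (it->second != s.hash)
            bad.push_back(s.name);   // would become a linker diagnostic
    }
    assert(bad.size() <= all.size());
    return bad;
}
```

Because the hash is taken over the source and options rather than the
generated code, the optimization level does not affect the comparison.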


Mike Stump
FSF G++ maintainer