Re: How do linkers deal with C++ duplicate code?

Robert Bowdidge <bowdidge@watson.ibm.com>
24 Aug 1998 13:37:05 -0400

From comp.compilers

From:	Robert Bowdidge <bowdidge@watson.ibm.com>
Newsgroups:	comp.compilers
Date:	24 Aug 1998 13:37:05 -0400
Organization:	IBM_Research
References:	98-08-147
Keywords:	linker

>> How do linkers deal with C++ duplicate code?

>With the limitations of most current C++ compilers, it is fairly easy
>to (accidentally or intentionally) write code that produces extremely
>bloated executables (another standard problem are duplicated debugging
>symbols). A few compilers (e.g., IBM's VisualAge 4.0, not available)
>seem to do better, but I don't know in detail how they do it. But
>they all abandon the "use the C linker plus a bunch of hacks"
>approach; VA 4.0 even abandons the traditional source file approach in
>favor of a program database.

Much of the problem with duplicate copies of a template function can
be traced to the standard compiler architecture, in which the source
code is given to the preprocessor and then goes through the compiler
into an object file. The object files are then combined by a separate linking tool
that doesn't have the compiler's knowledge of the source. Urs
mentions one solution: have a particular phase of the compilation
generate extra information to be used by later phases, such as
producing a template database. By contrast, IBM's VisualAge C++ 4.0
(aka Montana) is implemented as a set of cooperating tools sharing a
set of data structures, and thus the linker can directly look at the
compiler's template representations.

Montana is an incremental compiler -- for each change, Montana
recompiles at a function body level only those parts of the program
directly affected by the change. In order to accomplish this, it
needs to have accurate information about global declarations and how
they rely on attributes of other global declarations. Information
about templates -- instantiated and uninstantiated -- is always around
for the compiler. This information is kept on a per-executable basis.
Montana also uses an incremental linker that's embedded as part of the
compiler. The linker thus isn't trying to combine a set of .o files,
but instead notes what has changed or been replaced and modifies
the binary appropriately.
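
To make the dependency tracking concrete, here is a minimal sketch of how a change to one global declaration might be mapped to the function bodies that need recompiling. It is not Montana's implementation: the `DependencyGraph` type, its member names, and the string-keyed nodes are all assumptions made for illustration, and the walk is a conservative transitive one, whereas a real incremental compiler would track finer-grained attributes of each declaration.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical dependency graph (illustration only, not Montana's data
// structures). An edge runs from a declaration to each entity that relies on it.
struct DependencyGraph {
    std::map<std::string, std::vector<std::string>> dependents;
    std::set<std::string> functionBodies;

    // Record that `user` relies on some attribute of `usedDecl`.
    void addDependency(const std::string& user, const std::string& usedDecl) {
        assert(user != usedDecl);  // an entity does not depend on itself
        dependents[usedDecl].push_back(user);
    }

    // Mark a node as a function body, the unit of recompilation.
    void markFunctionBody(const std::string& name) {
        functionBodies.insert(name);
    }

    // Breadth-first walk from the changed entity; every function body
    // reached along the way is scheduled for recompilation.
    std::set<std::string> bodiesToRecompile(const std::string& changed) const {
        std::set<std::string> visited{changed};
        std::set<std::string> result;
        std::queue<std::string> work;
        work.push(changed);
        while (!work.empty()) {
            const std::string current = work.front();
            work.pop();
            if (functionBodies.count(current) != 0) {
                result.insert(current);
            }
            const auto it = dependents.find(current);
            if (it == dependents.end()) {
                continue;
            }
            for (const std::string& dependent : it->second) {
                if (visited.insert(dependent).second) {
                    work.push(dependent);
                }
            }
        }
        return result;
    }
};
```

With a graph in which `struct Shape` relies on `struct Point` and the body of `area()` relies on `struct Shape`, a change to `struct Point` schedules only `area()`; a body that depends on unrelated declarations is left alone.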

Multiple aspects of the Montana design avoid the multiple instantiation
problem. First, the compiler looks at each program as a whole, rather than
considering each compilation unit (file) separately. Second, information
from the compiler can be accessed by the linker, helping the linker ensure
it instantiates each function only once. Finally, the incremental
linker forces the design away from the "concatenate objects together" model
of linking.
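
The difference between the two models can be sketched as a small simulation. This is only an illustration under stated assumptions: `InstantiationKey`, `InstantiationTable`, and the two counting functions are hypothetical names, and the "table" stands in for whatever per-executable template information the compiler and linker actually share. In the per-file model each translation unit emits its own copy of every instantiation it uses; with a single program-wide table, code is generated only on the first request.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical key for one instantiation: template name plus argument list.
using InstantiationKey = std::pair<std::string, std::string>;
using TranslationUnit = std::vector<InstantiationKey>;

// Per-file model: each unit generates every instantiation it uses,
// unaware of what the other units have already generated.
std::size_t copiesWithPerFileModel(const std::vector<TranslationUnit>& units) {
    std::size_t copies = 0;
    for (const TranslationUnit& unit : units) {
        const std::set<InstantiationKey> local(unit.begin(), unit.end());
        copies += local.size();
    }
    return copies;
}

// Whole-program model: one table shared by all units (illustrative).
class InstantiationTable {
public:
    // Returns true only for the first request, i.e. when code must be generated.
    bool request(const InstantiationKey& key, const std::string& requestingUnit) {
        return owners_.emplace(key, requestingUnit).second;
    }
    std::size_t size() const { return owners_.size(); }

private:
    std::map<InstantiationKey, std::string> owners_;
};

std::size_t copiesWithWholeProgramModel(const std::vector<TranslationUnit>& units) {
    InstantiationTable table;
    std::size_t generated = 0;
    for (std::size_t i = 0; i < units.size(); ++i) {
        for (const InstantiationKey& key : units[i]) {
            if (table.request(key, "unit" + std::to_string(i))) {
                ++generated;
            }
        }
    }
    assert(generated == table.size());  // one body per distinct instantiation
    return generated;
}
```

For three units that use `max<int>` and `swap<double>`, `max<int>` alone, and `max<int>` with `max<long>`, the per-file model produces five function bodies and the shared table produces three.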

Apologies for the fuzzy description. With luck, someone from the core
development team will realize how badly I savaged the description of
their careful design and give the real details, but I hope this bit
of background clarifies Urs's comments.

Two corrections on Urs's article:

A major design point for Montana was to try to build an incremental
compiler without using the source code repository model that's been
attempted previously by Taligent and others. There are two strong
motivations for this choice. First, programmers tend to be fluent
with fast-and-easy lexical tools such as grep and awk, which work on
plain source files. Second, repositories make it hard to use existing
tools and utilities, and force you to completely convert to the new
environment. Montana keeps
the source code in the original files. When a change is made, it
performs a quick search on changed files to note the changed regions,
then reparses the regions.
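
One simple way to note a changed region, shown here purely as a sketch, is to trim the lines that the old and new versions of a file share at the start and at the end; whatever remains in the middle is the region to reparse. How Montana actually searches changed files is not described here, so the `ChangedRegion` struct and `findChangedRegion` function below are assumptions for illustration, and the approach reports a single region even when a file has several separate edits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of the span that differs between two versions.
struct ChangedRegion {
    std::size_t firstLine;  // 0-based index of the first differing line
    std::size_t oldCount;   // number of lines replaced in the old version
    std::size_t newCount;   // number of lines that replace them in the new version

    bool empty() const { return oldCount == 0 && newCount == 0; }
};

// Trim the common leading and trailing lines; the rest is the changed region.
ChangedRegion findChangedRegion(const std::vector<std::string>& oldLines,
                                const std::vector<std::string>& newLines) {
    const std::size_t limit = std::min(oldLines.size(), newLines.size());

    std::size_t prefix = 0;
    while (prefix < limit && oldLines[prefix] == newLines[prefix]) {
        ++prefix;
    }

    // The suffix may not overlap the prefix, hence the bound limit - prefix.
    std::size_t suffix = 0;
    while (suffix < limit - prefix &&
           oldLines[oldLines.size() - 1 - suffix] ==
               newLines[newLines.size() - 1 - suffix]) {
        ++suffix;
    }

    assert(prefix + suffix <= limit);
    return {prefix,
            oldLines.size() - prefix - suffix,
            newLines.size() - prefix - suffix};
}
```

If a one-line function body is replaced by a two-line body, the result has `firstLine` pointing at the first edited line, `oldCount` of 1 and `newCount` of 2, and only those lines would need to be handed back to the parser.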

Because programmers still have access to their source code, they won't
have to change their behavior or toolsets. Because the compiler now
has a program representation handy at all times, tools in the
environment can efficiently retrieve syntactic and semantic
information about the program. Parse trees are available, although
they're generated on demand instead of being kept available at all
times.

Finally, the VisualAge C++ 4.0 for AIX product is currently shipping.

More details on the Montana architecture will be available in a paper
to be published in the proceedings of the Foundations of Software
Engineering conference in November.

Robert
--
