Re: the Evil Effects of Inlining

compres! (Chris Clark)
Sat, 4 May 91 00:54:22 EDT

          From comp.compilers


Newsgroups: comp.compilers
From: compres! (Chris Clark)
In-Reply-To:'s message of 3 May 91 17:14:49 GMT
Keywords: optimize, inlining, FORTRAN
Organization: Compilers Central
References: <boehm.673290889@siria>
Date: Sat, 4 May 91 00:54:22 EDT

Although I generally agree with Hans' comments, I do want to make a minor
correction to his first statement. I no longer have the precise data on
it, but the assumption that old-style FORTRAN programs do not profit
from inlining is incorrect. I was part of a team that did inlining as
part of optimization enhancements for FORTRAN, PL/I, and related
compilers. Our results suggested that inlining paid off more often than
expected (though possibly not by as high a percentage), even when inlining
large procedures. I also believe that you'll find the MIPS people do
fairly extensive inlining as part of their optimizations, and they are
targeting "traditional" languages. (They had (have?) something called
pmerge or umerge which does it. They may even wait until link time to do
it now, to "solve" the separate compilation problem.)

ONE-PARAGRAPH DIGRESSION: The basic problem is that I think we are not as
smart about when and what to optimize by hand as we think we are. Also,
many applications are written primarily for correctness, portability, or
maintainability in any case. Though I will admit a lot of FORTRAN is
well tuned, since computers used to be so slow!

Truly large procedures do not generally cause the problem of code
expansion, because they are quite often called from only one site. One
hard part of the trade-off is the dynamic/static balance. It's almost
universally a win to inline a procedure if it has only one call site. To
avoid the locality-of-reference problem that may occur if the procedure is
conditionally called, put the inlined procedure at the top or bottom of
the code and jump to and from it. (One of the jumps can be the conditional
branch which triggers the call.) You may have to tweak other parts of your
optimizer to keep it from re-linearizing the flow of control and rearranging
the new code back into the middle. Statistically, having the code in a
different part of the same module should not have a higher probability of
a cache conflict than having the code in a separate module. However, in
specific instances that will not be the case. (And in benchmarking,
specific instances are all that count. Unfortunately, I think that's true
in general--every execution is always a specific instance.) Thus, user
control is important for those cases when the user is actually smart
about exactly what they want done--i.e., they ran the profiler and analyzed
the results.

The hard analysis comes with the "wide part of the onion", the middle
layer of abstraction. Here the functions tend to be called from several
sites and to be of moderate length. This is often the meat of the
application and performs those parts which are specific to the task at
hand, as opposed to the general driver code at the top and the standard
canned primitives, like adding to a list, at the bottom. (I think I just
recently read that this is also where 80% of the coding errors occur.)
Anyway, here it is usually only profitable to inline if the call occurs
within the context of a loop, and even then with some trepidation.


To put the benefit in perspective: although inlining was profitable, I
think we were shooting for an average 10-15% gain on total execution
time, including all of the optimizations we did at that rev. I believe
strength reduction and improved register analysis improved the
performance more than inlining did in many applications. I also don't
remember whether we did recursion-unrolling inline substitution or not.
I do know we did special code to help us with up-level variable
references. The inlining was also controlled by command-line parameters,
to give users some knobs to tweak and hopefully prevent explosive
growth, which would have been fatal since the architecture was segmented
and there was a fixed upper bound on the size of any one procedure.

It is possible that our results were colored by the truly expensive
procedure-call overhead or the size of the Icache on the target machines.
A typical CPU was 3 to 5 boards, with about half a board dedicated to
cache, if I recall. The results were also colored by the fact that the
compiler did no interprocedural analysis at that rev.

The nature of the Icache may have also affected the results. I believe
the probability of two addresses within a segment conflicting in the
cache was lower (possibly zero on the high-end machine) than the
probability of conflicts between separate segments, and thus between
separate procedures. (Multiple procedures could exist in a segment, but
the linker always packed segments from the same end.)

All of this is only my recollection. The true data, which was gathered
on "real live applications", has probably been lost for years and was
probably marked as proprietary anyway. I also didn't do the
implementation. I think Rich Ford (now at Encore) or Dan Laska (Compass?)
did, but it also may have been Karen Berman-Mulligan (Private), Ruth
Halpern (LTX), or Suresh Rao (Intel).

I hope this helps,
- Chris (Clark)
