Re: When to do inline expansion

davidm@questor.rational.com (David Moore)
Mon, 20 Sep 1993 23:21:25 GMT

          From comp.compilers

Newsgroups: comp.compilers
From: davidm@questor.rational.com (David Moore)
Keywords: optimize, registers
Organization: Rational
References: 93-09-063 93-09-069
Date: Mon, 20 Sep 1993 23:21:25 GMT

zstern@adobe.com (Zalman Stern) writes:


[ re: question about when to do inlining]
>If it is called precisely once in the program, the function should be
>inlined no matter what.


This is not quite true. There are pathological cases.


On certain RISC machines (e.g. SPARC, Am29000), when you call a routine, you
get a new set of registers by performing a register window push. This push
is often almost free [2-3 instruction slots], since a limited number of
pushes can be handled entirely in hardware registers.
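
To make "a limited number of pushes" concrete, here is a rough sketch of a
register window file. It is a toy model, not a description of any real chip:
the function name, the default of 8 windows, and the rule that every frame
takes exactly one window are all assumptions made for illustration.

```python
def count_window_traps(events, hardware_windows=8):
    """Count overflow and underflow traps for a sequence of calls and returns.

    events is an iterable of the strings "call" and "return".
    hardware_windows is an assumed size of the register window file.
    """
    resident = 1      # frames currently held in hardware registers
    overflows = 0     # calls that forced the oldest frame out to memory
    underflows = 0    # returns that had to reload a frame from memory
    for event in events:
        if event == "call":
            if resident == hardware_windows:
                overflows += 1      # no free window: spill the oldest frame
            else:
                resident += 1       # cheap case: just advance the window
        elif event == "return":
            if resident == 1:
                underflows += 1     # caller's frame is in memory: reload it
            else:
                resident -= 1       # cheap case: just step the window back
        else:
            raise ValueError(f"unknown event: {event!r}")
    return overflows, underflows
```

As long as the call depth stays within the window file, both counts stay at
zero and every call pays only the cheap push.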


If inlining the routine increases register pressure to the point where you
have to start spilling registers, leaving the routine out-of-line may run
faster. [I wonder whether anyone has ever investigated spilling by means of
inline register window pushes.]
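
The trade-off can be sketched as a crude cost comparison. This is only an
illustration under stated assumptions: the function name, the spill cost of
2 slots per value (one store plus one reload), and the pessimistic rule that
all caller and callee values are live at once are mine. Only the 2-3 slot
call overhead comes from the figures above.

```python
def inlining_pays_off(live_at_call_site, live_in_callee, available_registers,
                      call_overhead_slots=3, spill_cost_slots=2):
    """Estimate whether inlining beats an out-of-line call on a windowed machine.

    All costs are in instruction slots. spill_cost_slots is an assumed cost
    for one store plus one reload of a spilled value.
    """
    # Inlined: caller and callee values compete for one register set.
    # Pessimistic assumption: all of them are live at the same time.
    spilled = max(0, live_at_call_site + live_in_callee - available_registers)
    inline_cost = spilled * spill_cost_slots

    # Out of line: the callee gets a fresh window, so in this model the only
    # cost is the call/return overhead itself.
    out_of_line_cost = call_overhead_slots

    return inline_cost < out_of_line_cost
```

With 24 usable registers, 10 live values in the caller and 8 in the callee fit
without spills, so inlining wins. With 20 and 10, six values spill and the
out-of-line call is cheaper in this model.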


There is another way in which inlining a routine can cause a program to
run slower. This is also architecture dependent. If the out-of-line routine
and the loop using it fit in a single page, but inlining results in the
combined code spanning a page boundary, then, if you are running out of
real (rather than cached) memory, you can end up taking two row-address
reloads per loop execution, whereas the out-of-line version would require
none. This could cost you around 10 instruction slots, whereas the cost of
a function call/return can be as few as 2 slots, and when it overlaps with
floating point operations it can even be free!
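
A small sketch of the page-boundary arithmetic follows. The function names,
the 2048-byte page size used in the example, and the figure of 5 slots per
row-address reload (chosen so that two reloads come to the "around 10" above)
are assumptions for illustration, not measurements.

```python
def pages_touched(start, size, page_size):
    """Number of pages covered by the byte range [start, start + size)."""
    first = start // page_size
    last = (start + size - 1) // page_size
    return last - first + 1


def row_reload_slots(start, size, page_size, slots_per_reload=5):
    """Estimated row-address penalty, in slots, for one pass through a loop.

    A loop that stays within one page pays nothing. A loop that covers
    several pages pays once per forward crossing and once more for the
    branch back to the top, which comes to one reload per page touched.
    """
    pages = pages_touched(start, size, page_size)
    reloads = pages if pages > 1 else 0
    return reloads * slots_per_reload
```

For example, with 2048-byte pages, a 100-byte loop at address 1900 stays in
one page and pays nothing beyond the 2-slot call. Inlining an 80-byte routine
into it gives 180 bytes starting at 1900, which crosses into the next page
and, in this model, costs 10 slots per iteration.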







