|When to do inline expansion jhall@whale.WPI.EDU (1993-09-14)|
From: firstname.lastname@example.org (David Moore)
Date: Mon, 20 Sep 1993 23:21:25 GMT
email@example.com (Zalman Stern) writes:
[ re: question about when to do inlining]
>If it is called precisely once in the program, the function should be
>inlined no matter what.
This is not quite true. There are pathological cases.
On certain RISC machines (e.g. SPARC, Am29000), when you call a routine, you
get a new set of registers by performing a register window push. This push
is often almost free [2-3 instruction slots], since a limited number of
pushes will be done entirely in hardware registers.
If inlining the routine increases register pressure to the point where you
have to start spilling registers, leaving the routine out-of-line may run
faster. [I wonder if anyone has ever investigated spilling using register
window pushes in line.]
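
To make the register-pressure trade-off concrete, here is a rough sketch of the kind of check a compiler could make. It is not taken from any real compiler: the function name, the parameters, and the default costs are assumptions for illustration. Only the 2-3 slot window push figure comes from the text above.

```python
def should_inline(live_caller, live_callee, num_regs,
                  call_cost_slots=3, spill_cost_slots=2):
    """Estimate whether inlining is no slower than an out-of-line call.

    All names and default costs are hypothetical. call_cost_slots models
    a nearly free register window push (2-3 slots); spill_cost_slots
    models one store plus one reload for each spilled value.
    """
    # Values that must be live at once if the callee body is merged in.
    pressure = live_caller + live_callee
    # Anything beyond the available registers has to be spilled.
    spilled = max(0, pressure - num_regs)
    inline_cost = spilled * spill_cost_slots
    # Inline only if the spill traffic does not exceed the call overhead.
    return inline_cost <= call_cost_slots


# No spills: inlining wins.
print(should_inline(live_caller=10, live_callee=8, num_regs=24))
# Six spilled values cost more than a window push: stay out-of-line.
print(should_inline(live_caller=20, live_callee=10, num_regs=24))
```

A real allocator would look at live ranges rather than simple counts, so treat this only as an illustration of why "called once" does not settle the question.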
There is another way in which inlining of a routine can cause a program to
run slower. This is also architecture dependent. If you can fit the
out-of-line routine and the loop using it in a single page, but inlining
results in the routine spanning a page boundary, then, if you are running
in a real (rather than cached) memory system, you can end up taking two
Row Address refreshes per loop execution whereas the out-of-line version
would require none. This could cost you around 10 instruction slots,
whereas the cost of a function call/return can be as few as 2 slots; with
floating point operations, it can even be free!
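
The page-boundary argument can be sketched the same way. The layout, the page size, and the helper names below are made up for illustration. The 10-slot penalty and the 2-slot call/return cost are the rough figures quoted above, and the model simply charges the penalty once per iteration whenever the loop's code touches more than one page.

```python
def pages_touched(ranges, page_size):
    """Return the set of page numbers covered by (start, size) code ranges."""
    pages = set()
    for start, size in ranges:
        first = start // page_size
        last = (start + size - 1) // page_size
        pages.update(range(first, last + 1))
    return pages


def overhead_slots(ranges, page_size, call_slots, crossing_slots=10):
    """Per-iteration overhead: call cost plus a page-crossing penalty.

    crossing_slots is the assumed total cost of the two row-address
    cycles taken when one loop iteration executes code on two pages.
    """
    crosses = len(pages_touched(ranges, page_size)) > 1
    return call_slots + (crossing_slots if crosses else 0)


PAGE = 1024  # hypothetical page size in bytes

# Out-of-line: routine at 0..299 and loop at 600..799 share page 0.
out_of_line = overhead_slots([(0, 300), (600, 200)], PAGE, call_slots=2)
# Inlined: the merged loop at 600..1099 straddles pages 0 and 1.
inlined = overhead_slots([(600, 500)], PAGE, call_slots=0)

print(out_of_line, inlined)
```

With this made-up layout the out-of-line version pays only the call/return cost, while the inlined version pays the page-crossing penalty on every iteration.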