From: firstname.lastname@example.org.OZ.AU (Thomas Charles CONWAY)
Date: 1 May 1996 23:02:35 -0400
References: <3179B05D.email@example.com> <firstname.lastname@example.org.OZ.AU> <email@example.com.OZ.AU>
krste@ICSI.Berkeley.EDU (Krste Asanovic) writes:
>A related often-missed optimization is delaying the build of a stack
>frame until it is certain it is needed. [...]
>Can any current compilers do this optimization?
firstname.lastname@example.org.OZ.AU (Fergus Henderson) writes:
>The Mercury compiler does.
Actually, what the Mercury compiler does is somewhat more complicated.
When we first put in the optimization to delay the construction of
stack frames we got a slowdown on one of our benchmark programs. The
problem was that the test at the start of the procedure (which is
common to many Mercury procedures) was a simple conditional branch (a
test against 0). Since it was the first thing in the procedure, the
assembler couldn't find anything to put in the delay slot of the
conditional branch, so it wasted a cycle. When this situation occurs
in an inner loop, it makes a significant difference.
The construction of the stack frame in most Mercury procedures
consists of incrementing the stack pointer, and storing the return
continuation in the deepest stack slot.
To avoid wasting the cycle, what the Mercury compiler now does is emit
the store of the return continuation to the slot one above the stack
top, then the conditional branch, and then, in the arm of the code
that needs the frame, the stack pointer increment. Because the store
lands above the stack top, it doesn't clobber anything live. In the
arm that uses the stack frame, that slot becomes the deepest slot of
the frame once the stack pointer has been incremented, so the code
behaves correctly and the assembler has something to put in the delay
slot.
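
For anyone who wants to see the ordering spelled out, here is a rough
sketch in C. It is not what the Mercury compiler emits; it only models
an upward-growing stack as an array. The names Stack, enter_proc,
FRAME_WORDS and STACK_WORDS are made up for the illustration, and the
assumption that sp indexes the first free slot is mine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes, chosen only for the illustration. */
enum { STACK_WORDS = 64, FRAME_WORDS = 3 };

typedef struct {
    long   slots[STACK_WORDS];
    size_t sp;  /* assumed: index of the first free slot; slots below sp are live */
} Stack;

/* Models the entry sequence of a procedure whose first action is a
 * test against 0. Returns 1 if a frame was built, 0 otherwise. */
static int enter_proc(Stack *s, long return_cont, long arg)
{
    /* Bounds check for the model only; real generated code has none here. */
    assert(s->sp + FRAME_WORDS <= STACK_WORDS);

    /* Speculative store one slot above the stack top. Nothing live is
     * there, so this is harmless if the frame turns out to be unneeded.
     * On a machine with delay slots, this store is what can fill the
     * slot of the branch below. */
    s->slots[s->sp] = return_cont;

    if (arg == 0) {
        /* Arm that needs no frame: sp is untouched. */
        return 0;
    }

    /* Arm that needs the frame: claim it now. The slot written above
     * has become the deepest slot of the new frame. */
    s->sp += FRAME_WORDS;
    return 1;
}
```

Calling enter_proc with arg == 0 leaves sp where it was and every slot
below sp unchanged; calling it with a nonzero arg advances sp by
FRAME_WORDS, and slots[sp - FRAME_WORDS] holds the return continuation.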
We eventually put in a compiler flag to move the whole stack frame
setup instead of just part of it, since not all architectures have
delay slots. :-)