Re: Q: P6 branch prediction (Thomas Charles CONWAY)
1 May 1996 23:02:35 -0400

          From comp.compilers

Related articles
Re: Q: P6 branch prediction krste@ICSI.Berkeley.EDU (1996-04-29)
Re: Q: P6 branch prediction (1996-05-01)
Re: Q: P6 branch prediction (1996-05-14)
Re: Using memory below the SP (Was: Q: P6 branch prediction) (1996-05-18)
Re: Using memory above TOS (1996-05-19)
Using memory above TOS (Fergus Henderson) (1996-05-21)
Re: Using memory below the SP (Was: Q: P6 branch prediction) (Michael Meissner) (1996-05-24)
Re: Using memory above TOS (1996-05-29)

From: (Thomas Charles CONWAY)
Newsgroups: comp.arch,comp.compilers
Date: 1 May 1996 23:02:35 -0400
Organization: Compilers Central
References: <> <4m5a3u$> <4m6ufs$>
Keywords: architecture

krste@ICSI.Berkeley.EDU (Krste Asanovic) writes:
>A related often-missed optimization is delaying the build of a stack
>frame until it is certain it is needed. [...]
>Can any current compilers do this optimization?

(Fergus Henderson) writes:
>The Mercury compiler does.

Actually, what the Mercury compiler does is somewhat more complicated.

When we first put in the optimization to delay the construction of
stack frames, we got a slowdown on one of our benchmark programs. The
problem was that the test at the start of the procedure (which is
common to many Mercury procedures) was a simple conditional branch (a
test against 0). Since it was the first thing in the procedure, the
assembler couldn't find anything to put in the delay slot of the
conditional branch, so it wasted a cycle. When this situation occurs
in an inner loop, it makes a significant difference.

The construction of the stack frame in most Mercury procedures
consists of incrementing the stack pointer, and storing the return
continuation in the deepest stack slot.

To avoid wasting the cycle, the Mercury compiler now emits code to
save the return continuation one slot above the stack top, then the
conditional branch, and then, in the arm of the code that needs the
frame, the stack pointer increment. Because the store of the return
continuation is above the stack top, it doesn't clobber anything
live. In the arm of the code that uses the stack frame, that stack
slot becomes the deepest slot of the frame after the stack pointer has
been incremented, and hence the code has the correct behaviour, and
the assembler has something to put in the delay slot.
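
The scheme above can be sketched as a toy model. This is only an
illustration, not the code the Mercury compiler generates: the Machine
class, the call method, FRAME_SIZE and the convention that sp indexes
the top live slot are all assumptions made for the example. The stack
grows upward, as the "stack pointer increment" wording suggests.

```python
# Toy model of the speculative store described above. Every name here
# (Machine, call, FRAME_SIZE) is hypothetical and chosen for illustration.

FRAME_SIZE = 3  # assumed frame size, in slots


class Machine:
    def __init__(self, size=16):
        self.stack = [None] * size
        self.sp = 0  # index of the top live slot; slots above sp are free

    def call(self, arg, return_cont):
        # Speculative store: write the return continuation one slot above
        # the stack top, before we know whether a frame is needed. On a
        # machine with delay slots, this store is the instruction that
        # can be scheduled into the branch's delay slot.
        self.stack[self.sp + 1] = return_cont

        # The test at procedure entry: a simple comparison against 0.
        if arg == 0:
            # Arm that needs no frame: sp is unchanged, so the slot just
            # written is still free space and nothing live was clobbered.
            return "fast"

        # Arm that needs the frame: only now increment the stack pointer.
        # The slot written earlier becomes the deepest slot of the frame.
        self.sp += FRAME_SIZE
        deepest = self.sp - FRAME_SIZE + 1
        assert self.stack[deepest] == return_cont
        return "slow"
```

Calling call(0, ...) on a fresh Machine leaves sp untouched, while
call(5, ...) advances sp by FRAME_SIZE and finds the return continuation
already sitting in the deepest slot of the new frame.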

We eventually put in a compiler flag to move the whole stack frame
setup instead of just part of it, since not all architectures have
delay slots. :-)

