Related articles |
---|
Use the stack less :) chriswalton87@hotmail.com (2004-04-15) |
Re: Use the stack less :) dnovillo@redhat.com (Diego Novillo) (2004-04-15) |
Re: Use the stack less :) wyrmwif@tsoft.com (SM Ryan) (2004-04-15) |
Re: Use the stack less :) TommyAtNumba-Tu.Com--not@yahoo.com (Tommy Thorn) (2004-04-21) |
Re: Use the stack less :) dSpam@arcor.de (Dietmar Schindler) (2004-04-21) |
From: | chriswalton87@hotmail.com (Ark?) |
Newsgroups: | comp.compilers |
Date: | 15 Apr 2004 12:30:42 -0400 |
Organization: | http://groups.google.com |
Keywords: | storage, performance, design, question |
Posted-Date: | 15 Apr 2004 12:30:42 EDT |
Hi.
You may have had questions about this before, but I haven't found
anything on it yet, and it has really been baffling me.
On Intel processors (and on some others, I'm sure), why in heaven's
name would you use the hardware stack for return addresses, arguments,
and locals? It makes no sense to me. Maybe it is because I'm a Forth
programmer, but I believe the return stack and the data stack (or
whatever is acting as it) should be completely separate entities. Why
not allocate a separate data stack from the heap at initialization
instead, incurring an initial cost but making it up in the long run?
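To make the idea concrete, here is a minimal sketch in C of what I mean by a separate data stack. Every name in it (data_stack, ds_init, ds_alloc, ds_release, ds_destroy, blah_ds) is made up for illustration, and a real compiler would keep the data stack pointer in a register rather than in a struct. It assumes that all sizes passed in are multiples of sizeof(int), so the pointer stays aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical separate data stack: one heap block, growing downward. */
typedef struct {
    unsigned char *base; /* lowest address of the heap block */
    unsigned char *sp;   /* data stack pointer; moves down on allocation */
    size_t size;         /* size of the block in bytes */
} data_stack;

/* One-time cost at initialization: grab the whole stack from the heap.
   Assumes size is a multiple of sizeof(int). Returns 0 on success. */
static int ds_init(data_stack *ds, size_t size)
{
    ds->base = malloc(size);
    if (ds->base == NULL)
        return -1;
    ds->size = size;
    ds->sp = ds->base + size; /* empty stack: sp sits at the top */
    return 0;
}

/* Reserve n bytes; the newest data always starts at offset 0 from sp. */
static void *ds_alloc(data_stack *ds, size_t n)
{
    assert((size_t)(ds->sp - ds->base) >= n); /* overflow check */
    ds->sp -= n;
    return ds->sp;
}

/* Give back the n most recently reserved bytes. */
static void ds_release(data_stack *ds, size_t n)
{
    assert((size_t)((ds->base + ds->size) - ds->sp) >= n); /* underflow check */
    ds->sp += n;
}

static void ds_destroy(data_stack *ds)
{
    free(ds->base);
    ds->base = ds->sp = NULL;
    ds->size = 0;
}

/* A function with two arguments and one local, keeping all three on the
   data stack. The hardware stack holds only the return address (plus
   whatever the C compiler adds on its own). The local c starts at 0 here
   so that the result is well defined. */
static int blah_ds(data_stack *ds, int a, char b)
{
    int *frame = ds_alloc(ds, 3 * sizeof(int));
    int sum;

    frame[0] = 0; /* c: most recent local, at offset 0 */
    frame[1] = b;
    frame[2] = a;

    frame[2]++; /* a++ */
    frame[1]++; /* b++ */
    frame[0]++; /* c++ */

    sum = frame[0] + frame[1] + frame[2];
    ds_release(ds, 3 * sizeof(int));
    return sum;
}
```

A caller would do ds_init(&ds, 4096) once at startup, pass &ds to every function, and call ds_destroy(&ds) at exit. The sum is returned only so there is something to observe.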
Let's take this function:
void blah(int a, char b)
{
    int c;
    a++; b++; c++;
}
It doesn't do much, but it serves our purpose well. Regular GCC will
compile this as:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
movl 12(%ebp), %eax
movb %al, -1(%ebp)
incl 8(%ebp)
leal -1(%ebp), %eax
incb (%eax)
leal -8(%ebp), %eax
incl (%eax)
leave
ret
It spends 3 instructions (all of which stall the pipeline) to
initialize, and 2 (very slow) instructions to de-initialize.
-fomit-frame-pointer does marginally better:
subl $8, %esp
movl 16(%esp), %eax
movb %al, 7(%esp)
incl 12(%esp)
leal 7(%esp), %eax
incb (%eax)
movl %esp, %eax
incl (%eax)
addl $8, %esp
ret
This is much better, and comes close to the efficiency of a separate,
downward stack (held in EBP, possibly). However, the separate stack
approach has the upper hand if the most recent local gets accessed
a lot - no offsets make the instruction both smaller and faster :)
Another thing is that it should be much easier for debuggers to trace
things, right? They would know exactly where the return stack is, and
figuring out the locals shouldn't be hard at all. So, why don't
compilers do it this way?
Whee, this turned into more of a rant than a question :) Thanks for any
answers.
-- Chris