Use the stack less :)

chriswalton87@hotmail.com (Ark?)
15 Apr 2004 12:30:42 -0400

          From comp.compilers

From: chriswalton87@hotmail.com (Ark?)
Newsgroups: comp.compilers
Date: 15 Apr 2004 12:30:42 -0400
Organization: http://groups.google.com
Keywords: storage, performance, design, question
Posted-Date: 15 Apr 2004 12:30:42 EDT

Hi.


You have probably had questions about this before, but I haven't found
anything yet, and this has really been baffling me.


On Intel processors (and on some others, I'm sure), why in heaven's
name would you use the hardware stack for return addresses, arguments,
and locals? It makes no sense to me. Maybe it is because I'm a Forth
programmer, but I believe the return stack and the data stack (or
whatever is acting as it) should be completely separate entities. Why
not allocate a separate data stack from the heap at initialization
instead, incurring an initial cost but making it up in the long run?
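
A minimal sketch of that idea in C, assuming a fixed-size, downward-growing
data stack taken from the heap once at startup. Every name here (ds_init,
ds_push, ds_pop, add2, DS_CELLS) is made up for illustration, and a C
compiler will of course still keep its own temporaries wherever it likes;
the sketch only shows the bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical separate data stack; all names are illustrative. */
#define DS_CELLS 1024

int *ds_base;   /* lowest address of the heap block */
int *dsp;       /* data stack pointer; the stack grows downward */

/* One-time cost at initialization: take the whole stack from the heap. */
int ds_init(void)
{
    ds_base = malloc(DS_CELLS * sizeof *ds_base);
    if (ds_base == NULL)
        return -1;
    dsp = ds_base + DS_CELLS;   /* empty stack: one past the end */
    return 0;
}

void ds_push(int v)
{
    assert(dsp > ds_base);              /* overflow check */
    *--dsp = v;
}

int ds_pop(void)
{
    assert(dsp < ds_base + DS_CELLS);   /* underflow check */
    return *dsp++;
}

/* Arguments arrive on the data stack instead of the hardware stack.
   In the scheme described above, the hardware stack would then carry
   only the return address of this call. */
int add2(void)
{
    int b = ds_pop();   /* pushed last, popped first */
    int a = ds_pop();
    return a + b;
}
```

A caller would run ds_init() once, then ds_push(3); ds_push(4); before
calling add2().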


Let's take this function:
void blah(int a, char b)
{
    int c;
    a++; b++; c++;
}


It doesn't do much, but it serves our purpose well. Regular GCC will
compile this as:
                pushl %ebp
                movl %esp, %ebp
                subl $8, %esp
                movl 12(%ebp), %eax
                movb %al, -1(%ebp)
                incl 8(%ebp)
                leal -1(%ebp), %eax
                incb (%eax)
                leal -8(%ebp), %eax
                incl (%eax)
                leave
                ret


It spends three instructions (pushl, movl, subl), all of which stall the
pipeline, setting up the frame, and two very slow instructions (leave,
ret) tearing it down. -fomit-frame-pointer does marginally better:
                subl $8, %esp
                movl 16(%esp), %eax
                movb %al, 7(%esp)
                incl 12(%esp)
                leal 7(%esp), %eax
                incb (%eax)
                movl %esp, %eax
                incl (%eax)
                addl $8, %esp
                ret




This is much better, and comes close to the efficiency of a separate,
downward-growing stack (held in EBP, possibly). However, the separate
stack approach has the upper hand if the most recent local gets accessed
a lot: with no offset, the instruction is both smaller and faster :)
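
To make the layout concrete, here is a rough C sketch of blah() working on
a separate downward-growing data stack. The names ds_mem, dsp and blah_ds
are hypothetical, a static array stands in for the heap block, b is kept
as a full int cell for simplicity, and c is zero-initialised (the original
leaves it uninitialised).

```c
#include <assert.h>

/* Hypothetical data stack; a static array stands in for the heap block. */
enum { DS_CELLS = 64 };
int ds_mem[DS_CELLS];
int *dsp = ds_mem + DS_CELLS;   /* grows downward; starts empty */

/* The caller pushes a, then b. blah_ds reserves one more cell for c,
   so c sits at dsp[0] (no offset) while b and a need offsets. */
void blah_ds(void)
{
    assert(dsp > ds_mem);   /* room for one more cell */
    *--dsp = 0;             /* c, the most recent local */
    dsp[2]++;               /* a */
    dsp[1]++;               /* b */
    dsp[0]++;               /* c: top of stack, zero offset */
    dsp++;                  /* drop c; the caller drops a and b */
}
```

Whether the zero-offset access really is smaller depends on the register
holding the stack pointer: on IA-32, (%ebp) has no displacement-free
encoding, so an assembler emits it with a zero 8-bit displacement.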


Another thing is that it should be much easier for debuggers to trace
things, right? They would know exactly where the return addresses are,
and figuring out the locals shouldn't be hard at all. So, why don't
compilers do it this way?


Whee, this turned into more of a rant than a question :) Thanks for any
answers.
-- Chris

