Related articles |
---|
Re: Current work in compiler/language design. sverker@sics.se (1991-11-19) |
_My_ ideas for current research! eifrig@blaze.cs.jhu.edu (1991-11-22) |
Re: _My_ ideas for current research! carlton@scws8.harvard.edu (1991-11-23) |
Re: _My_ ideas for current research! hoelzle@Neon.Stanford.EDU (1991-11-26) |
Newsgroups: | comp.compilers |
From: | hoelzle@Neon.Stanford.EDU (Urs Hoelzle) |
Keywords: | Lisp, Scheme, performance |
Organization: | CS Department, Stanford University, California, USA |
References: | 91-11-072 91-11-108 |
Date: | 26 Nov 91 01:00:12 GMT |
carlton@scws8.harvard.edu (david carlton) writes:
>In article 91-11-099, eifrig@blaze.cs.jhu.edu (Jonathan Eifrig) writes:
>Andrew Appel has proven that heap allocation is faster than stack
>allocation under fairly reasonable circumstances, assuming that you have
>enough memory. (The basic idea is that if you allocate things on the
>stack then you have to deallocate dead objects explicitly, whereas if you
>allocate things in the heap and use stop-and-copy garbage collection, you
>only touch live objects, not the dead ones; so if your typical dead
>object/live object ratio is high enough, you are better off allocating
>everything in the heap.) But his and David MacQueen's SML/NJ compiler is
>the only one that I know of that does this.
Whoa! I'm a big fan of (compacting generational) GC, but Appel's paper
doesn't really prove *anything* except that for one particular machine /
program combination, heap allocation was faster. The paper is nice but
has two fatal flaws (IMHO) which prevent generalizations:
1) "if you allocate things on the stack then you have to deallocate
dead objects explicitly". This is simply not true in general. Any
reasonable compiler will discard the stack-allocated data together
with the rest of the stack frame, i.e. using 0 cycles.
2) Architectural effects are completely ignored in the analysis. The
machine used for the timings was a (slow) VAX, so cache effects
played no role. In today's RISC systems which have a HUGE speed
difference between CPU and main memory (dozens of cycles for a
cache miss), you can't just assume that a store is a store is a
store.
With stack allocation, the top of the stack is likely to be in the
cache, and thus writes and reads to the data are likely to be hits.
With a very large heap (such as those used by Appel), every initial
store to a new object will cause a cache miss since the heap is
much bigger than the processor cache.
In summary, one can make a good argument a generational copying GC can be
made reasonably efficient by increasing the size of the creation space.
However, heap allocation is not free, and allocating procedure activations
on the heap is almost always a VERY bad idea if you want your system to
run reasonably fast (within a small factor of optimized C).
Some back-of-the-envelope calculations to support this: assume a Sun SS-1,
write miss = 26 cy/line, 1 line = 4 words. 500K calls/s (one call every
50 instructions) and avg. activation record size of 8 words --> 5e5 * (8 /
4) = 1e6 misses / s = 26 M cycles/s lost to cache misses. Since a SS-1
runs at 25MHz, you lose more cycles per second than you have available,
i.e. your program runs at most half as fast as the optimized C program.
[The 500K calls/s figure is taken from several C benchmarks, i.e. I
believe it is not unreasonably high...also, if I remember correctly, many
studies show calls to be around 2% of the dynamically-executed
instructions.]
Of course, if your system runs 10 times slower tha C anyway, you're doing
only 50K calls/s and the relative overhead drops to 10% which is fine
considering the simplification that heap-allocation probably brings. But
don't argue that heap allocation makes it go faster...
-Urs
----------------------------------------------------------------------------
Urs Hoelzle hoelzle@cs.stanford.EDU
Computer Systems Laboratory, CIS 57, Stanford University, Stanford, CA 94305
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.