From: | kym@kymhorsell.com |
Newsgroups: | comp.compilers |
Date: | 11 Aug 2004 12:57:00 -0400 |
Organization: | Central Iowa (Model) Railroad, Plano, TX, USA |
References: | 04-08-050 |
Keywords: | registers |
Posted-Date: | 11 Aug 2004 12:57:00 EDT |
> russell kym horsell <kym@sdf.lonestar.org> writes:
>>Kamal R. Prasad <kamalp@acm.org> wrote:
>>[...]
>>> The overhead is 1-load and 1-store, but the overhead isn't as high as
>>> you would expect, thanks to a hierarchy of caches to speed things up.
>>> No doubt the speed with which a register can be accessed is much
>>[...]
>>Measure it, and you'll get a shock. On 5 yo architecture there seems
>>to be virtually no diff between memory access and registers for common
>>stuff.
> It depends very much on the microarchitecture. On most of them
> register allocation pays off very well. For some older results of a
[...]
I also have some benchmarks for P4s, XPs and some older things using
SSE1, SSE2, 3DNow and the old x87 FP units (you'll note
a predilection toward fp work), with 5 different instruction scheduling
strategies, with or without bb-wide register variables. The diff in
speed for well-scheduled code of loop-wide reg vars seems to be only a few
percent.
A much larger impact than whether to put something in a reg is getting rid
of near machine-wide pipe bubbles or stalls by allocating
the right instr packet for the right time-slot, and improving cache and
TLB hits (and even victim cache hits, where applicable) with appropriate
unrolling and blocking.
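To make the blocking point concrete, here is a minimal sketch in C of a
tiled matrix multiply next to the plain triple loop. It is only an
illustration, not the benchmark code referred to in this post: the
function names, the row-major layout and the TILE value of 32 are all
assumptions, and the right tile size depends on the cache sizes of the
machine at hand.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed tile edge (hypothetical value); tune it to the target caches. */
#define TILE 32

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* Reference version: c += a * b for n x n row-major matrices. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    assert(a && b && c);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = c[i * n + j];
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];  /* strided walk over b */
            c[i * n + j] = sum;
        }
}

/* Blocked version: same arithmetic, but done one TILE x TILE sub-block
 * at a time so the pieces of a, b and c in use are revisited while they
 * are still likely to be cached. */
void matmul_blocked(size_t n, const double *a, const double *b, double *c)
{
    assert(a && b && c);
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t i_end = min_sz(ii + TILE, n);
                size_t k_end = min_sz(kk + TILE, n);
                size_t j_end = min_sz(jj + TILE, n);
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        /* Loop-invariant operand held in a local. */
                        double aik = a[i * n + k];
                        /* Unit-stride inner loop over b and c. */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}
```

Both routines accumulate into c, so the caller zeroes c first if a plain
product is wanted. The min_sz() clamps handle sizes that are not a
multiple of TILE. Whether the blocked form wins, and by how much, is
something to measure per machine and per problem size.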
GCC is a great gen purp re-targetable compiler, but is not really
a high-performance platform.
It seems even basic vectorising/pipeline/vliw compilers generate code
that can run an order of magnitude faster than gcc's code for many
modern desktops.
I have some benchmarks where a basic home-grown F90 compiler generates
unassisted code running 50x faster than even moderately pre-tinkered C code
using gcc -fwhatever -mwhatever -O[0-8] on the same platform.
Some PD "tarpit" results are available somewhere under
http://junk.kymhorsell.com
----------------------------------------
kym@kym.massbus.org
SDF Public Access UNIX System - http://sdf.lonestar.org