Re: Register allocation

kym@kymhorsell.com
11 Aug 2004 12:57:00 -0400

          From comp.compilers

From: kym@kymhorsell.com
Newsgroups: comp.compilers
Date: 11 Aug 2004 12:57:00 -0400
Organization: Central Iowa (Model) Railroad, Plano, TX, USA
References: 04-08-050
Keywords: registers
Posted-Date: 11 Aug 2004 12:57:00 EDT

> russell kym horsell <kym@sdf.lonestar.org> writes:
>>Kamal R. Prasad <kamalp@acm.org> wrote:
>>[...]
>>> The overhead is 1-load and 1-store, but the overhead isn't as high as
>>> you would expect, thanks to a hierarchy of caches to speed things up.
>>> No doubt the speed with which a register can be accessed is much
>>[...]
>>Measure it, and you'll get a shock. On 5 yo architecture there seems
>>to be virtually no diff between memory access and registers for common
>>stuff.
> It depends very much on the microarchitecture. On most of them
> register allocation pays off very well. For some older results of a
[...]


I also have some benchmarks for P4s, XPs and some older things using
SSE1, SSE2, 3DNow and the old x87 FP units (you'll note
a predilection toward fp work), with 5 different instruction scheduling
strategies, with or without bb-wide register variables. For
well-scheduled code, the diff in speed from using loop-wide reg vars
seems to be only a few percent.
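
For anyone who wants to measure this themselves, here is a minimal C
sketch of the kind of pair one might time. It is not the harness behind
the numbers above; the function names are made up for illustration, and
the volatile qualifier is only an assumed way of forcing the "1 load and
1 store" per iteration mentioned in the quoted text.

```c
#include <stddef.h>

/* Accumulator is an ordinary local, so the compiler is free to
   keep it in a register for the whole loop. */
double sum_reg(const double *x, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc = acc + x[i];
    return acc;
}

/* Accumulator is volatile, so every iteration must load it from
   memory and store it back (an assumed stand-in for a spilled
   variable). */
double sum_mem(const double *x, size_t n)
{
    volatile double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc = acc + x[i];
    return acc;
}
```

Both functions do the additions in the same order, so they return the
same value; timing each over a large array (with clock() or similar) on
the machine of interest shows what the extra memory traffic costs there.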


What has a much larger impact than whether to put something in a reg is
getting rid of near machine-wide pipe bubbles or stalls by allocating
the right instr packet for the right time-slot, and improving cache and
TLB hits (and even victim cache hits, where applicable) with appropriate
unrolling and blocking.
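
As a rough illustration of what "blocking" means here, the sketch below
shows a plain and a blocked (tiled) matrix multiply in C. The function
names, the row-major flat layout and the block-size parameter bs are all
assumptions made for the example; a suitable bs depends on the cache and
TLB sizes of the target and would have to be found by measurement.

```c
#include <stddef.h>

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Plain triple loop: c += a * b for n x n row-major matrices. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = c[i * n + j];
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* Blocked version: same arithmetic, but the work is done one
   bs x bs tile at a time so the tiles of a, b and c being touched
   stay small. bs must be greater than zero; it need not divide n. */
void matmul_blocked(size_t n, size_t bs,
                    const double *a, const double *b, double *c)
{
    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t i_end = min_sz(ii + bs, n);
                size_t k_end = min_sz(kk + bs, n);
                size_t j_end = min_sz(jj + bs, n);
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k]; /* reused across the j loop */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}
```

Both routines accumulate into c, so c should be zeroed first if a plain
product is wanted. The innermost j loop of the blocked version walks b
and c with unit stride, which also makes it a natural candidate for
unrolling.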


GCC is a great gen purp re-targetable compiler, but is not really
a high-performance platform.


It seems even basic vectorising/pipeline/vliw compilers generate code
that can run an order of magnitude faster than gcc's code for many
modern desktops.


I have some benchmarks where a basic home grown F90 compiler generates
unassisted code running 50x faster than even moderately pre-tinkered C code
using gcc -fwhatever -mwhatever -O[0-8] on the same platform.


Some PD "tarpit" results are available somewhere under
http://junk.kymhorsell.com


----------------------------------------
kym@kym.massbus.org
SDF Public Access UNIX System - http://sdf.lonestar.org

