Re: Register allocation
11 Aug 2004 12:57:00 -0400

          From comp.compilers


Newsgroups: comp.compilers
Date: 11 Aug 2004 12:57:00 -0400
Organization: Central Iowa (Model) Railroad, Plano, TX, USA
References: 04-08-050
Keywords: registers
Posted-Date: 11 Aug 2004 12:57:00 EDT

> russell kym horsell <> writes:
>>Kamal R. Prasad <> wrote:
>>> The overhead is 1-load and 1-store, but the overhead isn't as high as
>>> you would expect, thanks to a hierarchy of caches to speed things up.
>>> No doubt the speed with which a register can be accessed is much
>>Measure it, and you'll get a shock. On 5 yo architecture there seems
>>to be virtually no diff between memory access and registers for common
> It depends very much on the microarchitecture. On most of them
> register allocation pays off very well. For some older results of a

I also have some benchmarks for P4s, XPs and some older machines using
SSE1, SSE2, 3DNow! and the old x87 FP units (you'll note a predilection
toward FP work), with five different instruction-scheduling strategies,
with and without basic-block-wide register variables. The difference in
speed for well-scheduled code with loop-wide register variables seems to
be only a few percent.
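
If you want to measure it yourself, here is a minimal sketch (my own, not
one of the benchmarks above) of a crude way to compare a memory-resident
accumulator against one the compiler is free to keep in a register. The
'volatile' qualifier is just a trick to stop the compiler promoting the
first accumulator; the timing is wall-clock and only indicative.

#include <stdio.h>
#include <time.h>

#define ITERS 100000000UL

int main(void)
{
    volatile double mem_acc = 0.0;  /* forced to live in memory        */
    double reg_acc = 0.0;           /* free to live in a register      */
    clock_t t0, t1, t2;
    unsigned long i;

    t0 = clock();
    for (i = 0; i < ITERS; i++)
        mem_acc += 1.0;             /* load + add + store each trip    */
    t1 = clock();
    for (i = 0; i < ITERS; i++)
        reg_acc += 1.0;             /* add stays in a register         */
    t2 = clock();

    printf("memory: %.2fs  register: %.2fs  (reg_acc=%g)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, reg_acc);
    return 0;
}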

A much larger impact than whether something lives in a register comes
from getting rid of near machine-wide pipeline bubbles and stalls by
issuing the right instruction packet in the right time slot, and from
improving cache and TLB hit rates (and even victim-cache hits, where
applicable) with appropriate unrolling and blocking.
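
For anyone who hasn't seen blocking before, a minimal sketch (mine, not
from the benchmarks): a cache-blocked matrix multiply. The tile size B
is a hypothetical tuning parameter; pick it so a few BxB tiles fit in
L1/L2, and assume N is a multiple of B and c starts zeroed.

#include <stddef.h>

#define N 1024
#define B 64            /* hypothetical tile size */

void matmul_blocked(const double a[N][N], const double b[N][N],
                    double c[N][N])
{
    for (size_t ii = 0; ii < N; ii += B)
        for (size_t kk = 0; kk < N; kk += B)
            for (size_t jj = 0; jj < N; jj += B)
                /* work one BxB tile at a time so the operands stay
                   resident in cache instead of streaming through it */
                for (size_t i = ii; i < ii + B; i++)
                    for (size_t k = kk; k < kk + B; k++) {
                        double aik = a[i][k];
                        for (size_t j = jj; j < jj + B; j++)
                            c[i][j] += aik * b[k][j];
                    }
}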

GCC is a great general-purpose, retargetable compiler, but it is not
really a high-performance platform.

It seems even basic vectorising/pipelining/VLIW compilers generate code
that can run an order of magnitude faster than gcc's code on many
modern desktops.
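
To make that concrete, a minimal sketch (my addition) of the kind of
loop a vectorising compiler can map onto packed SSE operations; the
C99 'restrict' qualifiers are the assumption that makes the
transformation legal, since they promise x and y do not alias.

void saxpy(long n, float a, const float *restrict x, float *restrict y)
{
    for (long i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];   /* packed mul+add, 4 floats at a time with SSE */
}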

I have some benchmarks where a basic home-grown F90 compiler generates
unassisted code running 50x faster than even moderately pre-tinkered C
code compiled with gcc -fwhatever -mwhatever -O[0-8] on the same platform.

Some PD "tarpit" results are available somewhere under

SDF Public Access UNIX System -
