Load and store double (WAS: SPARC compiler optimisation)

pardo@cs.washington.edu (David Keppel)
Tue, 3 Mar 1992 15:51:14 GMT

          From comp.compilers

Related articles
Re: SPARC compiler optimisation andrew@highland.oz.au (1992-02-26)
Re: SPARC compiler optimisation preston@dawn.cs.rice.edu (1992-03-02)
Load and store double (WAS: SPARC compiler optimisation) pardo@cs.washington.edu (1992-03-03)

Newsgroups: comp.compilers
From: pardo@cs.washington.edu (David Keppel)
Keywords: optimize, architecture
Organization: Computer Science & Engineering, U. of Washington, Seattle
References: 92-02-120 92-03-011
Date: Tue, 3 Mar 1992 15:51:14 GMT

>>[LDD and STD can load 8 bytes in 1 instruction, 1 delay slot, and 3
>> cycles where two loads take 2 instructions, 2 delay slots, and 4
>> cycles.]

Preston Briggs (preston@dawn.cs.rice.edu) writes:
>[LDD & STD a mistake for RISC machines: compilers can't generate 'em.]

I expect that if you have lots of fp values, and ldd and std are used for
floating-point values, then you don't need any more justification for
having ldd and std. Of course, most RISC machines load and store fp regs
using load and store instructions distinct from the integer ones.

I balk at the assertion that a compiler cannot generate ldd/std on integer
register loads and stores. At the very least, the compiler can do testing
and code replication on promising-looking loops:

    for (j=0; j<N; ++j) {
        use (addr[j]);
    }

becomes

    if ((addr & 0x7) != 0 || (N%2 != 0)) {
        for (i=0; i<N*4; i+=4) {
            ld [%addr + %i] => %t1
            use (%t1);
        }
    } else {
        for (i=0; i<N*4; i+=8) {
            ldd [%addr + %i] => %t1, %t2
            use (%t1);
            use (%t2);
        }
    }

Is there any reason to generate them if you can? Several posters have
said yes, and I've also seen evidence that such optimizations can speed up
inner loops substantially (~10%) when `use' is small (even if you also
unroll the unaligned case; the speedup isn't just from unrolling).
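
For concreteness, here is a minimal sketch of the same test-and-replicate
idea in portable C. The function name `sum_words' and the choice of a
running sum as the stand-in for `use' are assumptions made for
illustration. Whether the 8-byte copy actually becomes a single ldd (or
any 64-bit load) depends on the compiler and target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the loop above: `use' is a running sum. */
uint32_t sum_words(const uint32_t *addr, size_t n)
{
    uint32_t total = 0;
    size_t i;

    if (((uintptr_t)addr & 0x7) != 0 || (n % 2) != 0) {
        /* Base not 8-byte aligned, or odd count: one 4-byte load each. */
        for (i = 0; i < n; i++)
            total += addr[i];
    } else {
        /* Aligned base and even count: one 8-byte load per two elements. */
        for (i = 0; i < n; i += 2) {
            uint64_t pair;
            /* A compiler may lower this copy to one double-word load. */
            memcpy(&pair, &addr[i], sizeof pair);
            /* Adding both halves makes the result endian-independent. */
            total += (uint32_t)pair;
            total += (uint32_t)(pair >> 32);
        }
    }
    return total;
}
```

Both versions of the loop compute the same result; only the run-time
test on the address and count decides which one executes.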

The compiler can also use `ldd' and `std' to advantage when doing spills
and restores for both callee-save and caller-save registers (in general for
any spills and restores). Easiest is to do register allocation the normal
way, then do trivial register renaming to get the registers in adjacent
pairs. The stack spill slots need to be renamed and possibly realigned;
that ``should'' be straightforward. DECstation (MIPS) compilers already
do such an optimization some of the time. Putting spills and restores in
adjacent memory locations also improves caching.
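
As a rough sketch of the slot layout step, assume renaming has already
happened and the spilled registers are given as an ascending list of
register numbers. The code below pairs each even register with its odd
neighbour into one std at an 8-byte-aligned offset, then places the
leftovers in 4-byte slots. The names `spill_op', `is_pair' and
`plan_spills' are hypothetical, and a real allocator would have to
handle much more than this.

```c
#include <stddef.h>

/* One planned spill store (hypothetical representation). */
struct spill_op {
    int is_double;  /* 1: std of reg and reg+1; 0: st of reg alone */
    int reg;        /* first register; even when is_double is set */
    int offset;     /* byte offset in the spill area */
};

/* True if regs[i] is even and regs[i+1] is its odd neighbour. */
static int is_pair(const int *regs, size_t n, size_t i)
{
    return i + 1 < n && regs[i] % 2 == 0 && regs[i + 1] == regs[i] + 1;
}

/*
 * regs: ascending register numbers to spill.  out must have room for n
 * entries.  Returns the number of stores planned.
 */
size_t plan_spills(const int *regs, size_t n, struct spill_op *out)
{
    size_t n_ops = 0, i;
    int offset = 0;

    /* Pass 1: pairs first, so every std slot stays 8-byte aligned. */
    for (i = 0; i < n; i++) {
        if (is_pair(regs, n, i)) {
            out[n_ops].is_double = 1;
            out[n_ops].reg = regs[i];
            out[n_ops].offset = offset;
            n_ops++;
            offset += 8;
            i++;                /* skip the odd partner */
        }
    }

    /* Pass 2: remaining registers get ordinary 4-byte slots. */
    for (i = 0; i < n; i++) {
        if (is_pair(regs, n, i)) {
            i++;                /* already covered by a std */
            continue;
        }
        out[n_ops].is_double = 0;
        out[n_ops].reg = regs[i];
        out[n_ops].offset = offset;
        n_ops++;
        offset += 4;
    }
    return n_ops;
}
```

For example, spilling registers 8, 9, 11, 12, 13 and 16 this way takes
four stores (two std, two st) instead of six, and the whole spill area
is contiguous.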

;-D oN ( Registers for free: the NOP machine ) Pardo
