From: | denk@obelix.cica.es (Claus Denk) |
Newsgroups: | comp.compilers |
Date: | 13 May 1997 22:43:24 -0400 |
Organization: | Centro Informatico Cientifico de Andalucia |
Keywords: | optimize, architecture |
I posted this to gcc.help, but have had no answer so far. Maybe this
is the right group for the question?
I am looking at the machine code generated by gcc. I am interested
in simple floating-point vector operations, for example:
    for (i = 0; i < n; i++)
        dy[i] = da*dx[i];
For pipelined architectures like the Alpha, loop unrolling is
essential. At present, gcc unrolls the loop like this, reusing the
same floating-point register in every copy:
ldt $f1,0($18)
mult $f17,$f1,$f1
stt $f1,0($19)
and so on ..
For this code to be schedulable, a different floating-point register
should be used for each multiplication, i.e.:
ldt $f1,0($18)
mult $f17,$f1,$f1
stt $f1,0($19)
ldt $f2,8($18)
mult $f17,$f2,$f2
stt $f2,8($19)
ldt $f3 ....
Only then can the instructions be reordered so that they execute in a
"parallel" manner:
ldt $f1,0($18)
ldt $f2,8($18)
mult $f17,$f1,$f1
mult $f17,$f2,$f2
stt $f1,0($19)
stt $f2,8($19)
This should be quite important for almost all superscalar architectures.
At least on my Alpha (21164) it speeds things up quite a bit (a factor of 2).
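As a rough source-level sketch of the same idea (not taken from any gcc output), the loop can be unrolled by hand with one temporary per element, so that nothing forces the two multiplications to share a register. The function name scale_unrolled, its signature, and the unroll factor of two are assumptions made for illustration, and the sketch assumes dx and dy do not overlap. Whether the compiler keeps this ordering or assigns separate registers is still up to its scheduler and register allocator.

```c
#include <stddef.h>

/* Hypothetical helper (name and signature are illustrative only):
 * computes dy[i] = da * dx[i] for 0 <= i < n, unrolled by two.
 * Assumes dx and dy do not overlap; with overlapping arrays, hoisting
 * the second load above the first store would change the result. */
void scale_unrolled(size_t n, double da, const double *dx, double *dy)
{
    size_t i = 0;

    /* Main loop: two independent elements per iteration, each with
     * its own temporary. */
    for (; i + 1 < n; i += 2) {
        double t0 = dx[i];        /* both loads first       */
        double t1 = dx[i + 1];
        t0 = da * t0;             /* then both multiplies   */
        t1 = da * t1;
        dy[i]     = t0;           /* then both stores       */
        dy[i + 1] = t1;
    }

    /* Remainder: at most one element is left when n is odd. */
    if (i < n)
        dy[i] = da * dx[i];
}
```

Calling scale_unrolled(n, da, dx, dy) should give the same result as the original loop for non-overlapping arrays; comparing the assembly from gcc -S for both versions shows whether the register assignment actually differs.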
Does anyone know how far gcc has come with this?
Thanks for any answers!