|Instruction scheduling with gcc on alpha firstname.lastname@example.org (1997-05-13)|
|Re: Instruction scheduling with gcc on alpha email@example.com (John Haxby) (1997-05-22)|
|Re: Instruction scheduling with gcc on alpha Robert.Harley@inria.fr (1997-06-13)|
|Re: Instruction scheduling with gcc on alpha firstname.lastname@example.org (Toon Moene) (1997-06-24)|
|From:||email@example.com (Claus Denk)|
|Date:||13 May 1997 22:43:24 -0400|
|Organization:||Centro Informatico Cientifico de Andalucia|
I posted this to gcc.help, but have had no answer so far. Maybe this
is the right group for the question?
I am looking at the machine code generated by gcc. I am interested
in simple floating-point vector operations, for example:
for (i = 0; i< n; i++)
dy[i] = da*dx[i];
For pipelined architectures like the Alpha, loop unrolling is
essential. Now, the loop is unrolled like this:
and so on ...
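To make the unrolling concrete at the source level, here is a minimal C sketch of the same loop unrolled four times by hand. The function name `scale_unrolled`, the unroll factor of 4, and the `double` element type are assumptions made for illustration; this is not gcc's actual output, only an approximation of what an unrolled loop body looks like before register allocation.

```c
#include <stddef.h>

/* Hypothetical helper: computes dy[i] = da * dx[i] for i in [0, n),
 * with the loop body unrolled four times by hand.
 * The unroll factor of 4 is an illustrative assumption. */
void scale_unrolled(size_t n, double da, const double *dx, double *dy)
{
    size_t i = 0;

    /* Main body: four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        dy[i]     = da * dx[i];
        dy[i + 1] = da * dx[i + 1];
        dy[i + 2] = da * dx[i + 2];
        dy[i + 3] = da * dx[i + 3];
    }

    /* Remainder: the last n % 4 elements. */
    for (; i < n; i++)
        dy[i] = da * dx[i];
}
```

Unrolling alone only removes loop overhead. Whether the four multiplications can overlap depends on how the compiler assigns registers and orders the resulting loads, multiplies, and stores.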
In order to be able to schedule this code, a different floating-point
register should be used for each multiplication, i.e.
ldt $f3 ....
Only in this case can the instructions be reordered so that they
execute in a "parallel" manner:
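As a source-level analogue of that reordering, a minimal C sketch could look like the following. Each element gets its own temporary (`t0` to `t3`), and the loads, multiplies, and stores are grouped so that no operation has to wait on the one immediately before it. The function name `scale_scheduled`, the unroll factor, and the `restrict` qualifiers are assumptions: `restrict` states that `dx` and `dy` do not overlap, which is what makes it legal to move all the loads ahead of the stores. Separate C temporaries do not force the compiler to pick separate registers; they only remove the false dependences at the source level.

```c
#include <stddef.h>

/* Hypothetical helper: same computation as dy[i] = da * dx[i], but with
 * one temporary per element and the operations grouped by kind.
 * Assumes dx and dy do not overlap (expressed with restrict). */
void scale_scheduled(size_t n, double da,
                     const double *restrict dx, double *restrict dy)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        /* Four independent loads. */
        double t0 = dx[i];
        double t1 = dx[i + 1];
        double t2 = dx[i + 2];
        double t3 = dx[i + 3];

        /* Four independent multiplies, each on its own temporary. */
        t0 *= da;
        t1 *= da;
        t2 *= da;
        t3 *= da;

        /* Four independent stores. */
        dy[i]     = t0;
        dy[i + 1] = t1;
        dy[i + 2] = t2;
        dy[i + 3] = t3;
    }

    /* Remainder: the last n % 4 elements. */
    for (; i < n; i++)
        dy[i] = da * dx[i];
}
```

If all four products went through a single temporary instead, each load would have to wait until the previous store had consumed that value, which is the serialisation described above.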
This should be quite important for almost all superscalar architectures.
At least on my Alpha (21164) it speeds things up quite a bit (a factor of 2).
Does anyone know how far gcc has got with these things?
Thanks for any answers!