Instruction scheduling with gcc on alpha

denk@obelix.cica.es (Claus Denk)
13 May 1997 22:43:24 -0400

          From comp.compilers

From: denk@obelix.cica.es (Claus Denk)
Newsgroups: comp.compilers
Date: 13 May 1997 22:43:24 -0400
Organization: Centro Informatico Cientifico de Andalucia
Keywords: optimize, architecture

I posted this to gcc.help, but have had no answer so far. Maybe this
is the right group for the question?


I am looking at the machine code generated by gcc. I am interested
in simple floating-point vector operations, for example:


    for (i = 0; i < n; i++)
        dy[i] = da * dx[i];


For pipelined architectures like the Alpha, loop unrolling is
essential. gcc unrolls the loop like this, reusing the same
floating-point register in every copy of the loop body:


    ldt $f1,0($18)
    mult $f17,$f1,$f1
    stt $f1,0($19)
    ... and so on


To make this code schedulable, a different floating-point register
should be used for each multiplication, i.e.:


    ldt $f1,0($18)
    mult $f17,$f1,$f1
    stt $f1,0($19)
    ldt $f2,8($18)
    mult $f17,$f2,$f2
    stt $f2,8($19)
    ldt $f3 ....


Only then can the instructions be reordered so that they execute
in a "parallel" manner:


    ldt $f1,0($18)
    ldt $f2,8($18)
    mult $f17,$f1,$f1
    mult $f17,$f2,$f2
    stt $f1,0($19)
    stt $f2,8($19)
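
At the source level, the same idea can be sketched in C by unrolling by
two and giving each element its own temporary, so that neither multiply
depends on the other. This is only an illustration, not gcc output: the
function name scale_unrolled and the temporaries t0 and t1 are made up
here. It also assumes that dx and dy do not overlap, which the
reordering above needs as well, since the second load is moved ahead of
the first store.

```c
/* Hypothetical helper: computes dy[i] = da * dx[i] for i in [0, n).
   Assumes dx and dy do not overlap. */
void scale_unrolled(int n, double da, const double *dx, double *dy)
{
    int i;

    /* Main loop, unrolled by two. t0 and t1 play the role of $f1 and $f2:
       separate temporaries keep the two multiplies independent. */
    for (i = 0; i + 1 < n; i += 2) {
        double t0 = dx[i];       /* both loads first ...   */
        double t1 = dx[i + 1];
        t0 = da * t0;            /* ... then both multiplies ... */
        t1 = da * t1;
        dy[i]     = t0;          /* ... then both stores   */
        dy[i + 1] = t1;
    }

    /* Remainder iteration when n is odd. */
    for (; i < n; i++)
        dy[i] = da * dx[i];
}
```

Whether a given compiler keeps this ordering, or already produces it
from the plain loop, depends on its version and optimization flags.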


This should matter for almost all superscalar architectures.
At least on my Alpha (21164) it speeds things up quite a bit (a factor of 2).


Does anyone know how far along gcc is with this?


Thanks for any answers!