Re: SPARC compiler optimisation

casper@fwi.uva.nl (Casper H.S. Dik)
Fri, 14 Feb 1992 09:42:38 GMT

          From comp.compilers

Newsgroups: comp.compilers,comp.sys.sun.misc
From: casper@fwi.uva.nl (Casper H.S. Dik)
Keywords: optimize, sparc, question
Organization: FWI, University of Amsterdam
References: 92-02-062
Date: Fri, 14 Feb 1992 09:42:38 GMT

gregw@highland.oz.au (Greg Wilkins) writes:


>I need to generate frequent checksums on a SPARC machine, and I have
>been looking at the code the sun optimizer produces for the code:


>main()
>{
> int block[1000];
> register int i;
> register int xs=0;


> for(i=999;i>0;i--)
> xs|=*(block+i);


> return xs;
>}




>The best that -O4 can do is a 5 tick loop:


>L77010:
> ld [%i5],%l3
> dec 4,%i5
> cmp %i5,%i3
> bgu L77010
> or %i4,%l3,%i4


>where %i5 is set to block+999 and %i3 is block




>It is possible to write a 4 tick loop (20% saving) :


>LMYLOOP:
> ld [%i3+%i5],%l3
> deccc 4,%i5
> bcc LMYLOOP
> or %i4,%l3,%i4


My version of the bundled C compiler produces a loop that costs 3 ticks
per element. The loop is partly unrolled, so each pass through it executes
the original body 4 times: 12 instructions for 4 ints. The ANSI C compiler,
the unbundled C compiler and gcc do much worse.


ANSI C 1.1:        7 instructions (-fast)
gcc 1.40:          7 instructions, including 1 nop (-O -fstrength-reduce)
Unbundled cc 1.1:  7 instructions (-O4)


Plain old cc outperforms all the others.


cc:
_main:
                save %sp,-0xfe0,%sp
                mov 0xf9c,%i5
                add %fp,-0xfa0,%o0
                add %i5,%o0,%i5
                add %fp,-0xfa0,%o2
                mov 0,%i3
                add %i3,%o2,%i3
                sub %i5,12,%o5
                cmp %o5,%i3
                bleu L77010
                mov 0,%i4


L77003:
                ld [%i5],%o7 ; this is the loop body.
                ld [%i5-4],%l0 ; load 4 ints, | them
                ld [%i5-8],%l1
                ld [%i5-12],%l2
                dec 16,%i5 ; step the pointer down by 4 ints (16 bytes)
                or %i4,%o7,%i4
                or %i4,%l0,%i4
                sub %i5,12,%l3 ; more than 3 ints left?
                cmp %l3,%i3
                or %i4,%l1,%i4
                bgu L77003
                or %i4,%l2,%i4


                cmp %i5,%i3
                bleu L77006
                nop
L77010: ; did you mistake this for the loop body?
                ld [%i5],%l4
                dec 4,%i5
                cmp %i5,%i3
                bgu L77010
                or %i4,%l4,%i4
L77006:
                ret
                restore %g0,%i4,%o0
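
For readers who prefer C, the cc output above corresponds roughly to the
sketch below. It is a hand-written approximation, not compiler output: the
names or_simple, or_unrolled and self_check are made up for illustration,
and N stands in for the array size of 1000 used in the original program.

```c
#include <assert.h>

#define N 1000  /* array size from the original program */

/* The loop as posted: ORs block[N-1] down to block[1]; block[0] is skipped. */
static int or_simple(const int *block)
{
    int xs = 0;
    int i;

    for (i = N - 1; i > 0; i--)
        xs |= block[i];
    return xs;
}

/* Hand-unrolled version shaped like the cc listing (hypothetical helper). */
static int or_unrolled(const int *block)
{
    const int *base = block;          /* plays the role of %i3 */
    const int *p = block + (N - 1);   /* plays the role of %i5 */
    int xs = 0;

    /* Main loop (L77003): runs while more than 3 ints are left. */
    while (p - base > 3) {
        xs |= p[0];
        xs |= p[-1];
        xs |= p[-2];
        xs |= p[-3];
        p -= 4;                       /* 4 ints, i.e. 16 bytes */
    }

    /* Clean-up loop (L77010): the original one-int-at-a-time body. */
    while (p > base) {
        xs |= *p;
        p--;
    }
    return xs;
}

/* Small self-check: both versions must agree on the same input. */
static void self_check(void)
{
    int block[N];
    int i;

    for (i = 0; i < N; i++)
        block[i] = i;
    assert(or_simple(block) == or_unrolled(block));
}
```

Calling self_check() from a test main is enough to confirm that the two
versions agree. The second loop is the 5-instruction loop quoted in the
original question; with 999 elements to process it only handles the last
3, after the unrolled loop has done the other 996.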
--
Casper H.S. Dik
casper@fwi.uva.nl
Faculty of Mathematics & Computer Science, University of Amsterdam
Kruislaan 403, NL-1098 SJ Amsterdam, The Netherlands
Phone: +31 20 525 7463, Telex: 10262 hef nl, Fax: +31 20 525 7490