Newsgroups: comp.compilers,comp.sys.sun.misc
From: casper@fwi.uva.nl (Casper H.S. Dik)
Keywords: optimize, sparc, question
Organization: FWI, University of Amsterdam
References: 92-02-062
Date: Fri, 14 Feb 1992 09:42:38 GMT
gregw@highland.oz.au (Greg Wilkins) writes:
>I need to generate frequent checksums on a SPARC machine, and I have
>been looking at the code the sun optimizer produces for the code:
>main()
>{
> int block[1000];
> register int i;
> register int xs=0;
> for(i=999;i>0;i--)
> xs|=*(block+i);
> return xs;
>}
>The best that -O4 can do is a 5 tick loop:
>L77010:
> ld [%i5],%l3
> dec 4,%i5
> cmp %i5,%i3
> bgu L77010
> or %i4,%l3,%i4
>where %i5 is set to block+999 and %i3 is block
>It is possible to write a 4 tick loop (20% saving) :
>LMYLOOP:
> ld [%i3+%i5],%l3
> deccc 4,%i5
> bcc LMYLOOP
> or %i4,%l3,%i4
My version of the bundled C compiler produces a 3 tick loop. The loop is
partially unrolled: each pass through it executes the original loop body
4 times in 12 instructions, which works out to 3 ticks per element. The
ANSI C compiler, the unbundled C compiler and gcc do much worse:
ANSI C 1.1: 7 instructions (-fast)
Gcc (1.40): 7 instructions, including 1 nop (-O -fstrength-reduce)
U cc 1.1: 7 instructions (-O4)
Plain old cc outperforms all the others.
cc:
_main:
        save %sp,-0xfe0,%sp
        mov 0xf9c,%i5
        add %fp,-0xfa0,%o0
        add %i5,%o0,%i5
        add %fp,-0xfa0,%o2
        mov 0,%i3
        add %i3,%o2,%i3
        sub %i5,12,%o5
        cmp %o5,%i3
        bleu L77010
        mov 0,%i4
L77003:
        ld [%i5],%o7 ; this is the loop body.
        ld [%i5-4],%l0 ; load 4 ints, | them
        ld [%i5-8],%l1
        ld [%i5-12],%l2
        dec 16,%i5 ; decrement counter by 4
        or %i4,%o7,%i4
        or %i4,%l0,%i4
        sub %i5,12,%l3 ; more than 3 ints left?
        cmp %l3,%i3
        or %i4,%l1,%i4
        bgu L77003
        or %i4,%l2,%i4
        cmp %i5,%i3
        bleu L77006
        nop
L77010: ; did you mistake this for the loop body?
        ld [%i5],%l4
        dec 4,%i5
        cmp %i5,%i3
        bgu L77010
        or %i4,%l4,%i4
L77006:
        ret
        restore %g0,%i4,%o0
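
For readers who would rather not trace the assembly, the shape of the cc
output above can be sketched in C roughly as follows. This is my own
reading of the listing, not something the compiler emits: the function
names or_checksum_unrolled and or_checksum_simple are made up for
illustration, the register names in the comments are only a guide, and n
is assumed to be at least 1.

```c
/* Sketch only: hypothetical helpers mirroring the structure of the
   cc-generated code. Like the original loop, both skip block[0]. */

/* Unrolled by 4, with a one-element cleanup loop (cf. L77003, L77010). */
int or_checksum_unrolled(const int *block, int n)
{
    const int *base = block;            /* roughly %i3 */
    const int *p = block + (n - 1);     /* roughly %i5 */
    int xs = 0;                         /* roughly %i4 */

    /* Main loop: runs while more than 3 ints remain above base. */
    while (p - base > 3) {
        xs |= p[0];
        xs |= p[-1];
        xs |= p[-2];
        xs |= p[-3];
        p -= 4;                         /* 4 ints, i.e. 16 bytes */
    }

    /* Cleanup loop: at most 3 leftover elements, one per pass. */
    while (p > base) {
        xs |= *p;
        p--;
    }
    return xs;
}

/* The loop as written in the original question, for comparison. */
int or_checksum_simple(const int *block, int n)
{
    int i;
    int xs = 0;

    for (i = n - 1; i > 0; i--)
        xs |= *(block + i);
    return xs;
}
```

With n = 1000 the main loop covers block[999] down to block[4] in 249
passes, and the cleanup loop picks up block[3], block[2] and block[1].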
--
Casper H.S. Dik
casper@fwi.uva.nl
Faculty of Mathematics & Computer Science, University of Amsterdam
Kruislaan 403, NL-1098 SJ Amsterdam, The Netherlands
Phone: +31 20 525 7463, Telex: 10262 hef nl, Fax: +31 20 525 7490