Re: SPARC compiler optimisation

andrew@highland.oz.au (Andrew Morton)
Wed, 26 Feb 92 03:07:04 GMT

          From comp.compilers


Newsgroups: comp.compilers
From: andrew@highland.oz.au (Andrew Morton)
Keywords: sparc, optimize
Organization: Highland Logic, Moss Vale, Australia
Date: Wed, 26 Feb 92 03:07:04 GMT

> grunwald@foobar.cs.colorado.edu says:
> For comparison purposes, here's the code from gcc2, using
> gcc -O5 -funroll-loops foo.c


> ...
> ld [%o1-4000],%o0
> or %i0,%o0,%i0
> ld [%o1-4004],%o0
> or %i0,%o0,%i0
> ld [%o1-4008],%o0
> or %i0,%o0,%i0
> ld [%o1-4012],%o0
> or %i0,%o0,%i0
> ld [%o1-4016],%o0
> or %i0,%o0,%i0
> ld [%o1-4020],%o0
> or %i0,%o0,%i0
> ld [%o1-4024],%o0
> or %i0,%o0,%i0
> ld [%o1-4028],%o0


Forgive me if I'm wrong, but does this not incur a pipeline stall on
each 'or', waiting for the load to complete?


'twould be better thus:


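! issue all eight loads first, so that no 'or' directly follows its own load
! (%o6 is the stack pointer, %sp, so it cannot serve as a scratch register)
ld [%o1-4000],%o0
ld [%o1-4004],%o2
ld [%o1-4008],%o3
ld [%o1-4012],%o4
ld [%o1-4016],%o5
ld [%o1-4020],%l1
ld [%o1-4024],%o7
ld [%o1-4028],%l0
! combine the loaded words into %o0
or %o2,%o0,%o0
or %o3,%o0,%o0
or %o4,%o0,%o0
or %o5,%o0,%o0
or %l1,%o0,%o0
or %o7,%o0,%o0
or %l0,%o0,%o0
! fold into the running result, which the quoted code keeps in %i0
or %i0,%o0,%i0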


Register-hungry, of course, but worthwhile if the loop is executed
enough times. The 'or's could also be arranged as a 4 + 2 + 1 tree
(three dependent levels instead of seven), using more registers.
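
To make the 4 + 2 + 1 idea concrete, here is a minimal C sketch of the two
ways of combining eight words. The contents of foo.c are not shown in this
thread, so the function names (or_chain, or_tree, or_self_check) and the
eight-word array are assumptions made purely for illustration. Whether a
compiler emits the tree form, and whether it is faster on a given SPARC
implementation, is not something this sketch demonstrates.

```c
#include <assert.h>
#include <stdint.h>

/* Serial chain: every OR depends on the one before it (7 dependent steps). */
uint32_t or_chain(const uint32_t w[8])
{
    uint32_t acc = w[0];
    for (int i = 1; i < 8; i++)
        acc |= w[i];
    return acc;
}

/* Pairwise tree: 4 independent ORs, then 2, then 1 (3 dependent levels). */
uint32_t or_tree(const uint32_t w[8])
{
    /* level 1: four ORs with no dependencies between them */
    uint32_t a = w[0] | w[1];
    uint32_t b = w[2] | w[3];
    uint32_t c = w[4] | w[5];
    uint32_t d = w[6] | w[7];

    /* level 2: two ORs */
    uint32_t ab = a | b;
    uint32_t cd = c | d;

    /* level 3: one OR */
    return ab | cd;
}

/* OR is associative and commutative, so both orders give the same word. */
void or_self_check(void)
{
    const uint32_t w[8] = { 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80 };
    assert(or_chain(w) == or_tree(w));
}
```

Both functions perform seven ORs in total; only the length of the dependency
chain differs.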