From: Max Hailperin <max@gustavus.edu>
Newsgroups: comp.compilers
Date: Wed, 03 Jun 2009 08:15:13 -0500
Organization: Compilers Central
References: 09-06-003 09-06-010
Keywords: architecture
Posted-Date: 03 Jun 2009 09:18:06 EDT

Eric Fisher <joefoxreal@gmail.com> writes:
...
> for (i=0; i<m; i+=n){
>     prefetch(a[i+pd]);
>     sum[0] += a[i];
>     for(i1 = i+1; i1 < min(m, i+n); i1++){
>         sum[0] += a[i1];
>     }
> }
>
> I tried this method in my test program. The surprising thing is that the
> performance is degrading due to the loop splitting. Even though the data
> prefetching can get back some benefit, the overall performance is lower
> than before.
Depending on the compiler and architecture, you may well see some
improvement if you take one or both of these two steps:
(1) Deal outside the main loop with the possibility that m is not a
multiple of n. That is, peel off the first or last m%n iterations
into a separate loop, so that your inner loop can always go up to i+n
rather than needing the min with m.
(2) Then you can fully unroll the n iterations of the inner loop, so
that you have something like
    for (i=0; i<m; i+=n){
        prefetch(a[i+pd]);
        sum[0] += a[i];
        sum[0] += a[i+1];
        // ...
        sum[0] += a[i+n-1];
    }
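
For concreteness, here is a rough sketch of steps (1) and (2) combined. Everything specific in it is my assumption rather than something from your program: n is fixed at 4 (N), the prefetch distance is 16 elements (PD), the array holds int, the sum is a single long rather than sum[0], and GCC/Clang's __builtin_prefetch stands in for your prefetch(). The function name sum_unrolled is made up for illustration. The last m%n elements are peeled into a trailing loop, which is one of the two choices mentioned in (1).

```c
#include <stddef.h>

/* Hypothetical values: unroll factor and prefetch distance, both in elements. */
enum { N = 4, PD = 16 };

/* Sum a[0..m-1]. Assumes GCC or Clang for __builtin_prefetch. */
long sum_unrolled(const int *a, size_t m)
{
    long sum = 0;
    size_t main_end = m - m % N;  /* largest multiple of N not exceeding m */
    size_t i;

    /* Main loop: each trip handles exactly N elements, so no min() is needed. */
    for (i = 0; i < main_end; i += N) {
        /* The guard keeps the address expression inside the array. */
        if (i + PD < m)
            __builtin_prefetch(&a[i + PD]);
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
    }

    /* Peeled remainder: the last m % N elements. */
    for (; i < m; i++)
        sum += a[i];

    return sum;
}
```

The guard on the prefetch puts a branch back into the loop body, so a tuned version might instead split the main loop into a prefetching part and a short non-prefetching tail. Whether any of this helps still depends on the compiler and architecture, so it is worth measuring.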