Re: Can this type of cache miss be reduced?

Max Hailperin <max@gustavus.edu>
Wed, 03 Jun 2009 08:15:13 -0500

From comp.compilers

Related articles
Can this type of cache miss be reduced? joefoxreal@gmail.com (Eric Fisher) (2009-06-01)
Re: Can this type of cache miss be reduced? gneuner2@comcast.net (George Neuner) (2009-06-01)
Re: Can this type of cache miss be reduced? max@gustavus.edu (Max Hailperin) (2009-06-02)
Re: Can this type of cache miss be reduced? joefoxreal@gmail.com (Eric Fisher) (2009-06-03)
Re: Can this type of cache miss be reduced? lkrupp@indra.com (Louis Krupp) (2009-06-03)
Re: Can this type of cache miss be reduced? gah@ugcs.caltech.edu (glen herrmannsfeldt) (2009-06-03)
Re: Can this type of cache miss be reduced? max@gustavus.edu (Max Hailperin) (2009-06-03)


From:	Max Hailperin <max@gustavus.edu>
Newsgroups:	comp.compilers
Date:	Wed, 03 Jun 2009 08:15:13 -0500
Organization:	Compilers Central
References:	09-06-003 09-06-010
Keywords:	architecture
Posted-Date:	03 Jun 2009 09:18:06 EDT

Eric Fisher <joefoxreal@gmail.com> writes:
...
> for (i=0; i<m; i+=n){
> prefetch(a[i+pd]);
> sum[0] += a[i];
> for(i1 = i+1; i1 < min(m, i+n); i1++){
> sum[0] += a[i1];
> }
> }
>
> I tried this method in my test program. The surprising thing is that the
> performance is degrading due to the loop splitting. Even though the data
> prefetching can get back some benefit, the overall performance is lower
> than before.

Depending on the compiler and architecture, you may well see some
improvement if you take one or both of these two steps:

(1) Handle the case where m is not a multiple of n outside the main
loop. That is, peel off the first or last m%n iterations into a
separate loop, so that your inner loop can always run up to i+n
rather than needing the min with m.

(2) Then, provided n is known at compile time, you can fully unroll
the n iterations of the inner loop, so that you have something like

for (i=0; i<m; i+=n){
    prefetch(a[i+pd]);
    sum[0] += a[i];
    sum[0] += a[i+1];
    // ...
    sum[0] += a[i+n-1];
}
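
To make the two steps concrete, here is a minimal sketch that combines them, under several assumptions that are not in the original post: the block size is fixed at a hypothetical compile-time constant N = 4, the elements are int, the remainder is peeled off the end rather than the start, and prefetch is mapped onto the GCC/Clang builtin __builtin_prefetch (a no-op elsewhere). The names sum_blocked, PREFETCH, and main_end are illustrative only. Whether this actually beats the original loop depends on the compiler and architecture, as noted above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: GCC and Clang provide __builtin_prefetch; on other
   compilers the macro expands to nothing. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(p) __builtin_prefetch(p)
#else
#define PREFETCH(p) ((void)0)
#endif

enum { N = 4 };  /* hypothetical block size, fixed at compile time */

/* Sums a[0..m-1]; pd is the prefetch distance in elements. */
long sum_blocked(const int *a, size_t m, size_t pd)
{
    assert(a != NULL || m == 0);

    long sum = 0;
    size_t main_end = m - m % N;  /* largest multiple of N not above m */
    size_t i;

    /* Step (2): main loop, fully unrolled for N == 4, no min() needed. */
    for (i = 0; i < main_end; i += N) {
        if (i + pd < m)           /* keep the prefetch address in bounds */
            PREFETCH(&a[i + pd]);
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
    }

    /* Step (1): the peeled-off last m % N iterations. */
    for (; i < m; i++)
        sum += a[i];

    return sum;
}
```

The bounds check on the prefetch costs one branch per N elements; it is there only so the sketch never forms an out-of-range pointer, and could be dropped if the array is known to be padded.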
