Re: Can this type of cache miss be reduced?

glen herrmannsfeldt <gah@ugcs.caltech.edu>
Wed, 3 Jun 2009 11:06:54 +0000 (UTC)

          From comp.compilers

From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Newsgroups: comp.compilers
Date: Wed, 3 Jun 2009 11:06:54 +0000 (UTC)
Organization: California Institute of Technology, Pasadena
References: 09-06-003 09-06-010
Keywords: architecture
Posted-Date: 03 Jun 2009 09:17:40 EDT

Eric Fisher <joefoxreal@gmail.com> wrote:


< I'm looking at data prefetching. There's an example
< of loop splitting in "The Compiler Design Handbook"
(snip)


< So we can insert prefetching as


< for (i = 0; i < m; i += n) {
<     prefetch(a[i+pd]);
<     sum[0] += a[i];
<     for (i1 = i+1; i1 < min(m, i+n); i1++) {
<         sum[0] += a[i1];
<     }
< }


< I tried this method in my test program. The surprising thing is that the
< performance is degrading due to the loop splitting. Even though the data
< prefetching can get back some benefit, the overall performance is lower
< than before.


What is pd? The right value would seem to depend on where in a
cache line a[0] falls, but you usually don't know that. If you can
modify your allocation routine to allocate on cache line boundaries,
it might work better.
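
As a rough sketch of the aligned-allocation idea, assuming a 64-byte
cache line (check your CPU) and a C11 library that provides
aligned_alloc. The helper name alloc_line_aligned and the CACHE_LINE
constant are made up for illustration:

```c
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64  /* assumption: 64-byte cache lines */

/* Hypothetical helper: allocate room for n doubles so that
   element 0 starts on a cache-line boundary. */
static double *alloc_line_aligned(size_t n)
{
    size_t bytes = n * sizeof(double);
    /* C11 aligned_alloc expects the size to be a multiple of the
       alignment, so round up to a whole number of lines. */
    size_t rounded = (bytes + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
    if (rounded == 0)
        rounded = CACHE_LINE;
    return aligned_alloc(CACHE_LINE, rounded);  /* release with free() */
}
```

With a[0] known to sit at the start of a line, the offset of any a[i]
within its line is fixed, so a prefetch distance can at least be
chosen against a known layout.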


Also, there is a fair amount of extra loop overhead that
you have to make up for, including the min calculation
in the inner loop's test, which is evaluated on every iteration
unless the compiler hoists it. I would have written
(i1<m && i1<i+n), though that may not be any better.
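
One way to take the min out of the inner loop entirely is to compute
the block's end once per outer iteration. This is only a sketch: it
assumes GCC or Clang for __builtin_prefetch (standing in for the
prefetch() in the quoted code), double elements, and n > 0. The
function name blocked_sum is made up for illustration:

```c
#include <stddef.h>

/* Sum a[0..m-1] in blocks of n elements, issuing one prefetch per
   block. Requires n > 0, otherwise the outer loop never advances. */
static double blocked_sum(const double *a, size_t m, size_t n, size_t pd)
{
    double sum = 0.0;
    for (size_t i = 0; i < m; i += n) {
        /* End of this block, computed once instead of on every
           inner-loop test. */
        size_t end = (m - i < n) ? m : i + n;
        /* Only prefetch addresses inside the array. */
        if (pd < m - i)
            __builtin_prefetch(&a[i + pd]);
        for (size_t i1 = i; i1 < end; i1++)
            sum += a[i1];
    }
    return sum;
}
```

Whether this beats the plain loop still has to be measured; a good
compiler may already hoist the min on its own.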


-- glen


