Re: Can this type of cache miss be reduced?

Louis Krupp <lkrupp@indra.com>
Wed, 03 Jun 2009 00:04:35 -0600

          From comp.compilers


From: Louis Krupp <lkrupp@indra.com>
Newsgroups: comp.compilers
Date: Wed, 03 Jun 2009 00:04:35 -0600
Organization: indra.com
References: 09-06-003
Keywords: architecture
Posted-Date: 03 Jun 2009 05:50:31 EDT

Eric Fisher wrote:
> Hi,
>
> Optimizations for cache miss are often that loop transformations, such
> as loop interchange, loop blocking, etc.
>
> But, for a large one-dimensional array, suppose the elements are only
> accessed once, can we still reduce the cache miss?
>
> Example:
>
> #define NUM 320*240*3
> static const char a[NUM] = {.......};
> char *ptr=a;
> for (i = 0; i < NUM; i++)
> {
> x = *ptr++;
> y = *ptr++;
> z = *ptr++;
>
> fun(x, y, z);
> }


In this particular case, for what it's worth, you might be able to read
the array a 32-bit word at a time and unpack the bytes with shifts and
masks, something like the following:


#include <stdint.h> /* uint32_t = unsigned 32-bit int */

#define NUM_GROUPS (320 * 240)
#define NUM_ELEMS (NUM_GROUPS * 3)
#define NUM_WORDS ((NUM_ELEMS + 3) / 4)


static const char a[NUM_ELEMS] = {.......};

/*
 * Reading a[] through a uint32_t pointer assumes that a[] is 4-byte
 * aligned and that the compiler tolerates this kind of type punning.
 */
const uint32_t *wptr = (const uint32_t *)a;


/* Each iteration consumes 3 words = 12 bytes = 4 (x, y, z) groups. */
for (i = 0; i < NUM_ELEMS / 12; i++)
{
    uint32_t u = *wptr++;
    uint32_t v = *wptr++;
    uint32_t w = *wptr++;

    /* assume big-endian machine */

    /*
     * u, v, and w are likely to be in registers which might persist
     * across calls to fun().
     */

    fun( u >> 24,         (u >> 16) & 0xff, (u >> 8) & 0xff);
    fun( u & 0xff,         v >> 24,         (v >> 16) & 0xff);
    fun((v >> 8) & 0xff,   v & 0xff,         w >> 24);
    fun((w >> 16) & 0xff, (w >> 8) & 0xff,   w & 0xff);
}
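
The loop above assumes a big-endian machine and an array that can be
read through a 32-bit pointer. As a rough sketch only, the same
unpacking can be written so that it does not depend on byte order or
alignment, by assembling each word from its bytes (many optimizing
compilers can turn that into a single load, though that is not
guaranteed). The names load_be32, for_each_triple_words, weighted_sum,
and checksum_words are made up for this illustration, and weighted_sum
merely stands in for fun():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type standing in for fun(x, y, z). */
typedef void (*triple_fn)(unsigned x, unsigned y, unsigned z, void *ctx);

/* Build a word whose most significant byte is p[0], on any host. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Visit n bytes as (x, y, z) groups, three words per iteration. */
static void for_each_triple_words(const unsigned char *a, size_t n,
                                  triple_fn fun, void *ctx)
{
    size_t i;

    assert(n % 12 == 0); /* 3 words = 12 bytes = 4 groups */

    for (i = 0; i < n; i += 12) {
        uint32_t u = load_be32(a + i);
        uint32_t v = load_be32(a + i + 4);
        uint32_t w = load_be32(a + i + 8);

        fun( u >> 24,         (u >> 16) & 0xff, (u >> 8) & 0xff, ctx);
        fun( u & 0xff,         v >> 24,         (v >> 16) & 0xff, ctx);
        fun((v >> 8) & 0xff,   v & 0xff,         w >> 24,         ctx);
        fun((w >> 16) & 0xff, (w >> 8) & 0xff,   w & 0xff,        ctx);
    }
}

/* Example consumer: order-sensitive sum, so swapped bytes show up. */
static void weighted_sum(unsigned x, unsigned y, unsigned z, void *ctx)
{
    *(unsigned long *)ctx += x + 2ul * y + 3ul * z;
}

/* Convenience wrapper used to check the unpacking order. */
unsigned long checksum_words(const unsigned char *a, size_t n)
{
    unsigned long sum = 0;

    for_each_triple_words(a, n, weighted_sum, &sum);
    return sum;
}
```

Whether this is any faster than the byte-at-a-time loop depends on the
compiler and the machine; it would have to be measured.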


Louis


