Re: list locality (WAS: Parallelizing...)

cliffc@ami.sps.mot.com (Cliff Click)
Tue, 28 Nov 1995 14:07:43 GMT

          From comp.compilers

Related articles
Re: Parallelizing (WAS: Death by pointers.) Martin.Jourdan@inria.fr (1995-10-18)
Re: Parallelizing (WAS: Death by pointers.) wclodius@lanl.gov (1995-10-28)
Re: Parallelizing (WAS: Death by pointers.) cliffc@ami.sps.mot.com (1995-11-03)
list locality (WAS: Parallelizing...) boehm@parc.xerox.com (1995-11-22)
Re: list locality (WAS: Parallelizing...) cliffc@ami.sps.mot.com (1995-11-28)

Newsgroups: comp.compilers
From: cliffc@ami.sps.mot.com (Cliff Click)
Keywords: storage, architecture, optimize
Organization: none
References: 95-10-092 95-11-015 95-11-047 95-11-194
Date: Tue, 28 Nov 1995 14:07:43 GMT

boehm@parc.xerox.com (Hans Boehm) writes:


> cliffc@ami.sps.mot.com (Cliff Click) writes:
> > In passing - I long ago gave up on linked lists except for cases that
> > require dynamic insertion in the middle or huge/infrequent structures,
> > and use arrays instead. When spinning down the list the locality of
> > arrays beats pointer-chasing-cache-misses any day.
>
> This brings up an interesting question that I previously hadn't thought
> about. Garbage collecting allocators (even those that don't move
> objects) often place consecutively allocated objects of the same
> size in adjacent locations. Thus the main cause of a decrease in cache
> performance is likely to be the space taken up by the pointers.
>
> If I allocate object B after object A, and both are of the same size,
> does it matter which one has the larger address? (My guess is yes,
> for some cache architectures. It might be faster to step through
> addresses in increasing order.)


Some caches load a whole line in parallel, so no preference for B > A.
Some block until the entire line is loaded -- again no preference.
Some caches do critical-word-first fills, so again no preference.
I would think very few caches fill the line with multiple loads, AND
    the loads go from low to high, AND the CPU is unblocked if it's
    accessing the low part of the line -- which is the only combination
    that would favor increasing addresses.


I've heard of hardware that prefetches based on sequential access: if
you touch some number of consecutive addresses in a row, the hardware
starts prefetching ahead. I don't know of any in production use.


Lotsa CPUs have prefetch instructions or hints: a load to register R0
(whose result is thrown away), a load w/update, or even explicit
prefetch instructions.
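To make the software-prefetch idea concrete, here is a minimal modern sketch. `__builtin_prefetch` is a GCC/Clang builtin, much newer than the machines discussed in this thread, and the prefetch distance of 8 elements is an arbitrary illustrative choice, not a tuned value:

```c
#include <stddef.h>

/* Sum an array, hinting the cache a few iterations ahead of the
 * current element.  The prefetch is only a hint: it never faults
 * and never changes the result, it just tries to hide load latency. */
double sum(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + 8 < n)
            __builtin_prefetch(&a[i + 8]);  /* fetch ahead; distance is a guess */
        s += a[i];
    }
    return s;
}
```

On hardware without the hint, a compiler simply drops it, so the code stays portable.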


> If the answer is yes, then presumably for mostly functional languages
> I should allocate A at a higher address than B, since B will usually be
> accessed first. What's the right order for C/C++ programs?


I think the answer is: "stride-1 access & order is less important", or
maybe: "unlike lists, when scanning arrays the compiler can insert
prefetches so every memory ref is a cache hit", or maybe even "pad &
align structures on cache-line boundaries".
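The "pad & align" suggestion can be sketched as follows. This uses C11 `alignas` (from `<stdalign.h>`), which postdates this thread, and assumes a 64-byte line size; real line sizes vary (32 bytes was common on mid-90s parts):

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64  /* assumed line size; varies by machine */

/* 48 bytes of payload, aligned to a line boundary; the compiler
 * pads the struct size up to a multiple of the alignment, so an
 * array of these never has an element straddling two lines. */
struct foo {
    alignas(CACHE_LINE) double data[6];
};
```

With this layout, `Array[i]` always starts a fresh line, so a stride-1 scan touches exactly one line per element.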


Since the performance of left-to-right vs right-to-left ordering is
swamped by array-vs-list issues, I'd say the right answer is: "Do what
comes naturally", which generally means low-to-high:


    struct foo *Array = (struct foo*)malloc( sizeof(struct foo) * N );
    int i;
    for( i=0; i<N; i++ )
        Array[i] = ...;

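For contrast, here is a minimal sketch of the two traversals being compared. The functions compute the same sum; the difference is purely in the address pattern the hardware sees:

```c
#include <stddef.h>

struct node { int val; struct node *next; };

/* Pointer-chasing walk: each iteration's address depends on the
 * previous load, so a miss stalls the whole chain. */
int sum_list(const struct node *p)
{
    int s = 0;
    for (; p != NULL; p = p->next)
        s += p->val;
    return s;
}

/* Stride-1 walk: addresses are predictable, consecutive elements
 * share cache lines, and prefetching (by compiler or hardware) is easy. */
int sum_array(const int *a, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}
```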

Cliff
--
Cliff Click Compiler Researcher & Designer
RISC Software, Motorola PowerPC Compilers
cliffc@risc.sps.mot.com (512) 891-7240
--

