Subject: prefetching (was: Re: Future of architecture)
From: email@example.com (Mark Smotherman)
Date: Fri, 10 Nov 1995 21:06:48 GMT
Have these software prefetch techniques been investigated? If so, who has
published them and/or who is doing them in a production compiler/linker?
Are they wins, and if so, by how much?
1. instruction and data prefetch with subroutine calls?
- upstream from a procedure call, issue an instruction prefetch for
the procedure entry point
- upstream from a procedure call, issue a data prefetch (or line
allocate, which omits refill) for the procedure's stack frame
- upstream from a procedure return, issue a data prefetch for
the caller's stack frame
- let the linker associate global data areas with the procedures that
use these areas, and thus upstream from a procedure call, have the
linker insert data prefetches for the associated global data areas
2. heap management tricks?
- in-line a routine at each malloc call site that initially allocates
a contiguous region of multiple blocks (each of the requested size)
and then doles these out as it is re-invoked (this is similar to
the logical record/physical record handling in I/O and might help
increase spatial locality in large-line-size caches). I know of
malloc implementations that keep separate free lists for fixed-size
allocation classes, but call-site-specific allocation seems like it
could increase locality
- some students and I tried adding a prefetch pointer to a linked list
structure to enable us to process three list nodes per list-traversal
iteration; we obtained a 24% improvement in per-node time on an Alpha
21164 (but we got a 162% improvement by using a circular buffer; see
the second bullet in http://www.cs.clemson.edu/~mark/arch.html)
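As a rough illustration of item 1, here is a C sketch of a hand-inserted version of the linker-directed idea: upstream of a call, the caller prefetches the start of a global data area that the callee is known to use, then does unrelated work so the misses can overlap with it. It assumes GCC or Clang, whose `__builtin_prefetch(addr, rw, locality)` issues a data prefetch. The names `lookup_table`, `sum_table`, and `caller` are hypothetical, as is the 64-byte line size. Portable C has no instruction-prefetch builtin, so the entry-point prefetch from the first bullet is not shown.

```c
#define TABLE_LEN 4096
#define LINE_SIZE 64          /* assumed cache-line size in bytes */

/* Hypothetical global data area used only by sum_table(); in the scheme
   above, the linker would record this procedure/data association. */
static int lookup_table[TABLE_LEN];

/* Callee: reads the global data area. */
static long sum_table(void) {
    long s = 0;
    for (int i = 0; i < TABLE_LEN; i++)
        s += lookup_table[i];
    return s;
}

/* Caller: prefetches the first few lines of the callee's global data
   well before the call, then does independent work in between. */
long caller(const int *work, int n) {
    /* rw = 0 (read), locality = 3 (keep in all cache levels). */
    for (int off = 0; off < 4 * LINE_SIZE; off += LINE_SIZE)
        __builtin_prefetch((const char *)lookup_table + off, 0, 3);

    long unrelated = 0;
    for (int i = 0; i < n; i++)        /* work that does not touch the table */
        unrelated += work[i];

    return unrelated + sum_table();    /* the table's first lines should now be cached */
}
```

How far upstream to place the prefetch is the usual trade-off: too close to the call and the miss is not hidden, too far and the line may be evicted before the callee uses it.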
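The call-site-specific allocation idea from item 2 could be sketched in C roughly as follows. `site_pool`, `site_alloc`, `new_node`, and the region size of 64 blocks are all hypothetical. The sketch never frees blocks individually, does not check `size * count` for overflow, and needs C11 for `_Alignof` and `max_align_t`.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-call-site pool: the current contiguous region and the
   next block in it that has not been handed out yet. */
struct site_pool {
    char *next;   /* next unused block, NULL before the first call */
    char *end;    /* one past the end of the current region */
};

/* Return a block of `size` bytes, carving a fresh contiguous region of
   `count` such blocks from malloc when the current region runs out.
   Old regions are simply abandoned; their blocks stay valid. */
static void *site_alloc(struct site_pool *p, size_t size, size_t count) {
    const size_t align = _Alignof(max_align_t);
    size = (size + align - 1) / align * align;   /* keep every block aligned */

    if (p->next == NULL || (size_t)(p->end - p->next) < size) {
        char *region = malloc(size * count);     /* one contiguous region */
        if (region == NULL)
            return NULL;
        p->next = region;
        p->end  = region + size * count;
    }
    void *block = p->next;
    p->next += size;
    return block;
}

/* What the in-lined code at one malloc call site might look like: the
   static pool is private to this site, so its nodes end up adjacent. */
struct node { int value; struct node *next; };

struct node *new_node(int value) {
    static struct site_pool pool;                 /* zero-initialized: empty */
    struct node *n = site_alloc(&pool, sizeof *n, 64);
    if (n != NULL) {
        n->value = value;
        n->next = NULL;
    }
    return n;
}
```

Because consecutive calls from the same site return neighboring blocks, a list built at one site tends to occupy consecutive cache lines, which is where large lines would pay off.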
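One possible reading of the linked-list experiment in item 2 is sketched below, assuming GCC or Clang `__builtin_prefetch`. The node layout is hypothetical. Each node carries an extra `ahead` pointer to the node three positions further on, so one iteration can process a group of three nodes while issuing three independent prefetches for the next group. This is not necessarily the scheme the students used, and it says nothing about the speedups reported above.

```c
#include <stddef.h>

/* Hypothetical node layout: `ahead` == next->next->next, or NULL near the tail. */
struct lnode {
    long          value;
    struct lnode *next;
    struct lnode *ahead;
};

/* Fill in the `ahead` pointers once, after the list has been built. */
void link_ahead(struct lnode *head) {
    for (struct lnode *p = head; p != NULL; p = p->next) {
        struct lnode *q = p->next;
        if (q) q = q->next;
        if (q) q = q->next;
        p->ahead = q;
    }
}

/* Sum the list three nodes per iteration, prefetching the next group. */
long sum_list(const struct lnode *p) {
    long sum = 0;
    while (p != NULL) {
        const struct lnode *a = p;
        const struct lnode *b = a->next;
        const struct lnode *c = b ? b->next : NULL;

        /* The next group's addresses come from this group's `ahead` fields,
           not from a chain of `next` loads, so these misses can overlap. */
        if (a->ahead) __builtin_prefetch(a->ahead, 0, 3);
        if (b && b->ahead) __builtin_prefetch(b->ahead, 0, 3);
        if (c && c->ahead) __builtin_prefetch(c->ahead, 0, 3);

        sum += a->value;
        if (b) sum += b->value;
        if (c) sum += c->value;
        p = c ? c->next : NULL;                  /* equals a->ahead */
    }
    return sum;
}
```

The extra pointer costs one word per node and must be kept up to date on insertion and deletion, which is part of why a different data structure, such as the circular buffer mentioned above, can win by a larger margin.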
Mark Smotherman, Computer Science Dept., Clemson University, Clemson, SC