prefetching (was:Re: Future of architecture)

mark@hubcap.clemson.edu (Mark Smotherman)
Fri, 10 Nov 1995 21:06:48 GMT

          From comp.compilers


Newsgroups: comp.arch,comp.compilers
From: mark@hubcap.clemson.edu (Mark Smotherman)
Keywords: architecture
Organization: Clemson University
References: <47kodf$8a7@usenet.pa.dec.com> <4807v2$ddc@copland.udel.edu>
Date: Fri, 10 Nov 1995 21:06:48 GMT

Have these software prefetch techniques been investigated? If so, who has
published them and/or who is doing them in a production compiler/linker?
Are they wins, and if so, by how much?




1. Instruction and data prefetch around subroutine calls?


      - upstream from a procedure call, issue an instruction prefetch for
              the procedure entry point


      - upstream from a procedure call, issue a data prefetch (or line
              allocate, which omits refill) for the procedure's stack frame


      - upstream from a procedure return, issue a data prefetch for
              the caller's stack frame


      - let the linker associate global data areas with the procedures that
              use these areas, and thus upstream from a procedure call, have the
              linker insert data prefetches for the associated global data areas
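
As a rough illustration of the last bullet, here is a minimal C sketch of
a data prefetch issued upstream from a call. It assumes a GCC- or
Clang-style compiler that provides __builtin_prefetch; the names totals,
update_totals, and process are hypothetical, and the prefetch is written
by hand here rather than inserted by a linker. It covers only the
data-prefetch case, not the instruction-prefetch or stack-frame cases.

```c
#include <stddef.h>

/* Hypothetical global data area used only by update_totals(). */
static long totals[64];

/* __builtin_prefetch is a GCC/Clang extension; compile to nothing elsewhere.
   Second argument 1 = prefetch for write, third argument 3 = keep in cache. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_WRITE(p) __builtin_prefetch((p), 1, 3)
#else
#define PREFETCH_WRITE(p) ((void)0)
#endif

/* The callee: touches the global data area associated with it. */
static long update_totals(int slot, long amount)
{
    totals[slot] += amount;
    return totals[slot];
}

/* The caller: the prefetch sits "upstream" of the call, so the line
   holding totals[slot] can be on its way while the argument is computed. */
long process(int slot, long a, long b)
{
    PREFETCH_WRITE(&totals[slot]);

    long amount = a * b + a - b;   /* independent work overlapping the fetch */

    return update_totals(slot, amount);
}
```

Whether this wins anything depends on how much independent work separates
the prefetch from the call and on whether the line was already cached; a
prefetch is only a hint and does not change the result.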




2. heap management tricks?


      - in-line a routine at each malloc call site that initially allocates
              a contiguous region of multiple blocks (each of the request size)
              and then doles these out as it is re-invoked (this is similar to
              the logical record/physical record handling in I/O and might help
              increase spatial locality in large-line-size caches) -- I know of
              malloc implementations that keep separate free lists for fixed-
              size allocations, but call-site-specific allocation seems like it
              could increase locality


      - some students and I tried adding a prefetch pointer to a linked-list
              structure so that we could process three list nodes per
              list-traversal iteration; we obtained a 24% improvement in
              per-node time on an Alpha 21164 (but we got a 162% improvement
              by using a circular buffer; see the second bullet in
              http://www.cs.clemson.edu/~mark/arch.html)
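
The call-site-specific allocation idea in the first bullet above might
look roughly like the following C sketch. All names (site_pool,
site_alloc, new_node, BLOCKS_PER_CHUNK) are hypothetical, and it assumes
C11 and that every request from a given call site has the same size.
Blocks handed out this way cannot be passed to free() individually, which
a real implementation would have to deal with.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-call-site pool: one contiguous chunk of equal-size
   blocks that is doled out one block per call. */
typedef struct {
    unsigned char *next;   /* next unused block in the current chunk */
    size_t remaining;      /* blocks left in the current chunk */
} site_pool;

enum { BLOCKS_PER_CHUNK = 64 };   /* assumed chunk size, not tuned */

static void *site_alloc(site_pool *pool, size_t size)
{
    /* Round the block size up so every block is suitably aligned. */
    size_t align = _Alignof(max_align_t);
    size_t block = (size + align - 1) / align * align;

    if (pool->remaining == 0) {
        /* Chunk exhausted (or first call): grab a new contiguous region. */
        pool->next = malloc(block * BLOCKS_PER_CHUNK);
        if (pool->next == NULL)
            return NULL;
        pool->remaining = BLOCKS_PER_CHUNK;
    }

    void *p = pool->next;
    pool->next += block;
    pool->remaining--;
    return p;
}

struct node {
    int value;
    struct node *next;
};

/* One allocation site with its own static pool, so consecutive nodes
   created here are adjacent in memory. */
struct node *new_node(int value, struct node *next)
{
    static site_pool pool;   /* zero-initialized: empty pool */
    struct node *n = site_alloc(&pool, sizeof *n);
    if (n != NULL) {
        n->value = value;
        n->next = next;
    }
    return n;
}
```

Nodes created by successive calls to new_node() come from the same chunk
until it runs out, so several of them can share one cache line when the
line is larger than a node.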


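One way to picture the prefetch-pointer idea from the second heap bullet
is the C sketch below. It is not the layout we actually measured; it
simply assumes each node carries an extra pointer to the node three links
further on, and that a GCC- or Clang-style __builtin_prefetch is
available. The names ahead, link_ahead, and sum_list are hypothetical.

```c
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 3)
#else
#define PREFETCH_READ(p) ((void)0)
#endif

enum { AHEAD = 3 };   /* assumed prefetch distance, in nodes */

struct node {
    long value;
    struct node *next;
    struct node *ahead;   /* prefetch pointer: the node AHEAD links further on */
};

/* Fill in the prefetch pointers of an already-linked list. */
static void link_ahead(struct node *head)
{
    struct node *lead = head;
    for (int i = 0; i < AHEAD && lead != NULL; i++)
        lead = lead->next;

    for (struct node *n = head; n != NULL; n = n->next) {
        n->ahead = lead;          /* NULL for the last AHEAD nodes */
        if (lead != NULL)
            lead = lead->next;
    }
}

/* Traverse the list, requesting a later node while working on this one. */
static long sum_list(const struct node *head)
{
    long sum = 0;
    for (const struct node *n = head; n != NULL; n = n->next) {
        if (n->ahead != NULL)
            PREFETCH_READ(n->ahead);
        sum += n->value;
    }
    return sum;
}
```

The extra pointer costs space in every node and must be kept up to date
on insertions and deletions, which is part of why a contiguous structure
such as a circular buffer can do better.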


--
Mark Smotherman, Computer Science Dept., Clemson University, Clemson, SC
http://www.cs.clemson.edu/~mark/homepage.html
--

