prefetching (was:Re: Future of architecture)

mark@hubcap.clemson.edu (Mark Smotherman)
Fri, 10 Nov 1995 21:06:48 GMT

From comp.compilers

Related articles
prefetching (was:Re: Future of architecture) mark@hubcap.clemson.edu (1995-11-10)
Re: prefetching (was:Re: Future of architecture) eanders@ayer.CS.Berkeley.EDU (1995-11-17)


Newsgroups:	comp.arch,comp.compilers
From:	mark@hubcap.clemson.edu (Mark Smotherman)
Keywords:	architecture
Organization:	Clemson University
References:	<47kodf$8a7@usenet.pa.dec.com> <4807v2$ddc@copland.udel.edu>
Date:	Fri, 10 Nov 1995 21:06:48 GMT

Have these software prefetch techniques been investigated? If so, who has
published results and/or who implements them in a production compiler/linker?
Are they wins, and if so, by how much?

1. instruction and data prefetch around subroutine calls?

      - upstream from a procedure call, issue an instruction prefetch for
              the procedure entry point

      - upstream from a procedure call, issue a data prefetch (or line
              allocate, which omits refill) for the procedure's stack frame

      - upstream from a procedure return, issue a data prefetch for
              the caller's stack frame

      - let the linker associate global data areas with the procedures that
              use these areas, and thus upstream from a procedure call, have the
              linker insert data prefetches for the associated global data areas
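
A rough sketch of the two data-prefetch cases above, written by hand in C
rather than inserted by a compiler or linker. It assumes GCC or Clang (for
__builtin_prefetch), a downward-growing stack, and a 64-byte offset chosen
only for illustration; table, sum_table and caller are hypothetical names.
There is no portable builtin for instruction prefetch, so that case is not
shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical global data area that the callee reads; stands in for the
   data a linker might associate with sum_table(). */
static int table[1024];

static long sum_table(size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += table[i];
    return total;
}

long caller(int scale)
{
    char marker; /* a local whose address approximates the current frame */

    /* Upstream of the call: hint the first line of the callee's global
       data (rw = 0 for read, locality = 3 for "keep in all levels"). */
    __builtin_prefetch(&table[0], 0, 3);

    /* Hint the region just below the current frame, where the callee's
       frame will probably land (assumes the stack grows downward). */
    __builtin_prefetch((const void *)((uintptr_t)&marker - 64), 1, 3);

    /* Independent work that can overlap with the prefetches. */
    long bias = (long)scale * 3 + 1;

    return sum_table(1024) + bias;
}
```

Whether the hints help depends on how much independent work sits between
the prefetch and the call; with too little, the callee still stalls on the
same misses.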

2. heap management tricks?

      - in-line a routine at each malloc call site that initially allocates
              a contiguous region of multiple blocks (each of the request size)
              and then doles these out as it is re-invoked. This is similar to
              logical record/physical record handling in I/O and might help
              increase spatial locality in large-line-size caches. I know of
              malloc implementations that keep separate free lists for fixed-size
              allocations, but call-site-specific allocation seems like it
              could increase locality

      - some students and I tried adding a prefetch pointer to a linked-list
              structure so that we could process three list nodes per list-traversal
              iteration; we obtained a 24% improvement in per-node time on an Alpha
              21164 (but we got a 162% improvement by using a circular buffer; see
              the second bullet in http://www.cs.clemson.edu/~mark/arch.html)
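
The two heap ideas above can be sketched together in C. This is one
possible reading, not the code behind the measurements quoted above: the
pool hands out list nodes from contiguous chunks, and each node carries an
"ahead" pointer to the node three links further on, which is prefetched
during traversal. This sketch visits one node per iteration, a
simplification of the three-nodes-per-iteration loop described above. It
assumes GCC or Clang for __builtin_prefetch; CHUNK_NODES, AHEAD, and all
struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_NODES 64 /* nodes carved from one malloc call (assumed) */
#define AHEAD 3        /* prefetch distance in nodes (assumed) */

struct node {
    long value;
    struct node *next;
    struct node *ahead; /* prefetch pointer: the node AHEAD links on */
};

/* One pool per call site; chunks are never freed in this sketch. */
struct pool {
    struct node *chunk;
    size_t used;
};

static struct node *pool_alloc(struct pool *p)
{
    assert(p != NULL);
    if (p->chunk == NULL || p->used == CHUNK_NODES) {
        /* Grab a contiguous region of request-sized blocks at once. */
        p->chunk = malloc(CHUNK_NODES * sizeof *p->chunk);
        if (p->chunk == NULL)
            return NULL;
        p->used = 0;
    }
    return &p->chunk[p->used++]; /* dole out the next block */
}

/* Build a list holding 1..n, then fill in the ahead pointers. */
struct node *build_list(struct pool *p, size_t n)
{
    struct node *head = NULL, *tail = NULL;
    for (size_t i = 1; i <= n; i++) {
        struct node *nd = pool_alloc(p);
        if (nd == NULL)
            return NULL;
        nd->value = (long)i;
        nd->next = NULL;
        nd->ahead = NULL;
        if (tail)
            tail->next = nd;
        else
            head = nd;
        tail = nd;
    }
    /* Walk a leading pointer AHEAD links in front of a trailing one. */
    struct node *lead = head;
    for (int k = 0; k < AHEAD && lead; k++)
        lead = lead->next;
    for (struct node *cur = head; lead; cur = cur->next, lead = lead->next)
        cur->ahead = lead;
    return head;
}

long sum_list(const struct node *head)
{
    long total = 0;
    for (const struct node *cur = head; cur; cur = cur->next) {
        if (cur->ahead)
            __builtin_prefetch(cur->ahead, 0, 3); /* read, high locality */
        total += cur->value;
    }
    return total;
}
```

With a zero-initialized struct pool, build_list(&pool, 100) spans two
chunks, and sum_list on the result returns 5050. Because consecutive nodes
come from the same chunk, the prefetch pointer matters most once the list
has been reordered or spliced so that neighbors no longer share cache lines.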

--
Mark Smotherman, Computer Science Dept., Clemson University, Clemson, SC
http://www.cs.clemson.edu/~mark/homepage.html
--
