Related articles |
---|
Access pattern optimizations sam2y@koa.cs.Virginia.EDU (1991-07-26) |
Re: Access pattern optimizations moss@cs.umass.edu (1991-07-27) |
Newsgroups: | comp.compilers |
From: | sam2y@koa.cs.Virginia.EDU (Steven A. Moyer) |
Keywords: | optimize |
Organization: | Dept. of Computer Science, University of Virginia |
Date: | Fri, 26 Jul 91 15:21:35 GMT |
Does anyone know of any work on optimizations/transformations of
access patterns for scalar processors? In particular, given a
loop:
do 1 k = 1, n
1 x(k) = y(k+1) + y(k)
a straight-forward translation would generate the accesses:
load y(2), load y(1), store x(1)
load y(3), load y(2), store x(2)
etc.
with 2 loads and 1 store per iteration. It would be preferable to
generate the access pattern:
load y(1)
load y(2), store x(1)
load y(3), store x(2)
etc.
which re-uses a previously loaded value so that only 1 load and 1 store
is required per iteration. Furthermore, the later access pattern loads
y values and stores x values as vectors with stride 1; given an
interleaved memory system, if the loop which generates this access pattern
were unrolled and accesses to each vector grouped then memory bandwidth
would be improved considerably.
It is easy to imagine more complex loops which would also benefit
from this type of transformation.
Maybe this could be handled by some form of global register coloring
scheme, I'm not sure.
If you know of any published work in this area I would greatly
appreciate references. Please send responses to: sam2y@virginia.edu
Thanks.
--Steve
Steve Moyer
Dept. of Computer Science
University of Virginia
sam2y@virginia.edu
[This is pretty similar to software pipelining, a well-known optimization
used in unrolled loops. -John]
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.