|Re: non-caching load and GC firstname.lastname@example.org (Steven A. Moyer) (1993-04-02)|
|Utilization of Non-caching Access Instructions email@example.com (Steven A. Moyer) (1993-04-06)|
|From:||"Steven A. Moyer" <firstname.lastname@example.org>|
|Keywords:||optimize, architecture, GC, report, FTP|
|Organization:||University of Virginia Computer Science Department|
|Date:||Tue, 6 Apr 1993 14:22:32 GMT|
In following up a thread on the use of non-caching load
instructions (a la i860) for implementing GC algorithms, I discussed a
general optimization that uses such an instruction to increase effective
memory bandwidth. The tech reports I cited contained some older work, and
I received many requests to make the newer, recently completed
dissertation text available.
I have issued the complete text as a technical report and have placed it in
an anonymous FTP directory located at uvacs.cs.virginia.edu. The report is
the compressed PostScript file:
I hope this information proves useful; comments are certainly welcome.
And yes, I've learned my lesson about posting references to older material.
Access Ordering and Effective Memory Bandwidth
High-performance scalar processors are characterized by multiple pipelined
functional units that can be initiated simultaneously to exploit
instruction-level parallelism. For scientific codes, the performance of
these processors depends heavily on memory bandwidth. To achieve peak
processor rate, data must be supplied to the arithmetic units at the peak
aggregate rate of consumption.
Access ordering, a loop optimization that reorders non-caching accesses to
better utilize memory system resources, is a compiler technology that
addresses the memory bandwidth problem for scalar processors executing
scientific codes. For a given computation, memory architecture, and
memory device type, an access ordering algorithm determines a well-defined
interleaving of vector references that maximizes effective bandwidth.
Consequently, analytic models of performance can also be derived.
Access ordering is fundamentally different from, though complementary to,
both caching and access scheduling techniques that attempt to overlap
computation with memory latency. Simulation results demonstrate that
for a given computation, access ordering can significantly increase
effective bandwidth over that achieved by the natural reference sequence.
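
To make the idea concrete, here is a minimal sketch of why reordering can
help. It is not taken from the report. It assumes a single bank of
page-mode DRAM in which an access to the currently open page costs 1 cycle
and any other access costs 4 cycles; the page size, the costs, and all
function names are illustrative assumptions. The "natural" sequence
alternates between two vectors (x[0], y[0], x[1], y[1], ...), while the
"ordered" sequence groups a block of accesses to one vector before moving
to the next, as an unrolled loop might.

```python
def access_cost(trace, page_size=512, hit=1, miss=4):
    """Total cycles for a word-address trace on one page-mode DRAM bank.

    Assumed model (not from the report): an access to the currently open
    page costs `hit` cycles, any other access costs `miss` cycles.
    """
    open_page = None
    cycles = 0
    for addr in trace:
        page = addr // page_size
        cycles += hit if page == open_page else miss
        open_page = page
    return cycles


def natural_order(bases, n):
    """x[0], y[0], x[1], y[1], ...: the order the loop body implies."""
    return [base + i for i in range(n) for base in bases]


def ordered(bases, n, block):
    """Group `block` consecutive accesses per vector before switching."""
    trace = []
    for start in range(0, n, block):
        stop = min(start + block, n)
        for base in bases:
            trace.extend(base + i for i in range(start, stop))
    return trace


if __name__ == "__main__":
    bases = [0, 4096]          # hypothetical base addresses of x and y
    n = 64                     # elements referenced per vector
    nat = natural_order(bases, n)
    grp = ordered(bases, n, block=8)
    for name, trace in (("natural", nat), ("ordered", grp)):
        cycles = access_cost(trace)
        print(f"{name}: {cycles} cycles, "
              f"{len(trace) / cycles:.2f} words/cycle")
```

Both sequences touch exactly the same addresses; only the order differs.
Under these assumptions the block size plays the role of the unrolling
depth, which in practice is limited by the registers available to hold
the grouped operands.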
Computer Science Department
University of Virginia