Utilization of Non-caching Access Instructions

"Steven A. Moyer" <sam2y@server.cs.virginia.edu>
Tue, 6 Apr 1993 14:22:32 GMT

          From comp.compilers


Newsgroups: comp.arch,comp.compilers,comp.object
From: "Steven A. Moyer" <sam2y@server.cs.virginia.edu>
Originator: sam2y@koa.cs.Virginia.EDU
Keywords: optimize, architecture, GC, report, FTP
Organization: University of Virginia Computer Science Department
References: <C4ppA5.BLx.1@cs.cmu.edu> 93-04-013
Date: Tue, 6 Apr 1993 14:22:32 GMT

In following up a thread on the use of non-caching load instructions
(à la i860) for implementing GC algorithms, I discussed a general
optimization that uses such an instruction to increase effective memory
bandwidth. The tech reports I cited contained some older work, and I
received many requests to make the newer, recently completed dissertation
text available.

I have issued the complete text as a technical report and placed it in an
anonymous FTP directory at uvacs.cs.virginia.edu. The report is the
compressed PostScript file:


I hope this information proves useful; comments are certainly welcome.
And yes, I've learned my lesson about posting references to older material.



                            Access Ordering and Effective Memory Bandwidth

High-performance scalar processors are characterized by multiple pipelined
functional units that can be initiated simultaneously to exploit
instruction level parallelism. For scientific codes, the performance of
these processors depends heavily on memory bandwidth. To achieve peak
processor rate, data must be supplied to the arithmetic units at the peak
aggregate rate of consumption.

Access ordering, a loop optimization that reorders non-caching accesses to
better utilize memory system resources, is a compiler technology that
addresses the memory bandwidth problem for scalar processors executing
scientific codes. For a given computation, memory architecture, and
memory device type, an access ordering algorithm determines a well-defined
interleaving of vector references that maximizes effective bandwidth.
Consequently, analytic models of performance can also be derived.
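
As a rough illustration of the idea (a toy sketch, not the algorithm or
the simulator from the report), consider daxpy, y[i] = a*x[i] + y[i], on a
single bank of page-mode DRAM. The page size, the hit and miss costs, the
vector placement, and all function names below are hypothetical. The
blocked order also assumes the processor has enough registers to hold one
block of operands.

```python
# Toy model (an assumption for illustration, not the report's simulator):
# one DRAM bank with a single open page; an access to the open page is a
# "hit", any other access is a "miss" that opens a new page.
PAGE_WORDS = 512      # hypothetical page size, in words
T_HIT, T_MISS = 1, 4  # hypothetical access costs, in cycles

def cost(addresses):
    """Total cycles to service the address trace in the given order."""
    open_page, cycles = None, 0
    for addr in addresses:
        page = addr // PAGE_WORDS
        cycles += T_HIT if page == open_page else T_MISS
        open_page = page
    return cycles

def natural_order(x_base, y_base, n):
    """daxpy as written: read x[i], read y[i], write y[i] for each i."""
    trace = []
    for i in range(n):
        trace += [x_base + i, y_base + i, y_base + i]
    return trace

def blocked_order(x_base, y_base, n, block):
    """Same accesses, grouped: reads of x, then reads of y, then writes of y."""
    trace = []
    for start in range(0, n, block):
        idx = range(start, min(start + block, n))
        trace += [x_base + i for i in idx]
        trace += [y_base + i for i in idx]
        trace += [y_base + i for i in idx]
    return trace

n = 1024
x_base, y_base = 0, 4096   # hypothetical placement; x and y never share a page
natural = cost(natural_order(x_base, y_base, n))
blocked = cost(blocked_order(x_base, y_base, n, block=8))
print(natural, blocked)    # prints "9216 3840"
```

Both traces contain exactly the same accesses; only their order differs.
Under these made-up parameters the natural order pays two page misses per
element, while the blocked order pays two per block of eight, which is
where the difference in total cycles comes from.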

Access ordering is fundamentally different from, though complementary to,
both caching and access scheduling techniques that attempt to overlap
computation with memory latency. Simulation results demonstrate that
for a given computation, access ordering can significantly increase
effective bandwidth over that achieved by the natural reference sequence.

Steve Moyer
Computer Science Department
University of Virginia
