From: taustin@ichips.intel.com (Todd Austin)
Newsgroups: comp.arch,comp.compilers
Date: 15 Sep 1996 00:41:37 -0400
Organization: Intel Corp.
Keywords: report, performance, architecture
Greetings! My Ph.D. thesis is now available from UW-Madison (UW CS TR-1311).
This work was performed in conjunction with my thesis advisor, Guri Sohi.
Umpteen years in the making, it is finally done; I hope you enjoy it.
Comments and questions are welcome. The compressed PostScript for the
thesis is in the file:
ftp://ftp.cs.wisc.edu/sohi/austin.thesis.ps.Z
Details on the contents follow. Regards, -Todd
--
TITLE: Hardware and Software Mechanisms for Reducing Load Latency
BY: Todd M. Austin
ABSTRACT:
As processor speeds rapidly outpace memory speeds, the performance of load
instructions becomes an increasingly critical component of overall system
performance. This thesis contributes four novel load latency reduction
techniques, each targeting a different component of load latency: address
calculation, data cache access, address translation, and data cache misses.
The contributed techniques are as follows:
- Fast Address Calculation employs a stateless set index predictor
to allow address calculation to overlap with data cache access. The
design eliminates the latency of address calculation for many loads.
- Zero-Cycle Loads combine fast address calculation with an
early-issue mechanism to produce pipeline designs capable of hiding the
latency of many loads that hit in the data cache.
- High-Bandwidth Address Translation develops address translation
mechanisms with better latency and area characteristics than a
multi-ported TLB. The new designs provide multiple-issue processors
with effective alternatives for keeping address translation off the
critical path of data cache access.
- Cache-Conscious Data Placement is a profile-guided data placement
optimization for reducing the frequency of data cache misses. The
approach employs heuristic algorithms to find variable placements
that decrease inter-variable conflicts and increase cache line
utilization and block prefetch effectiveness.
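The set-index prediction idea behind the first bullet can be sketched in a
few lines of Python. The cache geometry (32-byte blocks, 128 sets) and the
carry-free add below are my illustrative assumptions, not figures from the
thesis; the point is only that the index can be guessed before the full
effective-address add completes, and verified afterward:

```python
# Toy model of stateless set-index prediction (the core idea behind
# fast address calculation). Cache parameters are assumed for
# illustration, not taken from the thesis.
BLOCK_BITS = 5          # assumed 32-byte cache blocks
INDEX_BITS = 7          # assumed 128 cache sets
SET_MASK = (1 << INDEX_BITS) - 1

def predict_set_index(base: int, offset: int) -> int:
    """Carry-free prediction: add only the index fields, ignoring any
    carry out of the block-offset bits (the common case is no carry)."""
    return ((base >> BLOCK_BITS) + (offset >> BLOCK_BITS)) & SET_MASK

def actual_set_index(base: int, offset: int) -> int:
    """Ground truth from the full effective-address add."""
    return ((base + offset) >> BLOCK_BITS) & SET_MASK

# Prediction is correct when the low-order add does not carry into the
# index field, letting cache access start before the full add finishes.
print(predict_set_index(0x1000, 8) == actual_set_index(0x1000, 8))  # no carry
print(predict_set_index(0x101C, 8) == actual_set_index(0x101C, 8))  # carry: mispredict
```

When the prediction is wrong, the access is simply replayed with the real
address, so the scheme trades an occasional extra cycle for a shorter
common-case load path.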
Detailed design descriptions and experimental evaluations are provided for each
approach, confirming the designs as cost-effective and practical solutions for
reducing load latency.
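To give a flavor of the placement problem in the fourth bullet, here is a
toy greedy packer of my own devising (not the heuristic algorithm evaluated
in the thesis): given variable sizes and a profile of how often pairs of
variables are accessed close together in time, it merges high-affinity
variables into the same cache line to raise line utilization.

```python
# Toy greedy packer illustrating cache-conscious data placement: merge
# variables with high temporal affinity into the same cache line.
# This is an illustrative sketch, not the thesis's algorithm.
def pack_lines(sizes, affinity, line_size=32):
    """sizes: {var: bytes}; affinity: {(a, b): co-access count from a profile}.
    Returns a list of variable groups, one group per cache line."""
    lines = {v: [v] for v in sizes}      # each variable starts alone
    line_of = {v: v for v in sizes}      # variable -> line representative
    for (a, b), _ in sorted(affinity.items(), key=lambda kv: -kv[1]):
        la, lb = line_of[a], line_of[b]
        if la == lb:
            continue                     # already share a line
        if sum(sizes[v] for v in lines[la] + lines[lb]) <= line_size:
            lines[la].extend(lines[lb])  # merge the two lines
            for v in lines[lb]:
                line_of[v] = la
            del lines[lb]
    return list(lines.values())

# 'x' and 'y' are hot together, so they share a line; 'z' does not fit.
groups = pack_lines({'x': 8, 'y': 8, 'z': 24},
                    {('x', 'y'): 10, ('x', 'z'): 1})
```

The same affinity data can also drive the opposite decision, spreading
conflicting variables across different sets, which is the other half of the
placement problem the abstract mentions.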
TABLE OF CONTENTS:
1. Introduction
1.1 Anatomy of a Load
1.2 The Impact of Load Latency
1.3 Reducing the Impact of Load Latency
1.4 Contributions of This Thesis
1.5 Organization of This Thesis
2. Experimental Framework
2.1 Compiler Tools
2.2 Simulation Methodology
2.3 Analyzed Programs
3. Fast Address Calculation
3.1 Introduction
3.2 Program Reference Behavior
3.3 Fast Address Calculation
3.4 Working Examples
3.5 Increasing Prediction Performance with Software Support
3.6 Experimental Evaluation
3.7 Related Work
3.8 Chapter Summary
4. Zero-Cycle Loads
4.1 Introduction
4.2 Zero-Cycle Loads
4.3 A Working Example
4.4 Experimental Evaluation
4.5 Related Work
4.6 Chapter Summary
5. High-Bandwidth Address Translation
5.1 Introduction
5.2 Impact of Address Translation on System Performance
5.3 High-Bandwidth Address Translation
5.4 Experimental Evaluation
5.5 Chapter Summary
6. Cache-Conscious Data Placement
6.1 Introduction
6.2 How Variable Placement Affects Data Cache Performance
6.3 Cache-Conscious Data Placement
6.4 Detailed Methodology
6.5 Experimental Evaluation
6.6 Related Work
6.7 Chapter Summary
7. Conclusion
7.1 Thesis Summary
7.2 Future Directions
Appendix A. The SimpleScalar Architecture
Appendix B. Detailed Results
--
%% Todd Austin, taustin@ichips.intel.com
%% MicroComputer Research Labs, Intel Corporation, Hillsboro, Oregon
--