THESIS AVAILABLE: ... Reducing Load Latency

taustin@ichips.intel.com (Todd Austin)
15 Sep 1996 00:41:37 -0400

          From comp.compilers

From: taustin@ichips.intel.com (Todd Austin)
Newsgroups: comp.arch,comp.compilers
Date: 15 Sep 1996 00:41:37 -0400
Organization: Intel Corp.
Keywords: report, performance, architecture

Greetings. My Ph.D. thesis is now available from UW-Madison (UW CS TR-1311).
This work was performed in conjunction with my thesis advisor, Guri Sohi.
It was umpteen years in the making, and I hope you enjoy it. Comments and
questions are welcome. The compressed PostScript for the thesis is in the file:


        ftp://ftp.cs.wisc.edu/sohi/austin.thesis.ps.Z


Details on the contents follow... Regards, -Todd


--


TITLE: Hardware and Software Mechanisms for Reducing Load Latency


BY: Todd M. Austin


ABSTRACT:


As processor demands quickly outpace memory, the performance of load
instructions becomes an increasingly critical component of good system
performance. This thesis contributes four novel load latency reduction
techniques, each targeting a different component of load latency: address
calculation, data cache access, address translation, and data cache misses.
The contributed techniques are as follows:


    - Fast Address Calculation employs a stateless set index predictor
      to allow address calculation to overlap with data cache access. The
      design eliminates the latency of address calculation for many loads.


    - Zero-Cycle Loads combine fast address calculation with an
      early-issue mechanism to produce pipeline designs capable of hiding the
      latency of many loads that hit in the data cache.


    - High-Bandwidth Address Translation develops address translation
      mechanisms with better latency and area characteristics than a
      multi-ported TLB. The new designs provide multiple-issue processors
      with effective alternatives for keeping address translation off the
      critical path of data cache access.


    - Cache-Conscious Data Placement is a profile-guided data placement
      optimization for reducing the frequency of data cache misses. The
      approach employs heuristic algorithms to find variable placement
      solutions that decrease inter-variable conflict, and increase cache
      line utilization and block prefetch.
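

To make the first technique in the list concrete, here is a minimal Python
sketch of a stateless set index predictor. The announcement does not say how
the prediction is formed, so the carry-free OR of the index fields used below
is an assumption for illustration only, as are the cache geometry (BLOCK_BITS,
INDEX_BITS) and every function name. The sketch shows just the general idea: a
guess that is available before the full add completes lets cache indexing
start early, and the full add is then used to check the guess.

```python
# Hypothetical cache geometry, chosen only for illustration.
BLOCK_BITS = 5            # 32-byte cache blocks (assumed)
INDEX_BITS = 9            # 512 sets (assumed)
ADDR_MASK = 0xFFFFFFFF    # 32-bit addresses (assumed)


def set_index(addr):
    """Extract the cache set index field from an address."""
    return (addr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)


def predict_set_index(base, offset):
    """Stateless guess at the set index of base + offset.

    Assumption: the index fields are combined with a bitwise OR, which
    ignores carries and so needs no adder and no prediction table.
    """
    return set_index(base) | set_index(offset)


def check_prediction(base, offset):
    """Return (predicted_index, actual_index, prediction_correct).

    The actual index comes from the full addition, which in hardware
    would finish later than the carry-free guess.
    """
    predicted = predict_set_index(base, offset)
    actual = set_index((base + offset) & ADDR_MASK)
    return predicted, actual, predicted == actual
```

In this sketch the guess is right whenever the addition produces no carry
into or within the index field (for example, a small offset from an aligned
base) and wrong otherwise. Recovering from a wrong guess is not modeled here.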


Detailed design descriptions and experimental evaluations are provided for each
approach, confirming the designs as cost-effective and practical solutions for
reducing load latency.
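

The inter-variable conflict that the data placement technique targets can be
illustrated with a small Python sketch. Everything here is an assumption made
for the example and not the thesis's algorithm: a direct-mapped cache with the
hypothetical geometry BLOCK_SIZE and NUM_SETS, a placement given as a
name-to-(address, size) table, and a simple count of cache sets claimed by
more than one variable. A placement heuristic would try to drive a cost of
this general kind down, weighted by profile data that is omitted here.

```python
# Hypothetical direct-mapped cache geometry, chosen only for illustration.
BLOCK_SIZE = 32    # bytes per cache block (assumed)
NUM_SETS = 256     # 256 sets * 32 bytes = 8 KB cache (assumed)


def sets_touched(addr, size):
    """Return the cache sets covered by a variable at addr of size bytes."""
    first_block = addr // BLOCK_SIZE
    last_block = (addr + size - 1) // BLOCK_SIZE
    return {block % NUM_SETS for block in range(first_block, last_block + 1)}


def conflicting_sets(placement):
    """Count cache sets that more than one variable maps to.

    placement maps a variable name to an (address, size) pair.
    """
    owners = {}
    for name, (addr, size) in placement.items():
        for s in sets_touched(addr, size):
            owners.setdefault(s, set()).add(name)
    return sum(1 for names in owners.values() if len(names) > 1)


# Two 64-byte variables exactly one cache size (8 KB) apart share their sets.
naive = {"a": (0x0000, 64), "b": (0x2000, 64)}
# Moving "b" by 64 bytes maps it to different sets.
shifted = {"a": (0x0000, 64), "b": (0x2040, 64)}
```

Under these assumptions, conflicting_sets(naive) is 2 and
conflicting_sets(shifted) is 0, which is the kind of difference a placement
optimization would look for.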




TABLE OF CONTENTS:


1. Introduction
        1.1 Anatomy of a Load
        1.2 The Impact of Load Latency
        1.3 Reducing the Impact of Load Latency
        1.4 Contributions of This Thesis
        1.5 Organization of This Thesis


2. Experimental Framework
        2.1 Compiler Tools
        2.2 Simulation Methodology
        2.3 Analyzed Programs


3. Fast Address Calculation
        3.1 Introduction
        3.2 Program Reference Behavior
        3.3 Fast Address Calculation
        3.4 Working Examples
        3.5 Increasing Prediction Performance with Software Support
        3.6 Experimental Evaluation
        3.7 Related Work
        3.8 Chapter Summary


4. Zero-Cycle Loads
        4.1 Introduction
        4.2 Zero-Cycle Loads
        4.3 A Working Example
        4.4 Experimental Evaluation
        4.5 Related Work
        4.6 Chapter Summary


5. High-Bandwidth Address Translation
        5.1 Introduction
        5.2 Impact of Address Translation on System Performance
        5.3 High-Bandwidth Address Translation
        5.4 Experimental Evaluation
        5.5 Chapter Summary


6. Cache-Conscious Data Placement
        6.1 Introduction
        6.2 How Variable Placement Affects Data Cache Performance
        6.3 Cache-Conscious Data Placement
        6.4 Detailed Methodology
        6.5 Experimental Evaluation
        6.6 Related Work
        6.7 Chapter Summary


7. Conclusion
        7.1 Thesis Summary
        7.2 Future Directions


Appendix A. The SimpleScalar Architecture
Appendix B. Detailed Results


--
%% Todd Austin, taustin@ichips.intel.com
%% MicroComputer Research Labs, Intel Corporation, Hillsboro, Oregon
--

