I-cache performance tuning [summary]

krste@ICSI.Berkeley.EDU (Krste Asanovic)
Mon, 31 Oct 1994 22:24:15 GMT

          From comp.compilers

Related articles
I-cache performance tuning [summary] krste@ICSI.Berkeley.EDU (1994-10-31)
| List of all articles for this month |

Newsgroups: comp.arch,comp.compilers
From: krste@ICSI.Berkeley.EDU (Krste Asanovic)
Keywords: architecture, summary
Organization: International Computer Science Institute, Berkeley, CA, U.S.A.
Date: Mon, 31 Oct 1994 22:24:15 GMT

Here is a summary of the responses I received to my request for tools
and papers regarding instruction cache performance tuning.

Thanks to all the respondents for their help!

Krste Asanovic email: krste@icsi.berkeley.edu
International Computer Science Institute, phone: +1 (510) 642-4274 x143
Suite 600, 1947 Center Street, fax: +1 (510) 643-7684
Berkeley, CA 94704-1198, USA http://http.icsi.berkeley.edu

[Summary in Unix mail format]

>From mark@hubcap.clemson.edu Sun Oct 16 10:21:01 1994
Return-Path: <mark@hubcap.clemson.edu>
Date: Sun, 16 Oct 1994 13:19:23 -0400
From: Mark Smotherman <mark@hubcap.clemson.edu>
Subject: Re: I-cache performance tuning tools
Newsgroups: comp.arch,comp.compilers
References: 94-10-116

            I-Cache Optimizations -- Classifications

                P - profile-guided
                S - static

                C - calculate conflicts (for direct-mapped cache)
                D - duplicate code to align targets
                I - inline procedures
                L - link ordering using call-frequency cliques
                M - code motion to align targets
                N - nop-padding to align targets
                R - rearrange basic blocks
                S - segregate infrequent basic blocks

            paper type C D I L M N R S
            Agarwal-87 P C L
            Samples-88 P R
            Hwu-89 P I L R S
            McFarling-89 P C
            Gupta-90 S D M N R
            Pettis-90 P L R S
            McFarling-91 P I
            Wu-92 P L
            Mendlson-94 P C D R


            A. Agarwal, et al., "On-Chip Caches for High-Performance Proces-
            sors," Proc. 1987 Conf. on Adv. Research in VLSI, Stanford, March
            1987, pp. 1-24.

            A. Dain Samples and Paul N. Hilfinger, "Code Reorganization for
            Instruction Caches", UCB TR CSD-88-447, Oct. 1988.

            Wen-mei W. Hwu and Pohua P. Chang "Achieving High Instruction
            Cache Performance with an Optimizing Compiler", Proc. ISCA 89,
            Jerusalem, May-June 1989, pp. 242-251.

            Scott McFarling, "Program Optimization for Instruction Caches",
            Proc. ASPLOS III, Boston, April 1989, pp. 183-191.

            Rajiv Gupta and Chi-Hung Chi, "Improving Instruction Cache
            Behavior by Reducing Cache Pollution", Proc. Supercomputing 90,
            New York, November 1990, pp. 82-91.

            Karl Pettis and Robert C. Hansen, "Profile Guided Code Position-
            ing", Proc. SIGPLAN PLDI 90, White Plains, NY, June 1990, pp.

            Scott McFarling, "Procedure Merging with Instruction Caches",
            Proc. SIGPLAN PLDI 91, Toronto, June 1991, pp. 71-79.

            Youfeng Wu, "Ordering Functions for Improving Memory Reference
            Locality in a Shared Memory Multiprocessor System", Proc. Micro-
            25, Portland, December 1992, pp. 218-221.

            Abraham Mendlson, Shlomit S. Pinter, and Ruth Shtokhamer, "Com-
            pile Time Instruction Cache Optimizations", ACM Computer Arch.
            News, 22, 1, March 1994, pp. 44-51.

>From stravers@borneo.gmd.de Mon Oct 17 01:59:28 1994
Return-Path: <stravers@borneo.gmd.de>
Date: Mon, 17 Oct 1994 09:58:53 +0100
From: Paul Stravers <Paul.Stravers@gmd.de>
Subject: I-cache performance tuning tools
Reply-To: Paul Stravers <paul.stravers@gmd.de>

Wisconsin University made a cache profiler available that shows you how each
individual line of source code behaves in the cache. This still does not
automatically "correct" your program, of course.

Cheerio, Paul

From: david@cs.wisc.edu (David Wood)
Subject: CProf cache profiling system available
Date: Thu, 13 Oct 1994 15:49:34 GMT

The current issue (October 1994) of IEEE Computer contains a paper
describing the CProf cache profiling system and how it may be applied
to optimize performance of the SPEC benchmarks.

Alvin R. Lebeck and David A. Wood, "Cache Profiling and the
SPEC Benchmarks: A Case Study," IEEE Computer, October 1994.

Unfortunately, the article fails to mention that the CProf software is
available for distribution. If you are interested in obtaining CProf,
see below for details.

Alvy Lebeck
David Wood
Computer Sciences Dept.
University of Wisconsin
1210 W. Dayton Street
Madison, WI 53706

What is CProf?

The CProf system is a cache performance profiler that annotates source
listings to identify the source lines and data structures that cause
frequent cache misses. The CProf system contains a uniprocessor cache
simulator and an X windows user interface. CProf processes program
traces and annotates source lines and data structures with the
appropriate cache miss statistics. CProf provides a generalized X
windows interface for easy viewing of annotated source files.

CProf currently operates on traces produced by the QPT system (distributed
with CProf, see below). However, its modular design makes porting
to other tracing systems relatively simple.

Why is CProf useful?

The performance of current RISC processors is very sensitive to cache
miss ratios. Programmers, compiler writers, and language designers
must explicitly consider cache behavior to fully exploit a program's
performance potential. CProf provides detailed information about a
program's cache behavior through full cache simulation. By anno-
tating lines of source code and data structures with the corresponding
number of cache misses, CProf helps the user focus on problematic data
structures and algorithms. CProf aids the programmer in identifying
types of transformations that can improve program cache behavior by
classifying cache misses as: compulsory, capacity, or conflict. We
have profiled six of the SPEC92 benchmarks and obtained speedups in
execution time ranging from 1.03 to 1.92 on a DECstation 5000/240.

TABLE 2. Execution Time Speedup at Optimization Level -O3

Program DEC 5000/125 DEC 5000/200 DEC 5000/240
compress 1.56 1.30 1.92
dnasa7 1.30 1.24 1.51
eqntott 1.11 1.06 1.03
spice 1.26 1.25 1.34
tomcatv 1.47 1.28 1.60
xlisp 1.06 1.03 1.08

Where can I read more about CProf?

Alvin R. Lebeck and David A. Wood, "Cache Profiling and the SPEC
Benchmarks: A Case Study," IEEE Computer, October 1994, pp. 15-26.
An earlier version of this paper appeared as Technical Report No.
1164, University of Wisconsin, revised July 1993. (A postscript
version of the technical report is available via anonymous ftp
on romano.cs.wisc.edu in ~ftp/public/CPROF/cprof.paper.ps.Z)

How can I get CProf?

CProf is distributed under license as part of the WARTS distribution.
WARTS (Wisconsin Architectural Research Tool Set) is a collection of
tools for profiling and tracing programs and analyzing program
traces. WARTS currently contains:

o QPT, a program profiler and tracing system.

o CProf, a cache performance profiler.

o Tycho and dineroIII, cache simulators.

WARTS is distributed with the full source and a small amount of
documentation. The tools in WARTS are copyrighted and distributed under
license. A copy of the license is available on ftp.cs.wisc.edu in
~ftp/pub/warts-license.ps.Z or it can be obtained by writing to the
address below. WARTS is available without charge for university
researchers and is available to others for a modest research donation.
Contact warts@cs.wisc.edu for more details or to be added to the
mailing list to hear of future releases. Also, check out our web page
at http://www.cs.wisc.edu/~larus/warts.html.

Wisconsin Architectural Research Tools Set
Computer Science Dept.
University of Wisconsin
1210 W. Dayton Street
Madison, WI 53706

>From calder@frob.cs.colorado.edu Mon Oct 17 14:34:16 1994
Return-Path: <calder@frob.cs.colorado.edu>
Date: Mon, 17 Oct 1994 15:32:36 -0600
From: Brad Calder <calder@frob.cs.colorado.edu>
Subject: Re: I-cache performance tuning tools

We recently published a paper in ASPLOS 94 on how basic block
rearranging effects the performance of branch prediction. We used
similar algorithms to those that have been used by the IMPACT-I
compiler and the HP-RISC compiler to improve cache performance. In
implementing these techniques on a dual issue Alpha we found that up
to a 16% reduction in execution time is seen for Gcc. This
improvement comes from reducing the instruction cache misses along
with reducing the branch penalty costs.

If you are interested. The following is a PostScript version of the
paper and it references the studies that did procedure reordering and
basic block reordering to improve instruction cache performance.

-- Brad

>From calder@frob.cs.colorado.edu Mon Oct 17 16:46:09 1994
Return-Path: <calder@frob.cs.colorado.edu>
Date: Mon, 17 Oct 1994 17:46:04 -0600
From: Brad Calder <calder@frob.cs.colorado.edu>
Subject: Re: I-cache performance tuning tools
References: <199410172132.PAA12639@frob.cs.colorado.edu>

> Do you have an ftp address or URL for the paper? I'll be posting a
> summary of responses back to the net and would like to give a path
> where people can grab the paper.

Yes, the paper can be grabbed from my URL:


-- Brad

>From wchang@super.mti.sgi.com Tue Oct 18 09:48:41 1994
Return-Path: <wchang@super.mti.sgi.com>
Date: Tue, 18 Oct 94 09:48:30 -0700
From: wchang@super.mti.sgi.com (Wei-Chau Chang)
Subject: Re: I-cache performance tuning tools

Here's a list of references.

[PH90] Pettis and Hansen, "Profile Guided Code positioning", SIGPLAN,
    1990, pp. 16-27.

[MPS94] Mendelson, Pinter and Shtokhamer, "Compile-Time Instruction cache
    Optimizations" Computer Architecture News (March 1994)
    also in International Conference on Compiler Construction '94

[HC89] W. Hwu and Pohua Chang, "Achieving high instruction cache performance
    with an optimizing compiler" 16th ISCA, pp. 242-251.

[McF89] McFarling, "Program Optimization for Instruction Caches", ASPLOS,
    April 1989, pp. 183-11.

[McF91] McFarling, "Procedure Merging with Instruction Caches", SIGPLAN,
    1991, pp. 71-79.

[Hi88] Paul N. Hilfinger, "Code Reorganization for Instruction Caches",
    UCB TR EECS-??, Oct 1988.

[CB93] J. B. Chen and B. N. Bershad, "The Impact of Operating System
    Structure on Memory System Performance", Proceedings of the 14th ACM
    Symposium on Operating System Principles", December 1993, pp.120-133

Wei-Chau Chang wchang@mti.sgi.com
Silicon Graphics Inc.

>From heisch@austin.ibm.com Wed Oct 19 11:53:26 1994
Return-Path: <heisch@austin.ibm.com>
From: heisch@austin.ibm.com
Date: Wed, 19 Oct 1994 13:53:19 -0500
Subject: I-cache performance tuning tools

Krste, IBM has a tool - FDPR (Feedback Directed Program Restructuring) - which
does profile based program optimization. The tool is available on AIX 4.1.

There are many papers on this subject; I have appended a subset. I have a
paper publishing in the next issue of the IBM Journal of Research and
Development regarding the work leading up to the FDPR tool as well as a
writeup in the August issue of AIXpert specifically regarding the FDPR tool.

FDPR currently only supports AIX XCOFF unstripped executables.

Randy Heisch
Processor and System Performance
IBM Risc System/6000 Division
11400 Burnet Rd. Bldg. 904/9440
Austin, TX 78758-3493
(512) 838-8252

R.R. Heisch, "Trace-directed program restructuring for AIX executables",
IBM Journal of Research and Development 38, No.5, 595-603, September 1994.

Randall R. Heisch, "FDPR for AIX Executables", AIXpert, 16-20, August 1994.

Steven E. Speer, Rajiv Kumar and Craig Partridge, "Improving UNIX Kernel
Performance using Profile Based Optimization", 1994 Winter Usenix - January
17-21, 1994 - San Francisco, CA.

Pohua P. Chang, Scott A. Mahlke and Wen-Mei W. Hwu,"Using Profile
Information to Assist Classic Code Optimizations", Software-Practice and
Experience, Vol. 21(12), 1301-1321, December 1991

David W. Wall, "Predicting Program Behavior Using Real or Estimated
Profiles", Proceedings of the ACM SIGPLAN'91 Conference on Programming
Language Design and Implementation, Toronto, Ontario, Canada, June 26-28
1991, pages 59-70

Karl Pettis and Robert C. Hansen, "Profile Guided Code Positioning",
Proceedings of the ACM SIGPLAN'90 Conference on Programming Language Design
and Implementation. White Plains, New York, June 20-22, 1990, pages 16-27

Scott McFarling, "Program Optimization for Instruction Caches", Third
International Symposium on Architectural Support for Programming Languages
and Operating Systems, April 1989, pages 183-191

D. Ferrari, "The Improvement of Program Behavior", Computer, Vol 9, No.
11, Nov. 1976, pages 39-47

D.J. Hatfield and J. Gerald, "Program Restructuring for Virtual Memory",
IBM Systems Journal, No. 3, 1971, pages 168-192

------ From hoschka@pax.inria.fr (Philipp Hoschka )

In article <38385k$hp5@wagner.convex.com>, Patrick F. McGehearty <patrick@convex.COM> writes:

|> Someone from HP wrote an article on the topic of instruction cache
|> optimization two-three years ago, but I can't remember the exact reference.
|> Perhaps someone else will post it for us.

I guess you mean:

K Pettis and R.C. Hansen
"Profile Guided Code Positioning"
Proc. Sigplan on Progamming Language Design and Implementation
(SIGPLAN Notices), vol. 25, no 6, June 1990.

      Philipp Hoschka | INRIA-Rodeo
      Mosaic homepage:
      hoschka@sophia.inria.fr | 2004, Route des Lucioles, BP 93
      Ph: (+33) 93 65 77 47 | 06902 Sophia Antipolis Cedex
      Fax:(+33) 93 65 77 65 | France

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.