Re: Caller/Callee saved Registers

pardo@cs.washington.edu (David Keppel)
Fri, 25 Mar 1994 02:32:22 GMT

          From comp.compilers

Newsgroups: comp.compilers
From: pardo@cs.washington.edu (David Keppel)
Keywords: registers, design, bibliography
Organization: Computer Science & Engineering, U. of Washington, Seattle
References: 94-03-054 94-03-105
Date: Fri, 25 Mar 1994 02:32:22 GMT

alk@et.msc.edu (Anthony L. Kimball) writes:
>[Caller "in use" mask, callee "in use" mask, hardware to save just
> the regs really in use. Anything like this?]


See (I think):


%A Huguet
%T Architectural and Compiler Support for Efficient Function Calls
%R Ph.D. dissertation
%I University of California Los Angeles
%D 1989


As I recall, this discusses "in use" masks and also proposes and advocates
some other hardware-based approaches.




Note that with the caller mask/callee mask/special hardware approach, if
you have 32 registers and use a usual caller/callee split, the hardware
never saves you more than 16 cycles per call over the software convention
(the caller might needlessly save 16 live registers, but the callee will
only save registers once it has used all the caller-save registers).
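
The bound above can be checked with a small model. This is only a sketch
under stated assumptions, none of which come from the thread: registers
r0-r15 are caller-save and r16-r31 are callee-save, each save costs one
cycle, the callee takes caller-save registers first, and the "mask"
hardware saves exactly the registers that are both live in the caller and
used by the callee. The function names are made up for illustration.

```python
# Assumed 16/16 split of a 32-register file (illustrative, not from the post).
CALLER_SAVE = 0x0000FFFF  # r0-r15
CALLEE_SAVE = 0xFFFF0000  # r16-r31


def popcount(mask: int) -> int:
    """Number of registers named in a bit mask."""
    return bin(mask).count("1")


def callee_used_mask(k: int) -> int:
    """A callee needing k registers, allocating caller-save ones first."""
    return (1 << k) - 1


def convention_saves(caller_live: int, callee_used: int) -> int:
    """Saves under the software convention: the caller saves its live
    caller-save registers, the callee saves the callee-save registers
    it uses."""
    return (popcount(caller_live & CALLER_SAVE)
            + popcount(callee_used & CALLEE_SAVE))


def mask_hardware_saves(caller_live: int, callee_used: int) -> int:
    """Saves with idealized in-use masks: only registers that are both
    live in the caller and used by the callee."""
    return popcount(caller_live & callee_used)


def wasted_saves(caller_live: int, callee_used: int) -> int:
    """Saves the software convention performs that the masks would avoid."""
    return (convention_saves(caller_live, callee_used)
            - mask_hardware_saves(caller_live, callee_used))
```

In this model the worst case is a caller with all 16 caller-save
registers live calling a callee that uses none of them:
`wasted_saves(CALLER_SAVE, callee_used_mask(0))` comes to 16. Once the
callee uses more than 16 registers, every caller save is useful, so only
the callee's own saves can be wasted.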


In practice, existing software conventions tend to do better than the
16-cycle worst case since most callers have some dead registers (and thus
do not need to save all 16 caller-save registers) and most callees use
some registers (so not all of the caller's saves are wasted).


In addition, software approaches for interprocedural analysis can
generally do a pretty good job and will do so at compile-time rather than
at run time. On the other hand, they do increase compile time and can't
perform as extensive optimization. See:


@CONFERENCE{Wall86,
      AUTHOR = "David W. Wall",
      TITLE = "Global Register Allocation at Link Time",
      BOOKTITLE = "Proceedings of the SIGPLAN'86 Symposium
                                on Compiler Construction",
      ADDRESS = "New York",
      PAGES = {264--275},
      YEAR = 1986}


and


%A Santhanam
%A Odnert
%T Register Allocation Across Procedure and Module Boundaries
%J ACM SIGPLAN '90 conference on Programming Language Design and
Implementation (PLDI '90)
%D June 1990
%P 28-39


As I recall, Wall's paper indicates that there were more savings from
allocating registers to global variables than from eliminating
redundant spills and restores, on a machine with 56 general-purpose
integer registers. Santhanam and Odnert's paper reports similar results
for a machine with 30 general-purpose integer registers.


Note that there may be some call sites where you cannot apply
interprocedural analysis -- but if you can get most of the common call
sites you can get most of the benefits.
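
To make the "most call sites" point concrete, here is a minimal sketch of
one simple interprocedural analysis. It is not the method of either paper
cited above, just an illustration: compute, for each function, the set of
scratch registers it may clobber (directly or through its callees), and
save only the live registers in that set. A call through an unknown
target falls back to assuming every caller-save register is clobbered.
The 16-register `CALLER_SAVE` set, the function names, and the use of
`None` for an unknown callee are all assumptions.

```python
# Assumed caller-save (scratch) registers r0-r15; illustrative only.
CALLER_SAVE = frozenset(range(16))


def clobber_sets(uses, calls):
    """Fixed-point computation of the registers each function may clobber.

    uses:  function name -> scratch registers it writes directly
    calls: function name -> list of callee names (None = unknown target)
    """
    clobbers = {fn: set(uses.get(fn, ())) for fn in set(uses) | set(calls)}
    changed = True
    while changed:
        changed = False
        for fn, callees in calls.items():
            for callee in callees:
                # Unknown or unanalyzed callee: assume the worst case.
                extra = clobbers.get(callee, CALLER_SAVE)
                if not extra <= clobbers[fn]:
                    clobbers[fn] |= extra
                    changed = True
    return clobbers


def registers_to_save(live, callee, clobbers):
    """Registers the caller must save around one call site."""
    return set(live) & set(clobbers.get(callee, CALLER_SAVE))


# Hypothetical program: top -> mid -> leaf, plus one indirect call.
USES = {"leaf": {0, 1}, "mid": {2}, "top": {3}, "virt": {4}}
CALLS = {"mid": ["leaf"], "top": ["mid"], "virt": [None]}
CLOBBERS = clobber_sets(USES, CALLS)
```

With registers 1, 2 and 5 live, a direct call to `mid` needs only
registers 1 and 2 saved, while a call through an unknown target needs all
three. The analyzable sites get the benefit; the rest just pay the
ordinary convention.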


It might appear that the hardware approaches win more in the case of e.g.,
C++ programs, which have a higher percentage of indirect (virtual)
function calls. Dirk Grunwald (grunwald@cs.colorado.edu) has done some
studies of C++ programs and optimizations and it appears some similar
results can be had in the presence of virtual functions. (I think there's
a writeup; I'll ask him to post here about it.)




That's not at all to say that hardware heroics are a lose, but there are a
variety of reasons why they're not so much a win as you might think.
There are also the questions of how the extra hardware affects cycle
time and how else you could use the real estate.


As an aside, the KSR-1 hardware supports `load multiple' and `store
multiple' instructions. The assembler, however, no longer recognizes
them. I don't know why they were removed from the supported instruction
set, but it does tend to reinforce my feeling that it's sometimes hard to
get multicycle memory operations right, esp. `load multiple', which can
overwrite the register being loaded...
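
One reading of that hazard, sketched as a toy simulation: if the base
address register is itself in the list of registers being loaded, and the
instruction faults part-way and is restarted from the beginning, the
restart computes its address from the already-overwritten base. This is
not a model of the KSR-1 (I have no details of its implementation); the
restart-from-scratch behavior, the register numbering, and the memory
contents are all assumptions for illustration.

```python
class PageFault(Exception):
    """Models a fault taken part-way through the instruction."""


def load_multiple(regs, mem, base, reg_list, fault_at=None):
    """Load consecutive words starting at regs[base] into reg_list.

    The address is latched once at the start; a fault before step
    `fault_at` leaves the earlier loads already committed.
    """
    addr = regs[base]
    for i, r in enumerate(reg_list):
        if i == fault_at:
            raise PageFault()
        regs[r] = mem[addr + i]


def run_with_restart(regs, mem, base, reg_list, fault_at=None):
    """Execute the instruction; after a fault, naively restart it."""
    try:
        load_multiple(regs, mem, base, reg_list, fault_at)
    except PageFault:
        # Fault serviced; the instruction is re-executed from scratch,
        # re-reading a base register that may have been overwritten.
        load_multiple(regs, mem, base, reg_list)
    return regs


# Hypothetical state: r1 holds the base address 100 and is also loaded.
MEM = {100: 7, 101: 8, 102: 9, 8: 70, 9: 71, 10: 72}
clean = run_with_restart([0, 100, 0, 0], MEM, base=1, reg_list=[0, 1, 2])
faulted = run_with_restart([0, 100, 0, 0], MEM, base=1, reg_list=[0, 1, 2],
                           fault_at=2)
```

Without a fault the three registers end up holding the words at 100..102.
With a fault after the base has been overwritten, the restart loads from
address 8 instead, which is the kind of corner case that makes
multi-register loads awkward to get right.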


;-D on ( The lose cannon ) Pardo
--

