Object-Oriented Procedure Call (was: Smart I-cache?) email@example.com (1990-11-24)
From: firstname.lastname@example.org (Eliot Moss)
Organization: Dept of Comp and Info Sci, Univ of Mass (Amherst)
References: <email@example.com.COM> <firstname.lastname@example.org> <email@example.com>
Date: 24 Nov 90 23:17:41 GMT
Mention of architectural speedups via special branch target cache mechanisms,
and the fact that they do not apply to "object-oriented" procedure calls
because they are indirect, encouraged me to open up a wider discussion of the
issues surrounding object-oriented procedure calls (OOPCs for short).
First, a review of some of the linguistic features. A number of languages can
compile a "dynamically bound" (OO) procedure call into an indirect branch,
i.e., a branch through a compile-time-known offset in a run-time-known table.
Other OO languages require lookup that makes obtaining a fixed,
compile-time-known offset difficult. Smalltalk, for example, requires run-time
method lookup in general.
On the other hand, it has been empirically demonstrated that any given call
site rarely varies in the class/type of the argument(s) determining the call
binding (i.e., which actual code is run). In 1984 (see POPL for that year)
Deutsch and Schiffman presented the *inline method cache*, where a direct call
is performed, and the prolog at the top of each method verifies that the class
of the receiver is the same as it was at the previous call at the same call
site. If the class is the same, then execution proceeds; if it is different,
then a lookup is performed (using a full, software-managed cache, backed up by
an exhaustive search up the class hierarchy) and the code at the call site is
changed to refer to the appropriate method, and the class-last-time is changed
to reflect the appropriate binding. The calling sequence is something like

	call	<method>		; patched to refer to the bound method
	<class-last-time>		; one inline word holding the receiver's class
	... return point ...

Note that the return point is 4 bytes beyond the usual return address. Anyway,
this technique is very effective (very good hit rate), but does require code
technique is very effective (very good hit rate), but does require code
modification. It appears it might work well with a properly designed smart
I-cache.
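The inline method cache can be sketched in C, using a mutable per-call-site record in place of actual code patching; the class and method names below are invented for illustration:

```c
#include <stddef.h>

/* A sketch of the Deutsch-Schiffman inline method cache. A mutable
 * per-call-site record stands in for real code modification. */

typedef struct Class Class;
typedef int (*Method)(void *self);

struct Class {
    const Class *super;    /* parent in the class hierarchy */
    Method print;          /* NULL if inherited */
};

typedef struct {
    const Class *cls;      /* every object starts with its class */
} Object;

/* One record per call site: the class seen at the previous call, and the
 * method the site was last "patched" to call directly. */
typedef struct {
    const Class *class_last_time;
    Method target;
} InlineCache;

/* Slow path: exhaustive search up the class hierarchy. */
static Method full_lookup(const Class *c) {
    while (c != NULL && c->print == NULL)
        c = c->super;
    return c != NULL ? c->print : NULL;
}

/* The send: the method prologue's check is modeled by the comparison; on
 * a miss, re-look-up and re-patch the call site. */
static int send_print(InlineCache *ic, Object *o) {
    if (ic->class_last_time != o->cls) {
        ic->target = full_lookup(o->cls);
        ic->class_last_time = o->cls;
    }
    return ic->target(o);  /* on a hit, effectively a direct call */
}

static int a_print(void *self) { (void)self; return 1; }

static const Class ClassA = { NULL, a_print };
static const Class ClassB = { &ClassA, NULL };  /* inherits print from A */
```

Because a call site rarely changes receiver class, the comparison almost always succeeds and the slow path is almost never taken.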
Another thing that I should mention is that some informal, unpublished
measurements that I heard about indicate that over 95% of call sites in a
statically typed OO language (Trellis/Owl) could be statically bound. That is,
there is only one *possible* target for the dynamically bound call at that
many call sites. Clearly this is an opportunity for compiler optimization! It
does require *global knowledge* of the program, though.
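As a before/after sketch of that optimization (the Shape type and area() method are hypothetical names of my own): once global analysis proves a call site has exactly one possible target, the indirect branch can be replaced by a direct, inlinable call.

```c
/* If whole-program analysis shows rect_area is the ONLY implementation of
 * area() that can reach this call site, the compiler may bind statically. */

typedef struct Shape {
    int (*area)(const struct Shape *);  /* dynamically bound slot */
    int w, h;
} Shape;

static int rect_area(const Shape *s) { return s->w * s->h; }

/* Before: an indirect branch the inliner cannot see through. */
static int cost_dynamic(const Shape *s) { return s->area(s); }

/* After: the single possible target, statically bound and then inlined
 * (shown here by hand), exposing further optimization. */
static int cost_static(const Shape *s) { return s->w * s->h; }
```

The two functions compute the same result; the difference is that the second form is visible to the inliner and to every optimization downstream of it.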
The inline method cache and similar techniques can bring the cost of an OOPC
to within an instruction or so of an ordinary procedure call, even with
hardware speedups for statically known calls. The REAL loss of performance
with an object-oriented style is losing the ability to inline the whole procedure
call! (IMHO.) In the long run, we will take a considerable performance hit if
we do not perform appropriate global analysis and optimization to recognize
and statically bind every call we can, and thus gain the ability to inline and
apply further optimizations. Interesting related ideas include method
splitting and customized compilation (see the Chambers and Ungar paper from
SIGPLAN 90, for example).
This is yet another case where we need experts from different specialties to
try to understand each other's areas to some extent, and cooperate to achieve
the best result.
Yours in OO performance ..... Eliot Moss
J. Eliot B. Moss, Assistant Professor
Department of Computer and Information Science
Lederle Graduate Research Center
University of Massachusetts
Amherst, MA 01003
(413) 545-4206; Moss@cs.umass.edu