Re: Fast Interpreted Code

pardo@june.cs.washington.edu (David Keppel)
Tue, 23 Apr 91 02:06:21 GMT

          From comp.compilers

Related articles
Fast Interpreted Code ssinghani@viewlogic.com (1991-04-17)
Re: Fast Interpreted Code pardo@june.cs.washington.edu (1991-04-23)
| List of all articles for this month |
Newsgroups: comp.compilers
From: pardo@june.cs.washington.edu (David Keppel)
Keywords: interpreter, threaded code
Organization: Computer Science & Engineering, U. of Washington, Seattle
References: <3035@redstar.cs.qmw.ac.uk> <1991Mar31.180635.5944@cs.rochester.edu> <1991Apr22.003548.14803@iecc.cambridge.ma.us>
Date: Tue, 23 Apr 91 02:06:21 GMT

ssinghani@viewlogic.com (Sunder Singhani) writes:
>[Our threaded code isn't fast enough. What's faster?]


As far as I know, threaded code gives the fastest primitives/second
dispatch rate on a variety of architectures. The general techniques for
making things faster (that I know of!) are to change things so that the
dispatch rate can go down without changing the work that gets done (or use
hardware, but we'll ignore that for the moment.)


* Use a different v-machine instruction set


    The overhead of interpreting is almost nothing in generic PostScript
    imaging code because all the time is spent in non-interpretded
    primitives. If you can characterize your key operations (perhaps
    info in [Davidson & Fraser ??, Fraser, Myers & Wendt 84] can help
    you analyze for common operations instead of the more normal time in
    routines) then you can re-code your virtual instruction set to have
    as primintives oeprations that are performed frequently.


* Dynamic compilation to native machine code


    This is what is done in ParcPlace System's Smalltalk-80
    implementation, [Deutsch & Schiffman 84] also Insignia Solution's
    8086 interpreter.


    Dynamic compilation suffers from the need to do compilation at
    runtime: a compiler that produces better code will take longer to
    run and the compile time contributes to the overall program runtime.
    Also, program text isn't shared, even if multiple instances are
    running simultaneously.


* Native-coding key routines


    If you believe that your program spends 80% of its time in a few key
    routines, then compiling just these routines -- statically, adding
    them to the primitive set, statically adding them as library
    routines, or dynamically -- can improve performance substantially
    [Pittman 87].




5 Citations follow:


%A Robert Bedichek
%T Some Efficient Architecture Simulation Techniques
%J Winter '90 USENIX Conference
%D 26 October, 1989
%W Robert Bedichek.
%W Pardo has a copy.
%X Describes a simulator that uses threaded-code techniques to emulate
a Motorola 88000. Each 88k instruction is executed in about 20 host
(68020) instructions. Discusses techniques used to get the simulation
down from several thousand host instructions in many other
simulators.


%A Jack W. Davidson
%A Chris W. Fraser
%T Eliminating Redundant Object Code
%J POPL9
%P 128-132


%A Peter Deutsch
%A Alan M. Schiffman
%T Efficient Implementation of the Smalltalk-80 System
%J 11th Annual Symposium on Principles of Programming Languages
(POPL 11)
%D January 1984
%P 297-302
%X Dynamic translatin of p-code to n-code (native code).
Resons for not using straight p-code or straight n-code:
  * p-code is smaller than n-code (<= 5X).
  * The debugger can debug p-code, improving portability.
  * Native code is faster (<= 2X). Reasons include
special fetch/decode/dispatch hardware;
p-machine and n-machine may be very different, e.g.,
stack machine vs. register-oriented.
  * Threaded code does reduce the cost of p-code fetch/decode.
Does not help with operand decoding.
Does not allow optimizations to span more than one instruction.
[pardo: that's not technically true, but each optimized
instruction must exist in addition to the unoptimized version.
That leads to exponential blowup of the p-code. Example: delayed
branch and non-delayed branch versions of Robert Bedichek's 88k
simulator.]
  The system characteristics:
  * The time to translate to n-code via macro expansion is about the
same as the execute time to interpret the p-code.
  * (pg 300:) Self-modifying code (SMC) is deprecated but used in a
``well-confined'' way. Could indirect at more cost. Use SMC on the
machines where it works, indirection where SMC.
doesn't.
  * Performance is compared to a ``straightforward'' interpreter.
What's that?


%A Christopher W. Fraser
%A Eugene W. Myers
%A Alan L. Wendt
%T Analyzing and Compressing Assembly Code
%J CCC84
%P 117-121


%A Thomas Pittman
%T Two-Level Hybrid Interpreter/Native Code Execution for Combined
Space-Time Program Efficiency
%D 1987
%J ACM SIGPLAN
%P 150-152
%X Talks about native code execution vs. various kinds of interpreting
and encoding of key routines in assembly.




Hope this helps!


;-D on ( This is all open to interpretation ) Pardo
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.