Re: Mixing virtual and real machine code in an interpreter

nickh@harlequin.co.uk (Nick Haines)
Tue, 22 Mar 1994 10:27:33 GMT

          From comp.compilers

Related articles
Mixing virtual and real machine code in an interpreter graham@pact.srf.ac.uk (1994-03-16)
Re: Mixing virtual and real machine code in an interpreter sastdr@unx.sas.com (1994-03-21)
Re: Mixing virtual and real machine code in an interpreter pardo@cs.washington.edu (1994-03-22)
Re: Mixing virtual and real machine code in an interpreter nickh@harlequin.co.uk (1994-03-22)
Re: Mixing virtual and real machine code in an interpreter sdm7g@elvis.med.virginia.edu (Steven D. Majewski) (1994-03-23)
I-cache consistancy (WAS: Mixing virtual and real machine ...) pardo@cs.washington.edu (1994-03-24)
Re: Mixing virtual and real machine code in an interpreter sosic@kurango.cit.gu.edu.au (1994-03-30)

Newsgroups: comp.compilers
From: nickh@harlequin.co.uk (Nick Haines)
Keywords: interpreter, design
Organization: Harlequin Limited, Cambridge, England
References: 94-03-039
Date: Tue, 22 Mar 1994 10:27:33 GMT

graham@pact.srf.ac.uk (Graham Matthews) writes:


      a) how difficult it is to mix and match such sections of real and
      virtual machine code? For example how do you access the
      interpreter's state from inside the real machine code section? Also
      on machines with different data and instruction space, how can you
      invoke the real machine instructions, which will be in data space?


      [...]


      [Mixing real and virtual code is a standard technique. Typically you
      call functions indirectly and jump either to the actual code or to an
      interpreter entry point. Real code rarely looks inside the interpreter,
      but rather shares application data. Re instruction and data spaces, you
      need some operating system hack to let you write into instruction segments.
      -John]


Our esteemed moderator has it right: this is a standard technique in Lisp,
Smalltalk, and many other languages. Smalltalk systems are traditionally
bytecoded (the technical term for "in virtual machine code"), and some of the
better systems (e.g. UMass) do dynamic compilation to native code (thus
"mixing real and virtual machine code"). Lisp systems are interpretive but
provide a 'compile' function which will compile any given function into
machine code (or, I suppose, into bytecode).


A typical technique is to have function objects which include a slot for
machine code and a slot for bytecode (or the uncompiled source, in Lisp
systems). Then the function call code always enters the machine code with
a pointer to the function object in some specific register.


This is called "closure passing", and is generally a useful technique, as
other slots in the function object can be used for values in the static
closure of the function, and thus readily accessed during execution of the
function. Call the specific register the "closure register".
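
As a rough illustration of closure passing, here is a minimal sketch in C. C cannot pin a value to a named register, so the explicit `self` parameter stands in for the "closure register". The names `FunctionObject`, `make_adder`, and `call1` are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct FunctionObject FunctionObject;

/* Every piece of code is entered with a pointer to its own function
   object; 'self' plays the role of the closure register. */
typedef long (*CodePtr)(FunctionObject *self, long arg);

struct FunctionObject {
    CodePtr code;                  /* the 'machine code' slot              */
    const unsigned char *bytecode; /* the bytecode slot (unused here)      */
    long closure[1];               /* values closed over by the function   */
};

/* Code for "add n to the argument", where n lives in closure slot 0
   and is reached through the function object pointer. */
static long adder_code(FunctionObject *self, long arg) {
    return self->closure[0] + arg;
}

/* Build a function object that closes over n. */
FunctionObject *make_adder(long n) {
    FunctionObject *f = malloc(sizeof *f);
    assert(f != NULL);
    f->code = adder_code;
    f->bytecode = NULL;
    f->closure[0] = n;
    return f;
}

/* The caller always jumps through the code slot and passes the
   function object itself along. */
long call1(FunctionObject *f, long arg) {
    return f->code(f, arg);
}
```

With this sketch, `call1(make_adder(5), 10)` adds the closed-over 5 to the argument 10; the caller never needs to know what the closure slots hold.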


Functions which have been compiled to machine code execute
straightforwardly. Functions which have not been so fully compiled have
a pointer to the interpreter code in their 'machine code' slot; the
interpreter code gets at the bytecode through the function object pointer.


So an interpreted function object looks like this:


   ------------
  |machine code| -------> interpreter code:
  |------------|
  |bytecode    |            load bytecode, closure+4
  |------------|            <interpret bytecode>
  |closure 1   |
  |------------|
  |closure 2   |
  |------------|
  |....        |
   ------------


And a compiled function object looks like this (note that one can choose
to compile a function at runtime, just by updating the 'machine code'
slot):


   ------------
  |machine code| -------> function's machine code
  |------------|
  |bytecode    |
  |------------|
  |closure 1   |
  |------------|
  |closure 2   |
  |------------|
  |....        |
   ------------
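
The two layouts above can be sketched in C roughly as follows. This is an illustration under stated assumptions, not a real system: the tiny stack bytecode (`OP_PUSH_ARG` and friends) is invented for the example, and `compile_in_place` does no code generation at all; it just installs a hand-written C function that stands in for generated machine code. The point is only that callers are unaffected when the 'machine code' slot is updated.

```c
#include <assert.h>
#include <stddef.h>

typedef struct FunctionObject FunctionObject;
typedef long (*CodePtr)(FunctionObject *self, long arg);

struct FunctionObject {
    CodePtr code;                  /* interpreter entry, or native code */
    const unsigned char *bytecode; /* used only by the interpreter      */
};

/* Hypothetical bytecode for a one-argument stack machine. */
enum { OP_PUSH_ARG, OP_PUSH_CONST, OP_ADD, OP_MUL, OP_RET };

/* Shared interpreter entry point: every uncompiled function has this
   in its code slot, and it finds the bytecode through 'self'. */
static long interpret(FunctionObject *self, long arg) {
    long stack[16];
    size_t sp = 0;
    const unsigned char *pc = self->bytecode;
    for (;;) {
        switch (*pc++) {
        case OP_PUSH_ARG:   assert(sp < 16); stack[sp++] = arg;    break;
        case OP_PUSH_CONST: assert(sp < 16); stack[sp++] = *pc++;  break;
        case OP_ADD: assert(sp >= 2); sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL: assert(sp >= 2); sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_RET: assert(sp >= 1); return stack[sp - 1];
        default:     assert(!"bad opcode"); return 0;
        }
    }
}

/* Bytecode for f(x) = x * 2 + 1. */
static const unsigned char double_plus_one_bc[] = {
    OP_PUSH_ARG, OP_PUSH_CONST, 2, OP_MUL, OP_PUSH_CONST, 1, OP_ADD, OP_RET
};

/* Stand-in for the machine code a real compiler would emit. */
static long double_plus_one_native(FunctionObject *self, long arg) {
    (void)self;
    return arg * 2 + 1;
}

/* An interpreted function object: code slot -> interpreter. */
FunctionObject make_interpreted(void) {
    FunctionObject f = { interpret, double_plus_one_bc };
    return f;
}

/* "Compiling" at runtime is just an update of the code slot. */
void compile_in_place(FunctionObject *f) {
    f->code = double_plus_one_native;
}

/* The call site is the same in both cases. */
long call1(FunctionObject *f, long arg) {
    return f->code(f, arg);
}
```

Calling `call1(&f, 20)` before and after `compile_in_place(&f)` gives the same answer; only the path taken through the code slot changes.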


The calling convention is:


<get function object into register "closure">
load code_address, closure+0
jump-and-link code_address, link_register


Hope this is clear.


On the subject of separate instruction and data space, very few machines
suffer from such an unfortunate distinction. Unix machines define text and
data segments, but these are often just distinctions of convenience, and
one can execute code in the data segment (though the text segment is
sometimes write-protected). Machines which do not allow this require a
fancy OS hack.


On a related point, many machines have separate instruction and data
caches, but such machines generally provide an OS routine to invalidate
the instruction cache (or some part thereof). This routine is used when
creating (during compilation) or moving (during GC) code objects.
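
For a concrete flavour of the last two points, here is a hedged sketch for a present-day POSIX system with GCC or Clang. It is one possible counterpart to the "OS routine", not the interface any particular 1994 system offered. It assumes `mmap` with `MAP_ANONYMOUS` and `mprotect` are available and uses the compiler builtin `__builtin___clear_cache`. The helper names `install_code` and `run_demo` are hypothetical, and the six instruction bytes are x86-64 only (`mov eax, 42; ret`); on other architectures the demo does not execute anything.

```c
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

typedef int (*NativeFn)(void);

/* Copy freshly generated code into new memory, make that memory
   executable, and flush the instruction cache over it.
   Returns NULL on failure. */
NativeFn install_code(const unsigned char *code, size_t len) {
    assert(len > 0);
    /* Start writable but not executable. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;
    memcpy(mem, code, len);
    /* Flip to read + execute once the code is in place. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return NULL;
    }
    /* Make sure the instruction cache sees the new bytes. This is a
       no-op on x86 but matters on machines with split I/D caches. */
    __builtin___clear_cache((char *)mem, (char *)mem + len);
    /* Converting a data pointer to a function pointer is outside
       ISO C, but POSIX systems support it. */
    return (NativeFn)mem;
}

/* Returns 42 on x86-64, -1 if the mapping failed, and -2 where no
   instruction encoding is supplied. The mapping is deliberately
   leaked to keep the sketch short. */
int run_demo(void) {
#if defined(__x86_64__)
    static const unsigned char code[] = {
        0xB8, 0x2A, 0x00, 0x00, 0x00,  /* mov eax, 42 */
        0xC3                           /* ret         */
    };
    NativeFn fn = install_code(code, sizeof code);
    if (fn == NULL)
        return -1;
    return fn();
#else
    return -2;
#endif
}
```

The same flush would be needed after a garbage collector moves a code object, since the instruction cache may still hold whatever previously lived at the new address.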


Nick
--

