Re: What is byte-code ? (Anton Ertl)
31 Mar 2005 23:30:49 -0500

          From comp.compilers


From: (Anton Ertl)
Newsgroups: comp.compilers
Date: 31 Mar 2005 23:30:49 -0500
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
References: 05-03-015 05-03-037 05-03-043 05-03-047 05-03-050
Keywords: interpreter, VM
Posted-Date: 31 Mar 2005 23:30:49 EST

Torben Ægidius Mogensen writes:
> - Bytecodes designed for interpretation need low decoding overhead,
> so you would tend to have a lot of "work" done per instruction.

My impression is that VMs designed with this idea in mind have many
special-purpose VM instructions and their interpreters tend to be slow
on general-purpose code. I have the following explanations for that:

- Since the idea is that the individual VM instructions do so much
work that the interpretive overhead does not matter much, the
implementer does not put as much effort into reducing the overhead per
instruction (after all, he needs to optimize all those important
special-purpose instructions, too).

- Some of the issues in interpreter efficiency also work out
differently: E.g., the fastest interpreters typically put all the code
for all VM instructions in one big C function, which is not considered
good style in general. In particular, if the VM consists of 200
simple VM instructions, each with 1-3 lines of code, and the speedup
is large, one is usually willing to follow this practice; but if the
VM consists of 500 mostly special-purpose VM instructions, each with
20-100 lines of code, and the speedup is small on many
(special-purpose) benchmarks, the implementor will probably not put
them in a single function.

- The special-purpose VM instructions may put some burden on the VM
that has to be paid by other instructions as well, e.g., state that
has to be maintained across function calls; or register pressure in
one place that causes spilling in other places.

- The VM designer may neglect the small general-purpose VM
instructions; they may be a special case of a more powerful (but
slower) VM instruction, and the VM designer may not want to add
another VM instruction just for this special case.

While the idea of doing more work per instruction has its merits, the
way it should be done is to first make a simple VM with a fast
interpreter, then see what sequences are executed often, and add VM
instructions for them.

> - Bytecodes designed for compilation would tend to use registers
> rather than a stack. An interpreter would have difficulty using
> registers for all but a few user variables, as it can't indirectly
> address registers. Hence, you often see stack-based bytecode for
> interpretation. A compiler has no such problem, as it can compile
> into direct register accesses.

Yes, an interpreter has to use memory for indexable registers.
However, as David Gregg and his students have found, using a
"register" VM (actually they used a variant of the JVM with directly
encoded locals) can reduce the number of VM instructions executed by
so much that every interpreter with a significant
VM-instruction-dispatch overhead is faster with such a "register" VM
than with a naive stack VM.

Of course, I still believe in stack VMs, for the following reasons:

- Dynamic superinstructions (aka selective inlining) eliminate the
differences in dispatch overhead between these two methods (there is
only one dispatch per VM basic block).

- Even without dynamic superinstructions, one can find sequences of
frequently occurring VM instructions and combine them into static VM
superinstructions. I believe that applying this optimization to both
stack VM and "register" VM will eliminate many differences between
them (e.g., if the sequence "iload iload iadd istore" is combined into
a superinstruction, the superinstruction is actually an instruction of
the "register" VM). It lets the stack VM have a similarly low number
of dispatches, while still having less decoding overhead in those
cases where the stack can be used to advantage (e.g., non-trivial
expressions). There are automatic tools for introducing static
superinstructions (Vmgen), so sticking with the stack VM gives you
most of the benefit of the register VM without having to pay the
complexity of dealing with a register VM.

As for compilers, why would a VM designed for them use registers? In
order to use the CPU's registers effectively, the compiler has to
reallocate registers anyway; for that a stack VM is just as easy as a
register VM; maybe it's easier, since with stack items you learn about
their death during a forward pass through the code, whereas with most
register-based representations (certainly a typical register VM) you
need a backwards pass to determine the death of a register.

You could use an intermediate representation designed for register
allocation, e.g., with an infinite number of pseudo-registers and only
one live range per pseudo-register and with explicit register death
marks; but such an intermediate representation would typically no
longer be considered a VM.

- anton
M. Anton Ertl
