Re: Need Information on how to create bytecode

"Satyam" <>
12 Apr 2006 22:44:00 -0400

          From comp.compilers

From: "Satyam" <>
Newsgroups: comp.compilers
Date: 12 Apr 2006 22:44:00 -0400
Organization: Compilers Central
References: 06-04-048 06-04-058
Keywords: interpreter, VM, design
Posted-Date: 12 Apr 2006 22:44:00 EDT

----- Original Message -----
From: "Jürgen Kahrs" <>
Sent: Sunday, April 09, 2006 10:59 PM
Subject: Re: Need Information on how to create bytecode


> [I suspect he was searching for info on bytecode design principles
> rather than on the spec for various existing bytecodes. It's a good
> question. We all know that the two main models are stack and
> pseudo-registers, but beyond that, for issues like how you encode
> constants and addresses and calls and interroutine links, as far as I
> know it's all folklore. -John]

The bytecodes of early processors came from the hardware designers.
Most often, fields of the bytecode acted directly on specific
decoders and gates in the circuit. Current microprocessors go through
many translation stages before reaching the hardware that actually
executes the instruction (bytecodes are entry points into a
translation table that gives the start address of the microprogram
that actually guides the processing), so their bytecodes can be
arbitrary. In the Intel 80x86 family of backward-compatible
processors, they are chosen so as to keep that compatibility. Thus,
even for actual hardware bytecodes, you might as well assign them
whichever way you want.

At most, the only precaution I would take, if 16 bits of bytecode is
not enough for your instruction set (which would be strange), is to
assign single-word bytecodes to the instructions used most often
and use prefixes for seldom-used instructions. Sixteen bits, though,
is often quite enough, so you might pack some extra information into
a bytecode. For example, to access a word at a certain offset from
the current stack frame, you might have two instructions: one that
allows you to specify a short offset in the few least significant
bits of the instruction, and another with a full extra word to
specify the offset. Since most function calls have few parameters,
the short offset bytecode would be used very often.

Same thing with jumps, especially conditional jumps. If the address
to be jumped to is a short offset from the current pointer, see if
you can assign some of the bits in the bytecode to that offset. Since
these short addresses will be used far more frequently than longer
ones, packing that extra info into the bytecode will shorten the
generated bytecode and, if it were to be executed on real hardware,
it would speed up execution as well. Actually, some processors have
only short conditional jumps (subroutine calls fall into this as
well) and arbitrary-length unconditional jumps. If you want to
conditionally jump far away, you first have to make a short jump to
the long unconditional jump.
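
As a rough sketch of that last case, here is how a code generator
might lower a conditional jump when only short conditional jumps
exist. The 5-bit signed range, the instruction names jcc_short and
jmp_long, and the "island" slot that holds the long jump are all
hypothetical; offsets are taken relative to the jump's own address.

```python
SHORT_MIN, SHORT_MAX = -16, 15   # assumed range of a signed 5-bit offset


def fits_short(offset):
    """True if the offset can be packed into the short jump form."""
    return SHORT_MIN <= offset <= SHORT_MAX


def lower_cond_jump(cond, pc, target, island):
    """Lower 'if cond goto target' for an instruction at address pc.

    island is the address of a nearby free slot that can hold a long
    unconditional jump. Returns a dict mapping addresses to symbolic
    instructions.
    """
    if fits_short(target - pc):
        return {pc: ("jcc_short", cond, target - pc)}
    if not fits_short(island - pc):
        raise ValueError("island is out of short-jump range")
    # Short conditional hop to the island, long jump from there.
    return {
        pc: ("jcc_short", cond, island - pc),
        island: ("jmp_long", target),
    }
```

A near target yields a single short instruction; a far one yields
the short hop plus the long unconditional jump it lands on.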

How many bits would you assign to those offsets? Well, do a first
draft of your bytecodes, see how many you have and then see how many
bits you can spare for offsets. And since you would probably use the
bytecodes as an index into a jump table in your interpreter, make the
bytecodes containing offsets consecutive and at the end, so you know
that if you read a bytecode higher than your last regular one, you
don't use the regular jump table but first mask out the offset, shift
the word right and use what's left as the index to a separate table.
Like: 111111xxcccooooo where 1's are ones, xx can be load, store,
jump or call, ccc would be the condition for the jump and ooooo would
be the offset or, in case of load and store, all of cccooooo would be
the offset.
Jumps and calls also have an easily recognizable prefix to signal
lookahead hardware to stop bothering looking ahead sequentially since
what's after won't get executed.

Little of this is applicable to virtual machines, though shorter
programs tend to be faster anyway.


[The design tradeoffs for hardware vs. software architectures are
different. In hardware, you can pipeline decoding and execution so you
typically want to make instructions simple and all take about the same
time to keep the pipeline simple, and you want to limit the number of
different opcodes to limit the amount of execution hardware. In
software, the decode is a bottleneck while there's little penalty for
having lots of operations, so you want more complex instructions to
minimize the number of trips through the interpreter. -John]
