Re: virtual machine efficiency

"cr88192" <cr88192@hotmail.com>
31 Dec 2004 13:07:09 -0500

          From comp.compilers

Related articles
[3 earlier articles]
Re: virtual machine efficiency vbdis@aol.com (2004-12-30)
Re: virtual machine efficiency cr88192@hotmail.com (cr88192) (2004-12-30)
Re: virtual machine efficiency cfc@shell01.TheWorld.com (Chris F Clark) (2004-12-30)
Re: virtual machine efficiency lars@bearnip.com (2004-12-30)
Re: virtual machine efficiency calumg@no.onetel.spam.com (Calum Grant) (2004-12-30)
Re: virtual machine efficiency cr88192@hotmail.com (cr88192) (2004-12-31)
Re: virtual machine efficiency cr88192@hotmail.com (cr88192) (2004-12-31)
Re: virtual machine efficiency strohm@airmail.net (John R. Strohm) (2005-01-01)
Re: virtual machine efficiency kers@hpl.hp.com (Chris Dollin) (2005-01-12)
Re: virtual machine efficiency cr88192@hotmail.com (cr88192) (2005-01-14)
Re: virtual machine efficiency kers@hpl.hp.com (Chris Dollin) (2005-01-15)
Re: virtual machine efficiency hannah@schlund.de (2005-01-30)
| List of all articles for this month |

From: "cr88192" <cr88192@hotmail.com>
Newsgroups: comp.compilers,comp.lang.misc
Date: 31 Dec 2004 13:07:09 -0500
Organization: Compilers Central
References: 04-12-151 04-12-166
Keywords: VM
Posted-Date: 31 Dec 2004 13:07:09 EST

"Chris F Clark" <cfc@shell01.TheWorld.com> wrote in message
> ed_davis2 asked:
>> I have developed a simple stack based virtual machine. I would like
>> it to be as efficient as possible, in terms of both speed and object
>> code size. Size, as in size of the object code the VM processes
>> (byte-code), and not necessarily the size of the VM.
>
> You'll always have to make some tradeoffs between size and space.
> However, you probably can do quite well, if you note that a lot of
> operands are repeated. So, are a lot of opcodes and code sequences,
> but we'll concentrate on the operands first.
>
> For example, there are probably on the order of 255 common operands in
> a code sequence, so use something akin to sliding window data
> compression to capture that fact. Here is one possibility:
>
> struct operation {
> char opcode;
> char operand;
> };
>
> Use the operand values 0-254 to refer to the most recent 255 operands
> (as appearing in the code. Use operand value 255 to indicate a "new"
> operand being takens from a list of word aligned locations. When you
> fetch a new operand, retire the oldest operand from the most-recently
> used list. You also increment the pointer into the operand list, so
> that the next "new" operand fetches a different element from the list.
>
> address operand_mru[255]; // 0-254
> address new_operand_list[]; // 255 copies 1 from here into the mrs
>
> There are lots of heuristics you can build into the 255 most recent
> operand memory. You can think of it as a cross between a register
> file and a cache.


yes, this sounds pretty cool really, especially if one's oprands may be
complex or allocated types.


> For example, you can preload the cache with the operand list. This
> saves the 1st 255 uses of the "new" operand. You can also load/store
> the cache on a call/return by call return basis. Note, you probably
> don't want to actually copy the cache on the call/return, but instead
> keep a cac he pointer. This is a place where profiling will
> pay-off. Measure a couple of different strategies and keep measuring
> as one tunes.
>
> Next, since most routines do not have 255 local variables (unless
> you've done extensive inlining), you can consider making the cache
> smaller, and using some of the operand values to represent function
> parameters (carving out 16 or even 32 function parameters out of the
> cache won't significant impact the cache, and will handle almost any
> calls).
>
> You can even compress more, say your profiling suggests that you can
> get away with 24 cache entries, 7 paramete locations, and 1 "new"
> operand, that fits in 5 bits. Then, if can find 7 common opcodes (3
> more bits(, you can pack the entire opcode and operand into 1 byte.
>
> You can also compress the operation spcae the same way by reserving,
> some number of opcodes as "compresses" opcodes, that do a table lookup
> into an instruction cache. If the instrcution cache can hold
> sequences of instructions, you are close to Ziv-Lempel eocnding.


also cool imo.


could make sense for saving space, just I feel a little unsure of the
overhead or usefulness in many cases.


> Back to operand encoding, it is worth noting that the entries in the
> "address" mrs et. al. don't have to be just simple pointers. You can
> re-invent the concepts of indirect and indexed addressing (and even
> auto-increments if it suits you) in your virtual machine. RISC
> hardware architectures tended to drop these concepts. However, the
> tradeoffs in a hardware implementation are different than a software
> one. Again, measure and tune.


well, I tend to use a large number of compound instructions myself...


> Of course, the farther you push this. the farther you get from a
> simple VM.


my vm is still fairly simple, and specialized for a very specific
task. at least it is a little more general than my previous one, but
my previous one didn't even really use bytecode...


> Going the ooposite direction. One can borrow a concept that hardware
> designers have used, aligning instructions with no-ops. If you want
> your essential instructions to be half-word (or byte) aligned, but you
> want your insructions with 32 bit operands to have the operands
> algined on 32 bit boundaries. First, determine what opcode
> length/position would put the operand on a 32 bit boundary. For
> example, if you decide to have byte boundary instruction with a 1 byte
> opcode field, putting an instruction on the 3rd byte should then align
> the following opcode field. If you wnat to generate such an
> instruction and you aren't on the appropriate boundary, you can either
> insert no-op instructions to get to ehte deisred alginment, or pull
> instructions from further in the instruction list to try to fill the
> space. Note, that doing so, is potentially difficult, and whole
> phases are written to "schedule" instructions to try to minimize
> various penalties. Now many of those penlties don't apply, since your
> VM is likely to execute each instruction in serial, rather than doing
> multi-issue instructions that hardware can do.


yes...


> However, as more hardare gets to be multi-issue and out-of-order, the
> things which are cheap and parallel in hardware become more and more
> applicable to software. Not long ago, I would not have recommended
> using an 8 bit opcode and a 24 bit address in a VM, because the mask
> instruction to get a 32 bit pointer out of the 24 bit address, might
> have been expensive. These days I would recommend measuring the
> effect before making the decision. I suspect these days on any P 4 or
> better class machine, the mask is free. Of course, on the same class
> of machine I bet the mis-alignment penalty is nil also. The real
> measure is the amount of work one needs to do, especially in terms of
> memory traffic and most importantly non-cache memory traffic.


err, I sort of doubt masking and misalignment are "free" per-se, but they
are not that expensive. afaik, even on older processors masking has been
pretty cheap as well.


yeah, keeping the working set small is an issue, especially with a
dynamically typed language that tends to tear through a lot of heap, but
then again, at this point I don't think cache misses are the problem...


at least in some very specific cases my interpreter can run in constant
memory, which is something at least...


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.