Related articles
[3 earlier articles]
Re: virtual machine efficiency vbdis@aol.com (2004-12-30)
Re: virtual machine efficiency cr88192@hotmail.com (cr88192) (2004-12-30)
Re: virtual machine efficiency cfc@shell01.TheWorld.com (Chris F Clark) (2004-12-30)
Re: virtual machine efficiency lars@bearnip.com (2004-12-30)
Re: virtual machine efficiency calumg@no.onetel.spam.com (Calum Grant) (2004-12-30)
Re: virtual machine efficiency cr88192@hotmail.com (cr88192) (2004-12-31)
Re: virtual machine efficiency cr88192@hotmail.com (cr88192) (2004-12-31)
Re: virtual machine efficiency strohm@airmail.net (John R. Strohm) (2005-01-01)
Re: virtual machine efficiency kers@hpl.hp.com (Chris Dollin) (2005-01-12)
Re: virtual machine efficiency cr88192@hotmail.com (cr88192) (2005-01-14)
Re: virtual machine efficiency kers@hpl.hp.com (Chris Dollin) (2005-01-15)
Re: virtual machine efficiency hannah@schlund.de (2005-01-30)
From: "cr88192" <cr88192@hotmail.com>
Newsgroups: comp.compilers,comp.lang.misc
Date: 31 Dec 2004 13:07:09 -0500
Organization: Compilers Central
References: 04-12-151 04-12-166
Keywords: VM
Posted-Date: 31 Dec 2004 13:07:09 EST
"Chris F Clark" <cfc@shell01.TheWorld.com> wrote in message
> ed_davis2 asked:
>> I have developed a simple stack based virtual machine. I would like
>> it to be as efficient as possible, in terms of both speed and object
>> code size. Size, as in size of the object code the VM processes
>> (byte-code), and not necessarily the size of the VM.
>
> You'll always have to make some tradeoffs between size and speed.
> However, you probably can do quite well if you note that a lot of
> operands are repeated. So are a lot of opcodes and code sequences,
> but we'll concentrate on the operands first.
>
> For example, there are probably on the order of 255 common operands in
> a code sequence, so use something akin to sliding window data
> compression to capture that fact. Here is one possibility:
>
> struct operation {
> char opcode;
> char operand;
> };
>
> Use the operand values 0-254 to refer to the 255 most recent operands
> (as they appear in the code). Use operand value 255 to indicate a "new"
> operand being taken from a list of word-aligned locations. When you
> fetch a new operand, retire the oldest operand from the most-recently
> used list. You also increment the pointer into the operand list, so
> that the next "new" operand fetches a different element from the list.
>
> address operand_mru[255]; // operand values 0-254 index this
> address new_operand_list[]; // value 255 copies one from here into the mru
>
> There are lots of heuristics you can build into the 255-entry
> recent-operand memory. You can think of it as a cross between a
> register file and a cache.
yes, this sounds pretty cool really, especially if one's operands may be
complex or allocated types.
> For example, you can preload the cache with the operand list. This
> saves the first 255 uses of the "new" operand. You can also load/store
> the cache on each call and return. Note, you probably don't want to
> actually copy the cache on the call/return, but instead keep a cache
> pointer. This is a place where profiling will pay off. Measure a
> couple of different strategies and keep measuring as you tune.
>
> Next, since most routines do not have 255 local variables (unless
> you've done extensive inlining), you can consider making the cache
> smaller and using some of the operand values to represent function
> parameters (carving 16 or even 32 function-parameter slots out of the
> cache won't significantly impact the cache, and will handle almost any
> call).
>
> You can even compress more: say your profiling suggests that you can
> get away with 24 cache entries, 7 parameter locations, and 1 "new"
> operand; that fits in 5 bits. Then, if you can find 7 common opcodes
> (3 more bits), you can pack the entire opcode and operand into 1 byte.
>
> You can also compress the operation space the same way by reserving
> some number of opcodes as "compressed" opcodes that do a table lookup
> into an instruction cache. If the instruction cache can hold
> sequences of instructions, you are close to Ziv-Lempel encoding.
also cool imo.
it could make sense for saving space; I just feel a little unsure of the
overhead or usefulness in many cases.
> Back to operand encoding: it is worth noting that the entries in the
> "address" mru table et al. don't have to be just simple pointers. You
> can re-invent the concepts of indirect and indexed addressing (and even
> auto-increment if it suits you) in your virtual machine. RISC
> hardware architectures tended to drop these concepts. However, the
> tradeoffs in a hardware implementation are different from those in a
> software one. Again, measure and tune.
well, I tend to use a large number of compound instructions myself...
> Of course, the farther you push this, the farther you get from a
> simple VM.
my vm is still fairly simple, and specialized for a very specific
task. at least it is a little more general than my previous one, but
my previous one didn't even really use bytecode...
> Going in the opposite direction, one can borrow a concept that hardware
> designers have used: aligning instructions with no-ops. Suppose you want
> your essential instructions to be half-word (or byte) aligned, but you
> want your instructions with 32-bit operands to have those operands
> aligned on 32-bit boundaries. First, determine what opcode
> length/position would put the operand on a 32-bit boundary. For
> example, if you decide to have byte-boundary instructions with a 1-byte
> opcode field, putting an instruction on the 3rd byte should then align
> the following operand field. If you want to generate such an
> instruction and you aren't on the appropriate boundary, you can either
> insert no-op instructions to get to the desired alignment, or pull
> instructions from further down the instruction list to try to fill the
> space. Note that doing so is potentially difficult, and whole
> phases are written to "schedule" instructions to try to minimize
> various penalties. Now, many of those penalties don't apply, since your
> VM is likely to execute each instruction serially, rather than doing
> the multi-issue that hardware can do.
yes...
> However, as more hardware gets to be multi-issue and out-of-order, the
> things which are cheap and parallel in hardware become more and more
> applicable to software. Not long ago, I would not have recommended
> using an 8-bit opcode and a 24-bit address in a VM, because the mask
> instruction to get a 32-bit pointer out of the 24-bit address might
> have been expensive. These days I would recommend measuring the
> effect before making the decision. I suspect that these days, on any
> P4 or better class machine, the mask is free. Of course, on the same
> class of machine I bet the misalignment penalty is nil also. The real
> measure is the amount of work one needs to do, especially in terms of
> memory traffic and most importantly non-cache memory traffic.
err, I sort of doubt masking and misalignment are "free" per se, but they
are not that expensive. afaik, even on older processors masking has been
pretty cheap as well.
yeah, keeping the working set small is an issue, especially with a
dynamically typed language that tends to tear through a lot of heap, but
then again, at this point I don't think cache misses are the problem...
at least in some very specific cases my interpreter can run in constant
memory, which is something at least...