From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers,comp.lang.misc
Followup-To: comp.compilers
Date: 30 Dec 2004 00:57:29 -0500
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
References: 04-12-151
Keywords: VM, performance
Posted-Date: 30 Dec 2004 00:57:29 EST
"ed_davis2" <ed_davis2@yahoo.com> writes:
>I have developed a simple stack based virtual machine. I would like
>it to be as efficient as possible, in terms of both speed and object
>code size. Size, as in size of the object code the VM processes
>(byte-code), and not necessarily the size of the VM.
>
>But I've run into one of those situations where I can't figure out how
>to have both.
You can't. The fastest techniques (read some of my papers) use more
memory than some other techniques. However, there are some techniques
that are good compromises between speed and space consumption; read:
@InProceedings{proebsting95,
author = "Todd A. Proebsting",
title = "Optimizing an {ANSI~C} Interpreter with Superoperators",
crossref = "popl95",
pages = "322--332",
annote = "Interpreter performance is optimized by combining
operators during code generation, when they are
still organized as trees. So a different, optimized
interpreter
is used for each program. Speedups of 1.8--3.1 are
achieved, but this is probably strongly dependent on
the original set of operators. The paper uses lcc's
intermediate code operators \cite{fraser&hanson91a}."
}
@Proceedings{popl95,
booktitle = "Principles of Programming Languages (POPL '95)",
title = "Principles of Programming Languages (POPL '95)",
year = "1995",
key = "POPL '95"
}
@string{spe="Software---Practice and Experience"}
@Article{hoogerbrugge+99,
author = "Jan Hoogerbrugge and Lex Augusteijn and Jeroen Trum
and Rik van de Wiel",
title = "A code compression system based on pipelined
interpreters",
journal = spe,
volume = "29",
number = "11",
pages = "1005--1023",
month = sep,
year = "1999",
OPTannote= ""
}
@InProceedings{lattendresse&feeley03,
author = {Mario Latendresse and Marc Feeley},
title = {Generation of Fast Interpreters for {Huffman}
Compressed Bytecode},
booktitle = {Interpreters, Virtual Machines and Emulators
(IVME~'03)},
pages = {32--40},
year = {2003},
url1 = {http://www.complang.tuwien.ac.at/anton/ivme03/proceedings/latendresse.ps.gz},
url2 = {http://portal.acm.org/ft_gateway.cfm?id=858574&type=pdf},
abstract = {Embedded systems often have severe memory
constraints requiring careful encoding of
programs. For example, smart cards have on the order
of 1K of RAM, 16K of non-volatile memory, and 24K of
ROM. A virtual machine can be an effective approach
to obtain compact programs but instructions are
commonly encoded using one byte for the opcode and
multiple bytes for the operands, which can be
wasteful and thus limit the size of programs
runnable on embedded systems. Our approach uses
canonical Huffman codes to generate compact opcodes
with custom-sized operand fields and with a virtual
machine that directly executes this compact code. We
present techniques to automatically generate the new
instruction formats and the decoder. In effect, this
automatically creates both an instruction set for a
customized virtual machine and an implementation of
that machine. We demonstrate that, without prior
decompression, fast decoding of these virtual
compressed instructions is feasible. Through
experiments on Scheme and Java, we demonstrate the
speed of these decoders. Java benchmarks show an
average execution slowdown of 9\%. Compression
factors highly depend on the original bytecode and
the training sample, but typically vary from 30\% to
60\%.}
}
>All of the opcodes are one byte in size. Given the instruction:
>
>push address
>
>'push' takes one byte, and the address takes 4 bytes. If I pack
>the code into a byte array, this takes 5 bytes. However, now
>that means that address isn't on a word boundary. If I load
>'address' using:
>
>unsigned char *code;
>
>operand = *(int *)code;
>
>I incur a speed hit on many processors, and it may even fail on
>others, since code probably isn't suitably aligned.
>
>But if I use:
>
>memcpy(&operand, code, sizeof(operand));
>
>or
>
>operand =
>(code[pc]) |
>(code[pc + 1] << 8) |
>(code[pc + 2] << 16) |
>(code[pc + 3] << 24);
>
>I don't incur the misaligned speed hit, but it is nowhere near as
>fast as "operand = *(int *)code;" could be, if code were suitably
>aligned.
>
>I could make all opcodes a word in size, and make the code array an
>array of integers. But that would dramatically increase the size of
>the byte-code required by the VM.
Not necessarily. If you use superinstructions, you may have
relatively few VM instructions compared to operands; with dynamic
superinstructions you can get it down to one VM instruction per VM
branch target (but you need space for the dynamic superinstruction
code; I don't know whether you count that as part of the VM, whose
size you said matters less to you).
>Is there a way I can have both fast loading of operands, and
>small byte-code size?
This particular problem has been addressed by Ogata et al. [ogata+02],
but I am not sure that I would use it if I could define my own
bytecode (they wanted to do JVM interpretation without rewriting the
JVM code).
Other techniques that come to mind:
- Use byte-sized operands where possible.
- Collect VM instructions and byte-sized operands within words, but
let word-sized operands start at the next word; this is somewhat like
the two-stream (sections) idea suggested by our esteemed moderator,
but reduces the overhead of dealing with two instruction pointers on
VM control flow. Example of such VM code (for a machine with 4-byte
words):
inst 1 | byte-operand 1 of inst 1 | inst 2 | inst 3
word-operand 1 of inst 1
word-operand 2 of inst 1
word-operand 1 of inst 2
word-operand 1 of inst 3
byte-operand 1 of inst 3 | inst 4 ...
...
Various methods for implementing such a scheme efficiently come to
mind, but I won't go into that now.
@InProceedings{ogata+02,
author = {Kazunori Ogata and Hideaki Komatsu and Toshio
Nakatani},
title = {Bytecode Fetch Optimization for a {Java}
Interpreter},
crossref = {asplos02},
pages = {58--67},
annote = {The paper presents a Java Bytecode interpreter for
the PowerPC architecture and some optimizations and
evaluates them: stack caching (a few variations),
position-based handler customization and
position-based speculative decoding (software
pipelining of the interpreter). Position-based
handler customization deals with different
alignments of bytecodes by having four states in the
interpreter for the different alignments, each state
with its own specialized copy of the
interpreter. For stack caching they evaluated a
fixed one-TOS-register organization with
write-through caching (5.6\% speedup over base), and
dynamic stack caching with two registers (3 states,
7--9\% speedup over base), and used the write-through
organization for further experiments; write-through
is not compared empirically to
write-back. Position-based handler customization
buys another 19\%, and software pipelining an
additional 3.4\%. The paper also presents results on
memory traffic (both I and D).}
}
@Proceedings{asplos02,
title = "Architectural Support for Programming Languages and
Operating Systems (ASPLOS-X)",
booktitle = "Architectural Support for Programming Languages and
Operating Systems (ASPLOS-X)",
year = "2002",
key = "ASPLOS-X"
}
Followups set to comp.compilers.
- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/home.html