|Made compiler/now how to make interpreter faster? firstname.lastname@example.org (1997-12-10)|
|Re: Made compiler/now how to make interpreter faster? email@example.com (1997-12-12)|
|Re: Made compiler/now how to make interpreter faster? firstname.lastname@example.org (1997-12-12)|
|Re: Made compiler/now how to make interpreter faster? email@example.com (David L Moore) (1997-12-12)|
|Re: Made compiler/now how to make interpreter faster? firstname.lastname@example.org (1997-12-12)|
From: David L Moore <email@example.com>
Date: 12 Dec 1997 14:47:20 -0500
> Compared to plain C code the interpreter is much much
> slower, about 50-100x for a repeated loop, which seems pretty poor.
Interpreters are always significantly slower than compilers running
the same code on the same machine. A factor of 10 is typical. Also,
interpreters rarely contain optimizers and, depending upon your loop,
induction variable elimination, invariant hoisting, instruction
scheduling, loop rotation, and various hoisting and sinking
optimizations can certainly make a factor of five difference.
So, depending upon what you are measuring against, 50 times slower
could be quite reasonable!
John's suggestion about profiling is a good one. The interpreter I
support has macros in its code that measure the time taken in each
instruction. This makes the results easier to work with than
using a system profiling tool.
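
As a rough illustration of what such macros might look like (this is
not the code from my interpreter; the opcode set, the macro names
PROF_BEGIN/PROF_END and the PROFILE_OPS switch are all made up for the
example), here is a minimal sketch in C. It uses clock(), which is
usually too coarse to time a single instruction, so the execution
counts are the more trustworthy column; a real version would probably
read a cycle counter instead.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical opcode set, for illustration only. */
enum { OP_PUSH, OP_ADD, OP_HALT, OP_COUNT };

static clock_t       op_ticks[OP_COUNT];  /* accumulated clock() ticks per opcode */
static unsigned long op_count[OP_COUNT];  /* number of executions per opcode */

#ifndef PROFILE_OPS
#define PROFILE_OPS 1                     /* build with -DPROFILE_OPS=0 to disable */
#endif

#if PROFILE_OPS
#define PROF_BEGIN()  clock_t prof_t0 = clock()
#define PROF_END(op)  do { op_ticks[op] += clock() - prof_t0; \
                           op_count[op]++; } while (0)
#else
#define PROF_BEGIN()  ((void)0)
#define PROF_END(op)  ((void)0)
#endif

/* Tiny stack interpreter with the profiling macros around each instruction. */
long run_profiled(const int *code)
{
    long stack[64];
    long *sp = stack;
    const int *pc = code;

    for (;;) {
        int op = *pc++;
        assert(op >= 0 && op < OP_COUNT);
        PROF_BEGIN();
        switch (op) {
        case OP_PUSH: *sp++ = *pc++;         break;
        case OP_ADD:  sp--; sp[-1] += sp[0]; break;
        case OP_HALT: PROF_END(op);          return sp[-1];
        }
        PROF_END(op);
    }
}

/* Print one line per opcode: name, execution count, accumulated seconds. */
void prof_report(FILE *out)
{
    static const char *const names[OP_COUNT] = { "push", "add", "halt" };
    int i;

    for (i = 0; i < OP_COUNT; i++)
        fprintf(out, "%-6s %8lu execs %9.6f s\n", names[i], op_count[i],
                (double)op_ticks[i] / CLOCKS_PER_SEC);
}
```

Calling prof_report(stderr) after a run gives a table indexed by
interpreter instruction rather than by C function, which is the part
a system profiler does not give you directly.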
Make sure the variables you access a lot, such as the stack pointer
and the interpreter pc, are local to the interpreter, not global. That
way they will be kept in registers by simple compilers. You can nudge
the compiler with a "register" declaration if this is C, but it
probably won't make any difference.
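
A minimal sketch of the idea, assuming a toy stack machine (the
opcode names and the function name are invented for the example): the
stack pointer and the interpreter pc are declared inside the
interpreter function, so the compiler is free to keep them in
registers for the whole loop. Had they been file-scope globals, a
simple compiler would typically reload and store them around every
instruction.

```c
#include <assert.h>

/* Hypothetical opcode set, for illustration only. */
enum { OP_PUSH, OP_ADD, OP_HALT };

#define STACK_SIZE 64

long interp_locals(const int *code)
{
    long stack[STACK_SIZE];
    long *sp = stack;       /* stack pointer: local, not global */
    const int *pc = code;   /* interpreter pc: local, not global */

    for (;;) {
        switch (*pc++) {
        case OP_PUSH:                     /* push the inline operand */
            assert(sp < stack + STACK_SIZE);
            *sp++ = *pc++;
            break;
        case OP_ADD:                      /* replace top two values by their sum */
            assert(sp - stack >= 2);
            sp--;
            sp[-1] += sp[0];
            break;
        case OP_HALT:                     /* result is the top of the stack */
            assert(sp > stack);
            return sp[-1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}
```

If other parts of the program need to see the interpreter state, one
option is to copy it into locals on entry and write it back on exit,
rather than working on the globals directly.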
Also, you may find it useful to use the addresses of the code that
executes each interpreter instruction as the opcodes so you can simply
jump to the code for the next instruction after each instruction
rather than going around a while/case loop.
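
Here is one way this might look, as a sketch only. It relies on the
"labels as values" extension of GNU C (accepted by GCC and Clang, but
not ISO C), and the opcode names, the cell type and MAX_CODE are
assumptions made for the example. The numeric opcodes are translated
once into handler addresses, and each handler then jumps straight to
the next one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode set, for illustration only. */
enum { OP_PUSH, OP_ADD, OP_HALT, OP_COUNT };

#define MAX_CODE 256

/* One cell of threaded code: a handler address or an inline operand. */
typedef union { void *handler; long operand; } cell;

long run_threaded(const int *bytecode, size_t n)
{
    /* GNU C extension: &&label yields the address of a label. */
    void *const handlers[OP_COUNT] = { &&do_push, &&do_add, &&do_halt };
    cell code[MAX_CODE];
    long stack[64];
    long *sp = stack;
    const cell *pc;
    size_t i;

    assert(n <= MAX_CODE);

    /* Translate once: each opcode number becomes its handler address. */
    for (i = 0; i < n; i++) {
        int op = bytecode[i];
        assert(op >= 0 && op < OP_COUNT);
        code[i].handler = handlers[op];
        if (op == OP_PUSH) {              /* OP_PUSH carries one operand */
            i++;
            assert(i < n);
            code[i].operand = bytecode[i];
        }
    }

    pc = code;

/* Fetch the next handler address and jump to it directly. */
#define NEXT() goto *(pc++)->handler

    NEXT();

do_push:
    *sp++ = (pc++)->operand;
    NEXT();
do_add:
    sp--;
    sp[-1] += sp[0];
    NEXT();
do_halt:
    return sp[-1];
}
```

There is no while/case loop left: every handler ends with its own
indirect jump, so the dispatch cost is one load and one jump per
instruction.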
There should probably be a paragraph here about tiling the code for
your interpreter so that groups of instructions that occur together
have their code in the same I-cache line. This is non-trivial and
results in yet another application for graph partitioning heuristics,
which is an interesting topic in itself. I have not attempted to do
this. If you do, please measure how much improvement you get as a
result of this optimization alone and publish your results in Sigplan
- or perhaps you could get it accepted for PLDI. Code tiling has been
done in compilers, so there should be some literature on the general
[The suggestion in the penultimate paragraph is called threaded code,
and works pretty well. There was a long discussion a few years ago on
how to do it in more-or-less portable C. -John]