Re: Opcode handler dispatch in an interpreter: Implementing switch on OpCode.

bartc <>
Sat, 7 Oct 2017 19:05:10 +0100

          From comp.compilers


From: bartc <>
Newsgroups: comp.compilers
Date: Sat, 7 Oct 2017 19:05:10 +0100
References: 17-10-001 17-10-004 17-10-009
Keywords: code, optimize
Posted-Date: 07 Oct 2017 14:37:42 EDT
Content-Language: en-GB

On 07/10/2017 17:55, George Neuner wrote:
>> Am 05.10.2017 um 18:29 schrieb Robert Jacobson:
>>> I am trying to wrap my mind around the issue of dynamic dispatch in the
>>> context of switching on opcode in a bytecode interpreter ...

> Unfortunately, on modern CPUs, *all* branches are predicted. An
> indirect jump through the table will mispredict virtually every time.
> The same will be true of an indirect jump via register based address.
> The best you can do with an interpreter is to have all the code in L1
> code cache. As soon as you have to go to L2 (which typically is
> shared between code and data) or deeper, you risk taking large hits if
> the code is not resident.
> comp.lang.asm.x86 has seen extensive discussions of mispredict
> problems in interpreters and JIT compiled code. The conclusions there
> are applicable to most CPU architectures.

I tried an experiment a few years ago in which the byte-code for a test
program was expanded into a sequence of instructions in a statically
compiled language (or it might have been done at run-time; I can't
remember). Each byte-code instruction became a function call, a piece
of inline code, or some mix of the two.

I expected this to be faster for two reasons: it eliminates dispatch
overhead (execution just steps into the next lot of code, as in an
ordinary program), and it allows dedicated code for things that would
otherwise be a parameter to a generic byte-code instruction.

But in fact the results weren't that great at all; most normal
dispatchers were actually faster!

Perhaps the expanded version suffered from having to fill up the
instruction cache with 1000 copies of the same PUSH code sequence,
whereas the byte-code dispatcher re-uses the same single copy.
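
To make the comparison concrete, here is a minimal sketch in C of the
two approaches. It is not the code from the experiment described
above: the three-opcode stack machine and the names run_switch,
run_expanded and demo_code are made up for illustration, and a real
interpreter would have far more opcodes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-opcode stack machine, for illustration only. */
enum { OP_PUSH, OP_ADD, OP_HALT };

#define STACK_MAX 16

/* Conventional dispatcher: one shared handler per opcode, reached
   through the switch on every instruction executed. */
long run_switch(const long *code)
{
    long stack[STACK_MAX];
    size_t sp = 0;                  /* number of values on the stack */
    size_t pc = 0;                  /* index of the next opcode      */

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:               /* operand follows the opcode    */
            assert(sp < STACK_MAX);
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:                /* pop two values, push the sum  */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_HALT:               /* result is the top of stack    */
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}

/* Byte-code for: PUSH 2, PUSH 3, ADD, PUSH 4, ADD, HALT */
const long demo_code[] = {
    OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_ADD, OP_HALT
};

/* The same program expanded into straight-line code. There is no
   dispatch and the operands have become constants, but every PUSH
   and every ADD now has its own copy of the handler body. */
long run_expanded(void)
{
    long stack[STACK_MAX];
    size_t sp = 0;

    stack[sp++] = 2;                        /* PUSH 2 */
    stack[sp++] = 3;                        /* PUSH 3 */
    sp--; stack[sp - 1] += stack[sp];       /* ADD    */
    stack[sp++] = 4;                        /* PUSH 4 */
    sp--; stack[sp - 1] += stack[sp];       /* ADD    */
    return stack[sp - 1];                   /* HALT   */
}
```

Both run_switch(demo_code) and run_expanded() compute the same value.
The point of the sketch is only the shape of the code: in
run_expanded the generated code grows with the length of the program,
while in run_switch the handler code stays the same size no matter how
long the byte-code is.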

