Re: Opcode handler dispatch in an interpreter: Implementing switch on OpCode.

George Neuner <gneuner2@comcast.net>
Sun, 08 Oct 2017 14:36:32 -0400


Newsgroups: comp.compilers
Organization: A noiseless patient Spider
References: 17-10-001 17-10-004 17-10-009 17-10-011
Keywords: code, performance, architecture

On Sat, 7 Oct 2017 19:05:10 +0100, bartc <bc@freeuk.com> wrote:


>On 07/10/2017 17:55, George Neuner wrote:
>
>> comp.lang.asm.x86 has seen extensive discussions of mispredict
>> problems in interpreters and JIT compiled code. The conclusions there
>> are applicable to most CPU architectures.
>
>I tried an experiment a few years ago, where the byte-code for a test
>program was expanded into a sequence of instructions in a statically
>compiled language (or it might have been done at run-time; I can't
>remember). Each byte-code was represented by some mix of function
>calls and inline code.
>
>This would have been expected to benefit by eliminating dispatch
>overheads (it just steps into the next lot of code like an ordinary
>program), and also by having dedicated code for some things that were
>otherwise a parameter in a generic byte-code instruction.
>
>But in fact the results weren't that great at all; most normal
>dispatchers were actually faster!
>
>Perhaps it was affected by having to fill up the instruction cache
>with 1000 copies of the same PUSH code sequence, instead of re-using
>the same single copy when running the byte-code dispatcher.


Possibly.


One of the conclusions to come out of the discussions in c.l.a.x86
was that the fastest [simple] JIT sequence for an interpreter is a
list of explicit call instructions: e.g.,


      call <bytecode_function>
      call <bytecode_function>
          :


rather than a list of function addresses with a central dispatcher.
This is because direct calls (and their paired returns, which use the
CPU's return-address stack predictor) work with the branch predictor
rather than fighting it.


Also, if you use an address-list encoding, tail threading[*] the
interpreter is usually a win. Threading is better than a central
dispatcher in cases where "instruction" sequences are reused often in
(roughly) the same order ... as in loops.


George
[*] in tail threading, instead of returning to a central dispatcher,
each function ends with a bit of dispatch logic that jumps directly to
the next function.

