Related articles |
---|
From: | George Neuner <gneuner2@comcast.net> |
Newsgroups: | comp.compilers |
Date: | Sun, 08 Oct 2017 14:36:32 -0400 |
Organization: | A noiseless patient Spider |
References: | 17-10-001 17-10-004 17-10-009 17-10-011 |
Injection-Info: | gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="59176"; mail-complaints-to="abuse@iecc.com" |
Keywords: | code, performance, architecture |
Posted-Date: | 08 Oct 2017 14:42:59 EDT |
On Sat, 7 Oct 2017 19:05:10 +0100, bartc <bc@freeuk.com> wrote:
>On 07/10/2017 17:55, George Neuner wrote:
>
>> comp.lang.asm.x86 has seen extensive discussions of mispredict
>> problems in interpreters and JIT compiled code. The conclusions there
>> are applicable to most CPU architectures.
>
>I tried an experiment a few years ago, where the byte-code for a test
>program was expanded into a sequence of instructions in a statically
>compiled language (or it might have been done at run-time; I can't
>remember). Each byte-code was represented by some mix of function
>call, or some inline code.
>
>This would have been expected to benefit by eliminating dispatch
>overheads (it just steps into the next lot of code like an ordinary
>program), and also by having dedicated code for some things that were
>otherwise a parameter in a generic byte-code instruction.
>
>But in fact the results weren't that great at all; most normal
>dispatchers were actually faster!
>
>Perhaps it was affected by having to fill up the instruction cache
>with 1000 copies of the same PUSH code sequence, instead of re-using
>the same single copy when running the byte-code dispatcher.
Possibly.
Some of the things to come out of the discussions in c.l.a.x86 were
that the fastest [simple] JIT sequence for an interpreter is a list of
explicit call instructions: e.g.,
call <bytecode_function>
call <bytecode_function>
:
rather than a list of function addresses with a dispatcher. This is
because the call leverages the CPU branch predictor rather than
fighting with it.
Also, if you use an address list coding, tail threading[*] the
interpreter is ususally a win. Threading is better than a central
dispatcher in cases where "instruction" sequences are reused often in
(roughly) the same order ... as in loops.
George
[*] in tail threading, instead of returning to a central dispatcher,
each function ends with a bit of dispatch logic that jumps directly to
the next function.
Return to the
comp.compilers page.
Search the
comp.compilers archives again.