From: George Neuner <firstname.lastname@example.org>
Date: Sun, 08 Oct 2017 14:36:32 -0400
Organization: A noiseless patient Spider
References: 17-10-001 17-10-004 17-10-009 17-10-011
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="59176"; mail-complaints-to="email@example.com"
Keywords: code, performance, architecture
Posted-Date: 08 Oct 2017 14:42:59 EDT
On Sat, 7 Oct 2017 19:05:10 +0100, bartc <firstname.lastname@example.org> wrote:
>On 07/10/2017 17:55, George Neuner wrote:
>> comp.lang.asm.x86 has seen extensive discussions of mispredict
>> problems in interpreters and JIT compiled code. The conclusions there
>> are applicable to most CPU architectures.
>I tried an experiment a few years ago, where the byte-code for a test
>program was expanded into a sequence of instructions in a statically
>compiled language (or it might have been done at run-time; I can't
>remember). Each byte-code was represented by some mix of function
>calls and inline code.
>This would have been expected to benefit by eliminating dispatch
>overheads (it just steps into the next lot of code like an ordinary
>program), and also by having dedicated code for some things that were
>otherwise a parameter in a generic byte-code instruction.
>But in fact the results weren't that great at all; most normal
>dispatchers were actually faster!
>Perhaps it was affected by having to fill up the instruction cache
>with 1000 copies of the same PUSH code sequence, instead of re-using
>the same single copy when running the byte-code dispatcher.
One of the things to come out of the discussions in c.l.a.x86 was
that the fastest [simple] JIT sequence for an interpreter is a
straight-line list of explicit call instructions, one per byte-code,
rather than a list of function addresses driven by a dispatcher. This
is because the explicit calls leverage the CPU branch predictor rather
than fighting with it.
Also, if you use an address list coding, tail threading[*] the
interpreter is usually a win. Threading is better than a central
dispatcher in cases where "instruction" sequences are reused often in
(roughly) the same order ... as in loops.
[*] in tail threading, instead of returning to a central dispatcher,
each function ends with a bit of dispatch logic that jumps directly to
the next function.