Subject: Re: Branch prediction
From: firstname.lastname@example.org (Anton Ertl)
Date: 31 May 2000 23:08:44 -0400
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
>>In virtual machine (VM) interpreters BTBs have only 0%-20% prediction
>>accuracy if the interpreter uses a central dispatch routine, but they
>>give about 50% prediction accuracy if every VM instruction has its own
>This is possible with GCC's labels as values. Another good reason to use
Yes, that's the most portable way a user can ensure that this happens
(Fortran's computed GOTO is another way, but IMO Fortran is less
portable than GNU C, and I believe GNU C has other advantages when
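
For readers who have not seen the technique, here is a minimal sketch of dispatch with GNU C's labels as values. The three-instruction VM, the opcode names, and run_threaded() are invented for illustration and are not taken from any real interpreter. The point is that NEXT expands to a separate indirect jump at the end of every VM instruction, so each one can get its own BTB entry (assuming the compiler does not merge the jumps back into one).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-instruction VM; the opcode names are illustrative only. */
enum { OP_INC, OP_DEC, OP_HALT };

/* Runs `code` and returns the accumulator.
   Requires GNU C (labels as values, computed goto). */
long run_threaded(const unsigned char *code)
{
    /* Table of label addresses, indexed by opcode. */
    static void *const dispatch[] = { &&do_inc, &&do_dec, &&do_halt };
    const unsigned char *ip = code;
    long acc = 0;

    assert(code != NULL);

    /* Each use of NEXT is its own copy of the dispatch code:
       fetch the next opcode and jump indirectly to its routine. */
#define NEXT goto *dispatch[*ip++]

    NEXT;
do_inc:
    acc++;
    NEXT;
do_dec:
    acc--;
    NEXT;
do_halt:
    return acc;
#undef NEXT
}
```

Calling run_threaded() on the byte sequence OP_INC, OP_INC, OP_DEC, OP_INC, OP_HALT walks through four separate dispatch jumps before halting.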
>There is an interesting aspect I just ran into with the BTB. If you
>implement inline caches with an indirect jump (instead of patching the
[I assume this is about OO method dispatch]
What does that look like? Isn't this just ordinary OO dispatch?
> you have no penalty because the jump through the inline cache is
>always predicted correctly (by the very definition of inline caching).
Modulo conflict and capacity misses.
>>I believe this can be improved even more by combining common sequences of
>>VM instructions into one VM instruction
>An easier way is to combine similar adjacent bytecodes into a single
>routine. For example (I use a switch statement syntax here):
> case 0: case 1: ... pushOOP(instanceVariable(*ip++ & 15));
That's contrary to my suggestion; I suggested creating more instances
of the dispatch code, the method above would combine several
dispatches into one. The disadvantage is: if you have several of
these VM instructions in an inner loop or somesuch, there will
probably be different next instructions, and the BTB will perform
badly for these dispatches.
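
To make the contrast concrete, here is a rough sketch of the combined-routine approach in a switch-based interpreter. The opcode layout (opcodes 0 to 15 push instance variable n), the names OP_ADD and OP_HALT, and run_switch() are assumptions made up for this example; the quoted code only shows the one case line. Note that there is exactly one dispatch point, the switch, and that all sixteen push opcodes share one routine and decode their operand from the low four bits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode layout: 0..15 push instance variable n. */
enum { OP_ADD = 16, OP_HALT = 17 };

/* Runs `code` against the 16 instance variables in `ivars` and
   returns the top of stack at OP_HALT (0 if empty, -1 on a bad opcode). */
long run_switch(const unsigned char *code, const long *ivars)
{
    long stack[16];
    size_t sp = 0;
    const unsigned char *ip = code;

    assert(code != NULL && ivars != NULL);

    for (;;) {
        unsigned char op = *ip++;
        switch (op) {            /* the single, shared dispatch */
        case 0 ... 15:           /* GNU C case range: one routine for 16 opcodes */
            assert(sp < 16);
            stack[sp++] = ivars[op & 15];   /* operand decoded from the opcode */
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_HALT:
            return sp ? stack[sp - 1] : 0;
        default:
            return -1;
        }
    }
}
```

Every VM instruction here returns to the same indirect jump, so the BTB has a single entry to predict all the different successors.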
>But check out decode penalties!
Yes, not only does it require additional decoding overhead, it is also
incompatible with the use of threaded code.
M. Anton Ertl Some things have to be seen to be believed
email@example.com Most things have to be believed to be seen