From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Date: 31 May 2000 23:08:44 -0400
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
References: 00-05-103
Keywords: architecture, performance
sci0627@ccrd200.cdc.polimi.it writes:
>>In virtual machine (VM) interpreters BTBs have only 0%-20% prediction
>>accuracy if the interpreter uses a central dispatch routine, but they
>>give about 50% prediction accuracy if every VM instruction has its own
>>dispatch routine.
>
>This is possible with GCC's labels as values. Another good reason to use
>them.
Yes, that's the most portable way a user can ensure that this happens
(Fortran's computed GOTO is another way, but IMO Fortran is less
portable than GNU C, and I believe GNU C has other advantages when
implementing interpreters).
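A minimal sketch of what "every VM instruction has its own dispatch
routine" can look like with GNU C's labels as values. The toy VM, its
opcode names, and run_threaded() are hypothetical and only serve as
illustration; the point is that NEXT() replicates the indirect jump at
the end of every instruction, so each copy gets its own BTB entry.

```c
#include <assert.h>

/* Hypothetical toy VM: opcodes and encoding are illustrative only. */
enum { OP_PUSH, OP_ADD, OP_HALT };

/* Runs a program and returns the top of stack at OP_HALT.
   Requires GNU C (labels as values and computed goto). */
long run_threaded(const long *ip)
{
    /* Table of label addresses, indexed by opcode. */
    static void *const dispatch[] = { &&do_push, &&do_add, &&do_halt };
    long stack[16];
    long *sp = stack;               /* points one past the top element */

/* Every VM instruction ends with its own copy of this indirect jump. */
#define NEXT() goto *dispatch[*ip++]

    NEXT();

do_push:                            /* operand follows the opcode */
    assert(sp < stack + 16);
    *sp++ = *ip++;
    NEXT();

do_add:
    assert(sp - stack >= 2);
    sp[-2] += sp[-1];
    sp--;
    NEXT();

do_halt:
    assert(sp > stack);
    return sp[-1];
#undef NEXT
}
```

With a central dispatch routine, by contrast, all VM instructions
would jump back to a single indirect jump (for example one switch
statement in a loop), and the BTB would have only one entry to predict
every VM instruction's successor.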
>There is an interesting aspect I just ran into with BTBs. If you
>implement inline caches with an indirect jump (instead of patching the
>code)
[I assume this is about OO method dispatch]
What does that look like? Isn't this just ordinary OO dispatch?
> you have no penalty because the jump through the inline cache is
>always predicted correctly (by the very definition of inline caching).
Modulo conflict and capacity misses.
>>I believe this can be improved even more by combining common sequences of
>>VM instructions into one VM instruction
>
>An easier way is to combine similar adjacent bytecodes into a single
>routine. For example (I use a switch statement syntax here):
>
> case 0: case 1: ... pushOOP(instanceVariable(*ip++ & 15));
That's contrary to my suggestion: I suggested creating more instances
of the dispatch code, whereas the method above combines several
dispatches into one. The disadvantage is that if several of these VM
instructions occur in an inner loop or somesuch, they will probably be
followed by different next instructions, and the BTB will predict the
shared dispatch badly.
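The effect can be illustrated with a small simulation. This is only a
sketch under idealized assumptions: the BTB has one entry per dispatch
jump, predicts the last target seen, and suffers no conflict or
capacity misses. The opcode names and count_btb_misses() are
hypothetical. routine_of[] maps each VM instruction to the routine
that implements it; instructions sharing a routine also share its
dispatch jump.

```c
#include <assert.h>

/* Hypothetical VM instructions for the illustration. */
enum { PUSH0, PUSH1, ADD, MUL, N_OPS };

/* Counts mispredictions of an idealized last-target BTB.
   trace[i] is the VM instruction executed at step i;
   routine_of[op] is the routine (and dispatch jump) used by op. */
int count_btb_misses(const int *trace, int n, const int *routine_of)
{
    int last_target[N_OPS];
    int misses = 0;

    for (int r = 0; r < N_OPS; r++)
        last_target[r] = -1;                    /* empty BTB entry */

    for (int i = 0; i + 1 < n; i++) {
        int site = routine_of[trace[i]];        /* routine that dispatches */
        int target = routine_of[trace[i + 1]];  /* routine it jumps to */
        assert(site >= 0 && site < N_OPS);
        if (last_target[site] != target)
            misses++;                           /* predicted the wrong target */
        last_target[site] = target;
    }
    return misses;
}
```

For a loop body PUSH0 ADD PUSH1 MUL, the mapping {0, 1, 2, 3}
(separate routines) mispredicts only while the BTB warms up, whereas
{0, 0, 2, 3} (both pushes in one routine) mispredicts every dispatch
out of the shared push routine, because its successor alternates
between ADD and MUL.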
>But check out decode penalties!
Yes, not only does it require additional decoding overhead, it is also
incompatible with the use of threaded code.
- anton
--
M. Anton Ertl Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen