Re: Branch prediction

anton@mips.complang.tuwien.ac.at (Anton Ertl)
31 May 2000 23:08:44 -0400

From comp.compilers

Newsgroups: comp.compilers
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
References: 00-05-103
Keywords: architecture, performance

sci0627@ccrd200.cdc.polimi.it writes:
>>In virtual machine (VM) interpreters BTBs have only 0%-20% prediction
>>accuracy if the interpreter uses a central dispatch routine, but they
>>give about 50% prediction accuracy if every VM instruction has its own
>>dispatch routine.
>
>This is possible with GCC's labels as values. Another good reason to use
>them.


Yes, that's the most portable way a user can ensure that this happens
(Fortran's computed GOTO is another way, but IMO Fortran is less
portable than GNU C, and I believe GNU C has other advantages when
implementing interpreters).
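
A minimal GNU C sketch of the difference, assuming a hypothetical three-instruction stack VM (OP_PUSH, OP_ADD, OP_HALT and both function names are illustrative, not from any real system). run_switch uses one central switch dispatch; run_replicated uses labels as values so that every VM instruction ends with its own copy of the dispatch code, i.e. its own indirect jump and BTB entry. Whether the compiler actually keeps those jumps separate can depend on the GCC version and optimization flags.

```c
#include <assert.h>

/* Hypothetical three-instruction VM for this sketch. */
enum { OP_PUSH, OP_ADD, OP_HALT };

/* Central dispatch: every VM instruction returns to the one indirect
   jump that the compiler typically generates for this switch. */
static long run_switch(const long *ip)
{
    long stack[64], *sp = stack;

    for (;;) {
        switch (*ip++) {
        case OP_PUSH: *sp++ = *ip++;          break;
        case OP_ADD:  sp--; sp[-1] += sp[0];  break;
        case OP_HALT: return sp[-1];
        default:      assert(!"unknown opcode"); return -1;
        }
    }
}

/* GNU C labels as values: each VM instruction ends with its own copy of
   the dispatch code (NEXT), so each has its own indirect jump. */
static long run_replicated(const long *ip)
{
    /* Indexed by opcode; the order must match the enum above. */
    static void *const labels[] = { &&do_push, &&do_add, &&do_halt };
    long stack[64], *sp = stack;

#define NEXT goto *labels[*ip++]
    NEXT;
do_push: *sp++ = *ip++;         NEXT;
do_add:  sp--; sp[-1] += sp[0]; NEXT;
do_halt: return sp[-1];
#undef NEXT
}
```

For the program PUSH 2, PUSH 3, ADD, PUSH 4, ADD, HALT, both functions should return 9.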


>There is an interesting aspect of BTBs I just ran across. If you
>implement inline caches with an indirect jump (instead of patching the
>code)


[I assume this is about OO method dispatch]
What does that look like? Isn't this just ordinary OO dispatch?


> you have no penalty because the jump through the inline cache is
>always predicted correctly (by the very definition of inline caching).


Modulo conflict and capacity misses in the BTB.
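
One way an indirect-jump inline cache might look, as a C sketch with assumed names (Object, Class, InlineCache, SEND and the "area" methods are all hypothetical). Each call site owns a cache holding the receiver class it saw last and the method that class resolved to; a class mismatch triggers a full lookup and refill, and the call itself goes indirectly through the cached pointer. SEND is a macro so that each call site gets its own indirect call instruction (and so its own BTB entry) instead of sharing one inside a helper function; note that it evaluates its receiver argument twice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object model: an object points to its class, and a
   class maps selectors to C functions. */
typedef struct Object Object;
typedef long (*Method)(Object *self);

typedef struct { const char *selector; Method method; } MethodEntry;
typedef struct { const MethodEntry *methods; size_t nmethods; } Class;

struct Object { const Class *cls; long value; };

/* Full (slow) method lookup, done only on an inline-cache miss. */
static Method lookup(const Class *cls, const char *selector)
{
    for (size_t i = 0; i < cls->nmethods; i++)
        if (strcmp(cls->methods[i].selector, selector) == 0)
            return cls->methods[i].method;
    assert(!"message not understood");
    return NULL;
}

/* One inline cache per call site. */
typedef struct {
    const Class *cached_class;
    Method cached_method;
    unsigned misses;
} InlineCache;

/* Refill the cache if the receiver's class differs from the cached one. */
static void ic_ensure(InlineCache *ic, const Object *receiver,
                      const char *selector)
{
    if (receiver->cls != ic->cached_class) {
        ic->cached_class = receiver->cls;
        ic->cached_method = lookup(receiver->cls, selector);
        ic->misses++;
    }
}

/* The indirect call through the cache is expanded at each call site; at a
   monomorphic site its target never changes, so the BTB keeps predicting it. */
#define SEND(ic, receiver, selector) \
    (ic_ensure(&(ic), (receiver), (selector)), (ic).cached_method(receiver))

/* Two example classes that both understand "area". */
static long square_area(Object *self)   { return self->value * self->value; }
static long triangle_area(Object *self) { return self->value * self->value / 2; }

static const MethodEntry square_methods[]   = { { "area", square_area } };
static const MethodEntry triangle_methods[] = { { "area", triangle_area } };
static const Class square_class   = { square_methods, 1 };
static const Class triangle_class = { triangle_methods, 1 };
```

In real use, one InlineCache would be allocated per SEND site; a site that sees only one class misses once and then always calls through the same cached pointer.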


>>I believe this can be improved even more by combining common sequences of
>>VM instructions into one VM instruction
>
>An easier way is to combine similar adjacent bytecodes into a single
>routine. For example (I use switch statement syntax here):
>
> case 0: case 1: ... pushOOP(instanceVariable(*ip++ & 15));


That's contrary to my suggestion: I suggested creating more instances
of the dispatch code, whereas the method above combines several
dispatches into one. The disadvantage is that if several of these
combined VM instructions occur in an inner loop or somesuch, they will
probably be followed by different next instructions, so the BTB, which
predicts an indirect jump's target from the target it took last time,
will perform badly for the shared dispatch.
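
The BTB effect can be illustrated with an idealized model, sketched in C under stated assumptions: each dispatch branch has its own BTB entry that predicts the target it jumped to last time (no conflict or capacity misses), and the trace is a hypothetical inner loop IV0 ADD IV1 MUL IV2 STORE JUMP, where IV0..IV2 stand for push-instance-variable bytecodes. Mapping IV0..IV2 to one combined routine makes them share a dispatch branch whose next instruction differs every time.

```c
#include <assert.h>

/* Hypothetical VM instructions; IV0..IV2 = "push instance variable 0..2". */
enum { IV0, IV1, IV2, ADD, MUL, STORE, JUMP, NINSNS };

/* Idealized BTB: one entry per dispatch site, holding its last target.
   site_of[insn] says which dispatch branch ends instruction insn. */
static int btb_hits(const int *trace, int n, const int *site_of, int nsites)
{
    int last_target[NINSNS];
    int hits = 0;

    assert(nsites <= NINSNS);
    for (int s = 0; s < nsites; s++)
        last_target[s] = -1;            /* empty entry: always a miss */

    /* The dispatch at the end of trace[i] jumps to trace[i + 1]. */
    for (int i = 0; i + 1 < n; i++) {
        int site = site_of[trace[i]];
        if (last_target[site] == trace[i + 1])
            hits++;
        last_target[site] = trace[i + 1];
    }
    return hits;
}

/* Fill trace (room for 7 * iterations ints) with the loop body; return length. */
static int make_loop_trace(int *trace, int iterations)
{
    static const int body[] = { IV0, ADD, IV1, MUL, IV2, STORE, JUMP };
    const int len = (int)(sizeof body / sizeof body[0]);
    int n = 0;

    for (int k = 0; k < iterations; k++)
        for (int j = 0; j < len; j++)
            trace[n++] = body[j];
    return n;
}

/* Every VM instruction has its own routine and dispatch branch. */
static const int separate_sites[NINSNS] = { IV0, IV1, IV2, ADD, MUL, STORE, JUMP };
/* IV0..IV2 share one combined routine, hence one dispatch branch. */
static const int combined_sites[NINSNS] = { IV0, IV0, IV0, ADD, MUL, STORE, JUMP };
```

Over 10 iterations (69 dispatches), this model predicts 62 dispatches correctly with separate routines but only 35 with the combined routine. Mapping every instruction to a single site, which models a central switch, predicts none of them for this trace.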


>But check out decode penalties!


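Yes, not only does it require additional decoding overhead, but it is
also incompatible with the use of threaded code, where each VM
instruction is represented by the address of its routine, leaving no
opcode bits to decode.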
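
To make the threaded-code problem concrete: in direct threaded code, each VM instruction in the code stream is the address of its routine, so there are no opcode bits left to extract with `& 15`. The GNU C sketch below assumes a hypothetical bytecode set (0..15 push instance variable, then BC_ADD and BC_RETURN), translates it into threaded code, and moves the variable index into a separate inline operand cell. The other option would be sixteen separate push routines.

```c
#include <assert.h>

/* Hypothetical bytecodes: 0..15 push instance variable (op & 15). */
enum { BC_PUSH_IV_MAX = 15, BC_ADD = 16, BC_RETURN = 17 };

/* A threaded-code cell holds either a routine address or an inline operand. */
typedef union {
    void *routine;
    long operand;
} Cell;

/* Translate bytecode to direct threaded code, then run it (GNU C). */
static long run_threaded_iv(const unsigned char *bc, int len, const long *ivars)
{
    Cell code[2 * 64];      /* worst case: every bytecode needs an operand cell */
    Cell *ip = code;
    long stack[64], *sp = stack;
    int n = 0;

    assert(len <= 64);
    for (int i = 0; i < len; i++) {
        unsigned char op = bc[i];
        if (op <= BC_PUSH_IV_MAX) {
            /* The address of push_iv has no spare bits for the index,
               so the index goes into the following cell. */
            code[n++].routine = &&push_iv;
            code[n++].operand = op & 15;
        } else if (op == BC_ADD) {
            code[n++].routine = &&add;
        } else {
            assert(op == BC_RETURN);
            code[n++].routine = &&ret;
        }
    }

#define NEXT goto *(ip++)->routine
    NEXT;
push_iv: *sp++ = ivars[(ip++)->operand]; NEXT;
add:     sp--; sp[-1] += sp[0];         NEXT;
ret:     return sp[-1];
#undef NEXT
}
```

With instance variable 2 set to 40 and variable 5 set to 2, the bytecode sequence 2, 5, BC_ADD, BC_RETURN should return 42.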


- anton
--
M. Anton Ertl Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen

