Re: Branch prediction

anton@mips.complang.tuwien.ac.at (Anton Ertl)
21 May 2000 22:28:02 -0400

          From comp.compilers

Related articles
Re: Branch prediction bonzini@gnu.org (2000-05-20)
Re: Branch prediction djimenez@cs.utexas.edu (2000-05-21)
Re: Branch prediction anton@mips.complang.tuwien.ac.at (2000-05-21)
Re: Branch prediction freitag@alancoxonachip.com (Andi Kleen) (2000-05-21)
Re: Branch prediction sci0627@ccrd200.cdc.polimi.it (2000-05-28)
Re: Branch prediction anton@mips.complang.tuwien.ac.at (2000-05-31)
Re: Branch prediction qed@pobox.com (2000-06-03)
Re: Branch prediction rkrayhawk@aol.com (2000-06-20)
| List of all articles for this month |
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Date: 21 May 2000 22:28:02 -0400
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
References: 00-05-073
Keywords: architecture, optimize



  bonzini@gnu.org writes:
>There's a fundamental problem with every branch prediction system I
>ran over. They do not work with indirect jump. Most processors
>remember where the last jump landed, which might be decent for a lexer
>but absolutely impractical for a bytecode interpreter;


Dave Gregg and I have recently written a paper on this topic; you can
find the current draft on
http://www.complang.tuwien.ac.at/anton/tmp/interpreter-arch.ps.gz.


Several processors now have brantch target buffers (BTBs) and similar
mechanisms (e.g., the next line predictor of the 21264) which predict
for each indirect branch that it will jump where it jumped last
(modulo conflict misses and "two-bit" counters).


In virtual machine (VM) interpreters BTBs have only 0%-20% prediction
accuracy if the interpreter uses a central dispatch routine, but they
give about 50% prediction accuracy if every VM instruction has its own
dispatch routine. I believe this can be improved even more by
combining common sequences of VM instructions into one VM instruction
(this would reduce the number of cases where one VM instruction occurs
several times in an inner loop, with different next instructions).


There are also designs for indirect branch predictors that give very
good accuracy (90%-100%) for the interpreters we benchmarked.


> the solution might be to tell the processor "hey, trust me, the jump
> is going to land there" with a special instruction.


The Power, PPC, HP Playdoh and IA64 architectures have this (in the
form of branch registers), and from my experience with the PPC (where
the special instructions are called mtctr and mtlr) it helps, but
there are two drawbacks:


1) Because these are special-purpose registers, GCC is not very good
at handling them, and often puts the mtctr instruction later than I
would like it (and GCC is the compiler of choice if you want to do
efficient VM interpreters, because of its labels-as-values feature).


2) Moreover, even with the optimum placement you cannot move the mtctr
instruction up above the start of the current VM instruction into the
last VM instruction because of the antidependence between the bctr
instruction of the last VM instruction and the mtctr instruction of
the current VM instruction. Given the five-cycle latency between
mtctr and bctr on the PPC 604e, this means that a VM instruction takes
at least five cycles, even if the payload would just take three cycles
(e.g., an add for a stack-based VM). Rotating branch registers would
help here, but are even less likely to be supported well by gcc.


So, overall, going for better indirect branch predictors and trying to
improve the prediction accuracy of BTBs through software improvements
looks more practical to me than architectural changes.


- anton
--
M. Anton Ertl Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.