Re: Branch prediction

djimenez@cs.utexas.edu (Daniel A. Jimenez)
21 May 2000 22:26:09 -0400

          From comp.compilers

Related articles
Re: Branch prediction bonzini@gnu.org (2000-05-20)
Re: Branch prediction djimenez@cs.utexas.edu (2000-05-21)
Re: Branch prediction anton@mips.complang.tuwien.ac.at (2000-05-21)
Re: Branch prediction freitag@alancoxonachip.com (Andi Kleen) (2000-05-21)
Re: Branch prediction sci0627@ccrd200.cdc.polimi.it (2000-05-28)
Re: Branch prediction anton@mips.complang.tuwien.ac.at (2000-05-31)
Re: Branch prediction qed@pobox.com (2000-06-03)
Re: Branch prediction rkrayhawk@aol.com (2000-06-20)
| List of all articles for this month |

From: djimenez@cs.utexas.edu (Daniel A. Jimenez)
Newsgroups: comp.compilers
Date: 21 May 2000 22:26:09 -0400
Organization: CS Dept, University of Texas at Austin
References: 00-05-073
Keywords: architecture, optimize

>The scheme used by P6s and on (maybe Pentium MMX too? I don't
>remember) is exceptionally smart. It works by bucketing jumps into 16
>bins, according to a taken/not-taken bit pattern. E.g. bin 13 (1101)
>means Taken, Taken, Not-Taken, Taken; it is actually able to guess all
>pattern with periods up to six, a few longer patterns (up to 16), and
>even to learn irregularities.


This is a pretty common scheme. The Alpha 21264 uses an ever cooler
branch prediction mechanism: a hybrid branch predictor. Three tables
of two-bit saturating counters are index; one by the 10-bit per-branch
history of a branch, another by the 12-bit global history (i.e., the
history of all branches), and a third that keeps track of which
predictor has been more accurate recently. When a prediction is
needed, the third table is accessed and one of the two other tables is
consulted for a prediction, depending on which one has been more
accurate. When the true outcome becomes known, the first two tables
are updated, i.e., the counters are incremented if the branch was
taken, decremented otherwise. The third table is updated if one of
the other two was wrong. This yields very good accuracy, something
above 95% on SPEC95. In general, predictors that can do "branch
classification" by either dynamically or statically choosing which
predictor works best for which branch get very good accuracy.


>There's a fundamental problem with every branch prediction system I
>ran over. They do not work with indirect jump. Most processors
>remember where the last jump landed, which might be decent for a lexer
>but absolutely impractical for a bytecode interpreter; the solution
>might be to tell the processor "hey, trust me, the jump is going to
>land there" with a special instruction.


Branch target buffers (BTBs) are like set-associative caches for
predicting the targets of branches, including indirect branches (like
jumps through tables of pointers, virtual method calls, etc.). There
has been a lot of work done in both conditional and indirect branch
prediction, and the best ideas usually end up in some
microarchitecture or another. I've heard that there has been so much
work in branch prediction that the research community is getting sick
of it. Check out the conference proceedings for the last five or six
MICRO, ISCA and ASPLOS conferences; there are many papers on the
subject.


Obligatory compilers angle: More and more ISAs are exposing branch
prediction information to the compiler. IA64 has branches that allow
the compiler to specify a static prediction or a hint to the dynamic
predictor. Papers have proposed letting the compiler choose the best
history length (i.e. for the bit vector that indexes the table) for a
branch, the best predictor for a given branch (e.g. per-branch or
global, or using a variety of mechanisms), and even the best hash
function for indexing the branch prediction tables. For some of these
schemes, people are working on static analyses and heuristics to find
out this information at compile-time.


--
Daniel Jimenez djimenez@cs.utexas.edu


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.