Re: Branch prediction (Paul Hsieh)
3 Jun 2000 17:37:03 -0400

          From comp.compilers

Related articles
Re: Branch prediction (2000-05-20)
Re: Branch prediction (2000-05-21)
Re: Branch prediction (2000-05-21)
Re: Branch prediction (Andi Kleen) (2000-05-21)
Re: Branch prediction (2000-05-28)
Re: Branch prediction (2000-05-31)
Re: Branch prediction (2000-06-03)
Re: Branch prediction (2000-06-20)
| List of all articles for this month |

From: (Paul Hsieh)
Newsgroups: comp.compilers
Date: 3 Jun 2000 17:37:03 -0400
Organization: A Zillion Monkeys
References: 00-05-073 00-05-081
Keywords: architecture

> > >There are two sides to the branch prediction story: static
> > >(compile-time) prediction and dynamic prediction done by the
> > >prefetcher.
> >
> > The x86 does not allow AFAIK static prediction. The PPC does.
> Actually it does -- implicitely through block ordering. For example
> most x86 are documented as always assuming backward branches as
> taken.

The Athlon statically always predicts fall through. However, the
Athlon also has more prediction bits than any other processor that I
know of, which kind of lessens the effect of such static predictions.

In any event what I think he meant was that the PPC can perform a
programmable staticly predicted branch. Given the plethora of other
branching mechanisms internal to the short pipelined G3/G4 processors
(including branch removal) I am curious as to what strategy a compiler
could or should take to derive any kind of benefit from these static

> [...] Modern compilers do block reordering to handle that if
> necessary. Of course the reordering gets ugly pretty quickly, but it
> allows at least some control for critical paths.

Indeed, that is done. The Intel IA-64 compiler also does this, though
apparently in much more depth.

> > There's a fundamental problem with every branch prediction system I
> > ran over. They do not work with indirect jump. Most processors
> > remember where the last jump landed, which might be decent for a lexer
> > but absolutely impractical for a bytecode interpreter; the solution
> > might be to tell the processor "hey, trust me, the jump is going to
> > land there" with a special instruction.

The Athlon can track a simple pattern between two indirect branch
target addresses.

> A lot of architectures (like IIRC the PPC and IA64) use special branch
> registers to solve that problem. You load the target address into a
> jump register early and a sufficiently wide implementation could start
> to prefetch.

You are talking about a "link" register of course. Typically it is
hoped that the compiler can preload the link register as early as
possible, so that either the CPU will have time to "remove the branch"
(in the PPC case) or simply allow redirection as soon as possible.

If you truly need to perform variable indirect branching (that varies
a lot), then x86 CPUs are not going to do very well no matter what.
If the indirection pattern is short and cyclic, then it is preferable
to use a sequence of binary branches to zero in on the condition. The
predictors will eventually hone in on the pattern, and pipelined work
will not be lost.
Paul Hsieh

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.