Re: x86 Instruction Scheduling

qed@pobox.com (Paul Hsieh)
20 Jun 2003 00:16:29 -0400

From comp.compilers

Related articles
x86 Instruction Scheduling nicolas_capens@hotmail.com (2003-06-03)
Re: x86 Instruction Scheduling vmakarov@redhat.com (Vladimir Makarov) (2003-06-05)
*Re: x86 Instruction Scheduling qed@pobox.com* (2003-06-20)**

| List of all articles for this month |

From:	qed@pobox.com (Paul Hsieh)
Newsgroups:	comp.compilers
Date:	20 Jun 2003 00:16:29 -0400
Organization:	http://groups.google.com/
References:	03-06-020 03-06-050
Keywords:	386, optimize
Posted-Date:	20 Jun 2003 00:16:29 EDT

Vladimir Makarov <vmakarov@redhat.com> wrote:
> c0d1f1ed wrote:
> > I have a project swShader (http://sw-shader.sourceforge.net) which
> > could benefit a lot from instruction scheduling. Basically it's a
> > compiler for DirectX 9 pixel shader instructions. Every shader
> > instruction gets substituted by MMX/SSE instructions, and register
> > allocation is handled automatically. Because of this substitution,
> > dependent instructions are close together, causing inefficiencies. The
> > ps 2.0 specifications do not have jumps or calls, so I often have more
> > than a hundred instructions that could be scheduled as one block.
> >
> > So what kind of fast heuristics should 'always work'? I'm also looking
> > for more than one heuristic, or a very configurable one so the optimal
> > can be manually selected after benchmarking so it's more or less
> > system independent. Because of the out-of-order execution and many
> > unknown parameters I think it's hard to solve this perfectly, but what
> > is considered a good method for MMX/SSE instruction sheduling?

See: http://www.pobox.com/~qed/sched.zip . I wrote it up yesterday for
another purpose, but I am sure it will apply just as relevantly for
your problem. It assumes an idealized infinite-way super-scalar CPU,
however, the results should still be quite effective even on real
world practical processors, which are really only between 1-way and
3-way. The code should be fairly simple to understand, and can be
used experimentally as is since its just a text->text utility thus
far.

To be really effective, however, it would make sense for you to build
some pattern recognizing kinds of optimizations, between cycle groups.
  For example:

        paddb mm2, mm1
        paddb mm1, mm3
        paddb mm0, mm1
        paddb mm0, mm2

The last two instructions can (and should) be switched because "paddb"
is a commutative operation, but my algorithm does not know this, and
thus cannot perform the switch.

> It is still important to have insn scheduling for out-of-order
> speculative execution processors for big basic blocks and loops
> because of constraints of processor register renaming buffer and
> because constraints of number of branches which the processor goes
> trough for speculative execution.

Well, this depends on the processor. P3's and P4's have a "register
alloc" stage, which exposes a clear bottleneck, however, I believe
that the P4 has enough rename register to never be blocked at this
stage (I dont think that the P3 has enough to guarantee no stalls.)
The Athlon/Opteron has an "implicit" rename register allocation
strategy, so that its guaranteed to never run out.

The *real* benefit to software rescheduling is to more explicitely
expose the core bottlenecks in your code. Its often hard to see the
critical path without reordering the actual instructions anyway.

> Optimal solution of insn scheduling is NP-complete problem.

I'm sure it is, if you take into consideration the real resource
limitations (finite number of pipelines, specific ALU usage
constrains, scheduling limitations, etc.) Idealize the processor and
the problem is somewhat easier (certainly its no worse than n^3.)

> [...] Therefore
> heuristic algorithms are used. So usually two major used heuristics
> are critical path length or issuing as many insns as possible on a
> processor cycle (see Steven Muchnick's book).

The scheduler I wrote in the URL above only uses the "issue as many
insns as possible" heuristic. "Critical Path" is not that well
defined because it can change once the instructions are re-ordered or
other optimizations are performed.

--
Paul Hsieh
http://bstring.sourceforge.net/
http://www.pobox.com/~qed/

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: x86 Instruction Scheduling

qed@pobox.com (Paul Hsieh)20 Jun 2003 00:16:29 -0400

qed@pobox.com (Paul Hsieh)
20 Jun 2003 00:16:29 -0400