x86 register allocation

Matt Postiff <postiffm@umich.edu>
28 Mar 1999 17:01:44 -0500

From comp.compilers

Related articles
x86 register allocation dplass@yahoo.com (1999-03-23)
Re: x86 register allocation dwight@pentasoft.com (1999-03-28)
Re: x86 register allocation rsherry@home.com (Robert Sherry) (1999-03-28)
*x86 register allocation postiffm@umich.edu (Matt Postiff)* (1999-03-28)**
Re: x86 register allocation bcombee@metrowerks.com (1999-03-28)
Re: x86 register allocation sander@haldjas.folklore.ee (Sander Vesik) (1999-03-28)
Re: x86 register allocation cbrtjr@ix.netcom.com (Charles E. Bortle, Jr.) (1999-03-28)

| List of all articles for this month |

From:	Matt Postiff <postiffm@umich.edu>
Newsgroups:	comp.compilers
Date:	28 Mar 1999 17:01:44 -0500
Organization:	University of Michigan EECS
References:	99-03-080
Keywords:	386, registers

This is a very long-winded reply to dplass@yahoo.com about x86
register allocation. I have excerpted it from a document describing
the backend of the MIRV IA32 compiler. MIRV is a compiler being
developed at the University of Michigan by David Greene, Dave Helder,
Charles Lefurgy, myself, and prof. Trevor Mudge.

In short, register allocation by graph coloring can be done on the
x86, at least on the integer side. It is important to have a
coalescing pass in the allocator to handle machine idiosyncrasies
(this is described below). Gcc has some code that does
stack-allocation for the floating point side; we just punt on this and
leave floats in memory.

I recommend the thesis by Briggs [14]. It covers a lot of the stuff
here, although in less detail in places since his work assumed a RISCy
machine. The MIRV backend does not currently implement
rematerialization and live range splitting, which are important
optimizations, particularly when spill code is important, as in the
x86.

Matt Postiff
postiffm@umich.edu

--- Begin Long Excerpt ---

1. The MIRV Low-level IR (MLIR) is a quad representation of the
function which assumes an infinite number of symbolic registers. Each
quad contains an operator, a destination, and two source
operands. This MLIR is structured around a basic block
representation. The control flow instructions refer to labels which
are names for basic blocks.

2. Simplify the MLIR. This task, called "lowering", simplifies each
quad to the point that it can be executed directly on the target
architecture. The resulting IR is called the Low Level IR (LLIR). An
important part of this step is the handling of machine-dependencies
such as instructions which require specific registers. Instructions
are inserted into the IR which copy the symbolic register to or from a
machine register. The later coalescing phase of the register allocator
attempts to remove as many of these copies as possible by directly
allocating the symbolic register into the particular machine
register. If no such allocation can be made, the copy remains in the
code. In this way, sources and destinations of odd instructions on the
x86 such as block copy (esi/edi), divide and modulo (eax/edx), shift
(ecx) and return values (eax) are allocated to the proper registers.

3. Register Allocation. Map the infinite symbolic register set into the
limited register set of the target architecture.

The MIRV register allocator performs the following steps. Explanations are
simplified for the sake of brevity and we assume the reader has some
familiarity with register allocation terminology [12, 13, 14].

1. Build the control flow graph (predecessor and successor information)

2. Perform variable definition/use analysis and iterative live variable
analysis

3. Build the interference graph. The MIRV register allocator is modeled
after a standard Chaitin/Briggs register allocator [12, 13, 14]. The
allocator assigns the symbolic registers to machine registers. Since in
general there are more symbolic registers than machine registers, the
allocator algorithm performs a packing algorithm which in this case is
cast as a graph coloring problem. The nodes of the graph represent machine
and symbolic registers and the edges represent interferences between the
nodes. The interferences indicate when two values cannot be assigned to
the same register. This is determined by examining the live variable
analysis results and determining what variables are simultaneously live.

4. Coalesce nodes of interference graph. Coalescing has several important
features: it removes unnecessary copy instructions; it precolors certain
symbolic registers; it allows easy handling of idiosyncratic register
assignments for special instructions and for the calling convention; and
it allows the compiler to seamlessly handle two-operand architectures such
as the x86. Coalescing will be described in more detail later.

5. Repeat steps 2 through 4 until no more coalescing can be performed.

6. Compute spill cost of each node. Variables which are used more
frequently (as determined by a static count) are given higher spill costs.
The spill cost determines which variables are selected first for spilling
during the insertion of spill code.

7. Attempt to color graph. As in [14] our allocator speculatively assumes
that all nodes can be colored and only when a node is proved not to be
colorable is it spilled. Assign a color to every node that can be safely
colored.

8. If the graph is not colorable, insert spill code. All registers which
were not assigned a color by step 7 are spilled.

9. Repeat steps 2 through 8 until the interference graph is colorable.
Spilling will be described in a later subsection.

Coalescing

Register coalescing combines nodes of the interference graph before
coloring is attempted. It attempts to transform copy instructions of the
form assign sX -> sY and allocate symbolic registers sX and sY to the same
register. It can do this if sX and sY do not interfere (their live ranges
do not overlap). If so, the assignment can be removed, and all references
to sX are changed to refer to sY. We say that sX has been coalesced with
sY and the result register is sY. This results in the elimination of the
sX node from the interference graph which simplifies the later graph
coloring step. This optimization is really global copy propagation, since
it removes a copy instruction and propagates the result register to all
other use and definition sites. It is very effective since it runs late in
compilation.

Which of sX or sY we choose to keep as the result register is only a
matter of implementation, except when one of the sX or sY is a machine
register. Then the coalescer must keep the machine register because a
machine register cannot be removed from the interference graph. This is
useful, as was mentioned before, because it allows the earlier IR lowering
phase of the backend to insert fixup code which contains machine register
specifiers. This fixup code is executed without regard for how many copy
instructions are inserted or where they are inserted. The fixup code
simply ensures that if a specific register assignment is required by the
architecture, a copy instruction is inserted whose source is a symbolic
register and whose destination is a machine register (or vice versa).
Coalescing also assists in handling two-operand architectures such as the
Intel IA32, as will be explained below.

The example in Figure 4.4 demonstrates why these features of coalescing
are useful. In the C code, a left shift operation is requested. The IR
equivalent is shown in (b). On the IA32 architecture, the shift amount
must be placed into the %ecx machine register, so the lowering phase
inserts a copy of sB into %ecx (c). It also inserts a copy of sA to sD
because the IA32 only allows two operand instructions (a = b + c must be
implemented as two instructions: a = b and a += c). In the example, two
coalescing operations occur between steps (c) and (d). The first
coalescing operation occurs because sB is copied into %ecx and those two
registers do not interfere (b). So sB is coalesced into %ecx, effectively
pre-allocating sB before the graph coloring even runs. The second
coalescing operation occurs because sA and sD are known not to interfere
(from (a)). The lowering phase can insert as many copies as it needs to
put the code into a form that is acceptable to the target architecture,
and the register coalescer removes (propagates) as many of those copies as
it can by pre-allocating symbolic registers into other symbolic or machine
registers.

C Code IR Lowered IR Machine Code
a = 1; assign 1->sA assign 1 -> sA movl 1,%eax
b = 13; assign 13->sB assign 13 -> sB
... ... ... ...
d = a << b; sD <- sA << sB assign sB -> %ecx movl 13,%ecx
assign sA -> sD shll %ecx,%eax
<a and d don't <sB doesn't sD <<= %ecx
  interfere interfere with
  after this> %ecx>

(a) (b) (c) (d)

Figure 4.4. An example demonstrating the utility of register coalescing.

The pre-allocation feature of coalescing is particularly useful to handle
specific register assignments of the machine architecture. But the IA32
architecture also requires that operate instructions such as addition take
only two operands - one source and a destination which doubles as a
source. In other words, a = b + c is not valid for the IA32, but a = a + b
is valid. The naive solution to this problem is shown in Figure 4.5. Here,
the three-operate format a = b + c has to be broken down into two simpler
instructions - a = b and a = a + c. This is the technique used in [14]. We
call this process "inserting the two-operand fixup instructions."

C Code IR Two-operand Lowered IR
a = b + c sA = sB + sC sA = sB
sA = sA + sC

(a) (b) (c)

Figure 4.5. An example of a simple-minded approach to coalescing for a
two-operand architecture. Be careful with cases like a = b - a, where
the above transformation would overwrite the value of a.

There is one problem with the solution shown in Figure 4.5 where it
generates nonoptimal (but correctly functioning) code. The two-operand
simplification was performed in the IR lowering phase. However, the
best place to do the two-operand modification is in the coalescer and
not in the IR lowering phase. This is because the coalesecer has
information regarding interference of registers. This is convenient
because it allows the selective insertion of two-operand fixup
instructions. The MIRV backend uses the following steps to determine
if it can coalesce the operands of an instruction:

1. For a copy instruction of the form a = b, coalesce a and b if possible.
The remainder of the algorithm is checking for two-operand optimizations.

2. If the machine requires two operand instructions, check the destination
and first source operand. If they can be coalesced, do so. In this case,
we avoid inserting any fixup code for the two-operand machine.

3. Again, for a two-operand instructions, if step 2 did not work, examine
those instructions which are commutative. Check dest and source2. If they
can be coalesced, do so and swap source1 and source2.

4. If both steps 2 and 3 did not coalesce the operands, then no coalescing
can be performed, either because the operate is not commutative, or both
operands are simultaneously live with the destination. Only in this case
do we need to insert the two-operand fixup instructions.

If two-operand fixup instructions are inserted too early (in the
lowering phase), the opportunity to swap operands later and coalesce
them is lost forever. This produces less than optimal code. This is
demonstrated in Figure 4.7. Since the coalescer knows that a and c are
not simultaneously live, it can swap their positions in the add
instruction and allocate them to the same register and avoid any
two-operand fixup instructions. The IR lowering phase cannot know this
since it has not computed liveness information, and so must insert the
two-operand fixup. Since a and b do interfere, no coalescing is
possible in Figure 4.7(c).

C Code IR Two-operand fixup Two-operand
in lowering stage fixup in coalesce
stage

a = b + c sA = sB + sC sA = sB sC = sA + sC

<a and c do not sA = sA + sC <sA has been
  interfere, a allocated to
  and b do> sC's register>

(a) (b) (c) (d)

Figure 4.7. An example showing how inserting two-operand fixup code too
early is suboptimal.

References

[12] Gregory J. Chaitin, Marc A. Auslander, Ashok K. Chandra, John
Cocke, Martin E. Hopkins and Peter W. Markstein. Register Allocation
Via Coloring. Computer Languages, Vol. 6 No. 1, pp. 47-57. 1981.

[13] G. J. Chaitin. Register Allocation and Spilling via Graph
Coloring. Proc. SIGPLAN '82 Symp. Compiler Construction, pp. 98-105,
1982.

[14] Preston Briggs. Register Allocation via Graph Coloring.. Rice
University, Houston, Texas, USA Tech. Report. unknown month, 1992.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

x86 register allocation

Matt Postiff <postiffm@umich.edu>28 Mar 1999 17:01:44 -0500

Matt Postiff <postiffm@umich.edu>
28 Mar 1999 17:01:44 -0500