x86 global floating point register allocation

"Sverker Nilsson" <sverker.is@home.se>
24 Aug 2002 12:00:13 -0400

          From comp.compilers

Related articles
x86 global floating point register allocation sverker.is@home.se (Sverker Nilsson) (2002-08-24)
Re: x86 global floating point register allocation jacob@jacob.remcomp.fr (jacob navia) (2002-09-03)
Re: x86 global floating point register allocation jgd@cix.co.uk (John Dallman) (2002-09-03)
Re: x86 global floating point register allocation reig@tenerife.ics.uci.edu (Fermin Reig) (2002-09-03)
Re: x86 global floating point register allocation ceco@jupiter.com (Tzvetan Mikov) (2002-09-08)
| List of all articles for this month |

From: "Sverker Nilsson" <sverker.is@home.se>
Newsgroups: comp.compilers
Date: 24 Aug 2002 12:00:13 -0400
Organization: Cybecity Internet Aps, Denmark
Keywords: 386, arithmetic, code, question
Posted-Date: 24 Aug 2002 12:00:13 EDT


I am a writing a compiler, targeting the (Intel IA-32 etc.) x86
cpu(*). I would like to ask if someone can comment on how to handle
floating point registers that are live across basic blocks, and
possibly help me with references to existing work and standard
terminology. (I am going to try to get it approved for my master's
thesis and likely publish the code as open source.)

The special problem with the x86 floating point register stack is,
well, that it is like a stack (8 entries, with a wrap around stack
pointer) and not a general register file. Operations generally require
one operand to be the top of the stack. If there is another operand,
commonly it can be anywhere in the stack including the top, and the
destination can be freely chosen to be the operand on the stack or the
other one.

Thus sometimes operands need to be moved to the top of the stack to be
operated on. That is typically done by an exchange instruction. Not to
go into too much details here, it seems that when generating code for
a basic block, if we don't want to make excessive use of exchanges to
keep the registers content on specific places, it is better to keep
track of what 'virtual' register or 'color' is at what place, and just
make the exchanges that are actually necessary. (**)

At the end of the basic block, then, it seems the virtual registers
may have been swapped around arbitrarily in the register stack.

That would be no problem if no registers were live out from the basic
block. If there are any such registers live, they should have to be at
agreed-upon positions in the register stack.

This could require some reshuffling at the block boundaries. This
could still be cheaper than loading and storing from memory, depending
on the usage, but at least with a maximum of a few registers, say up
to 3, kept live it seems reasonable to prefer the possible
reshuffling. I don't want to fix the maximum to some arbitrary low
value, it should be possible to say that all 8 registers can be kept

This I did by creating objects called 'Groups'. I didn't know what
to call them, hence my question for standard terminology.

A Group defines a specific layout of the register stack. A basic block
has an input group and an output group, possibly the same. The groups
are created by looking at how the basic blocks are connected. An
output group is the same as the the input group of all the successors
of the block. An input group is the same as the output group of all
the predecessors. Applying these rules iteratively should define the
input and output groups for all blocks.

Then, having found the groups, for each group the set of all involved
registers is calculated. This is taken as the union of the registers
that are live out for each basic block that has the group as output.

Then the set of registers are laid out in a certain configuration.
I think there should be place for optimizations here, but I don't
care much currently. I just take the configuration that somewhat
matches the configuration after the code generation in the first
basic block encountered for the group in a depth-first traversal.

Then, at the boundaries between basic blocks, I generate code that do
a 'unification' (***) to adapt the register configuration to the one
specified in the output group of the block. This involves finding a
sequence of exchanges and other instructions to reshuffle the register
contents, and removing or adding virtual registers. The stack may have
to be adjusted either up or down. (Takes about one or two instructions
per register, generally, I think.)

I think this scheme works, although I found a bug in the unification
code that I will try to fix next week.

Well, that is just a sketchy description right now, I hope it will do
as information about my initial question which, well, I'll just repeat

So I would like to ask if someone can comment on how to handle x86
floating point registers that are live across basic blocks, and
possibly help me with references to existing work and standard
terminology. Thanks beforehand.

Best Regards,

Sverker Nilsson

(*) The first backend, it is a fairly separate module and
it should be possible to make others.

(**) There is a concept of 'undefined' register tags, which
raises exceptions when operated on. I was a bit surprised when
discovering that even the exchange operation, when one operand
is defined and the other undefined, raises an exception. This
further limits how the register stack can be used.

(***) Term taken from source code of Psyco, the Python Specializing
Compiler, that does something like this, called unification. (In
'run/compile'-time! but not involving the floating point register
stack last time I looked.)

I am, however, currently using a graph-coloring global register
allocation algorithm so no 'unification' is needed for integer
registers. I am using the same global register allocator for the
virtual floating point registers, but as described above, the
reshuffling done within basic blocks creates the need for the
'unification' at block boundaries.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.