Re: x86 global floating point register allocation

"jacob navia" <jacob@jacob.remcomp.fr>
3 Sep 2002 00:09:16 -0400

From comp.compilers

Related articles
x86 global floating point register allocation sverker.is@home.se (Sverker Nilsson) (2002-08-24)
*Re: x86 global floating point register allocation jacob@jacob.remcomp.fr (jacob navia)* (2002-09-03)**
Re: x86 global floating point register allocation jgd@cix.co.uk (John Dallman) (2002-09-03)
Re: x86 global floating point register allocation reig@tenerife.ics.uci.edu (Fermin Reig) (2002-09-03)
Re: x86 global floating point register allocation ceco@jupiter.com (Tzvetan Mikov) (2002-09-08)

| List of all articles for this month |

From:	"jacob navia" <jacob@jacob.remcomp.fr>
Newsgroups:	comp.compilers
Date:	3 Sep 2002 00:09:16 -0400
Organization:	Wanadoo, l'internet avec France Telecom
References:	02-08-087
Keywords:	arithmetic, optimize
Posted-Date:	03 Sep 2002 00:09:16 EDT

There are several other limitations that you did not mention:

1) If the basic block contains any function calls, you have to save all
registers that are live and then reload them from memory. Or, you can
further restrict your basic block to avoid having functions in them. In
either case, the amount of code needed to save/restore the registers will be
quite high.

> Thus sometimes operands need to be moved to the top of the stack to be
> operated on. That is typically done by an exchange instruction. Not to
> go into too much details here, it seems that when generating code for
> a basic block, if we don't want to make excessive use of exchanges to
> keep the registers content on specific places, it is better to keep
> track of what 'virtual' register or 'color' is at what place, and just
> make the exchanges that are actually necessary. (**)

Another problem is that if your basic block contains any jumps out of
the block, you will have quite a few possibilities of where the values
are. Keeping track of the different end conditions from the basic
block will be quite demanding:

        if (foo) {
                // some code
        }
        else {
                // another fp code
        }

What are the contents of the stack after that? It depends in whether
the branch was taken or not...

> At the end of the basic block, then, it seems the virtual registers
> may have been swapped around arbitrarily in the register stack.
>
> That would be no problem if no registers were live out from the basic
> block. If there are any such registers live, they should have to be at
> agreed-upon positions in the register stack.
>
> This could require some reshuffling at the block boundaries. This
> could still be cheaper than loading and storing from memory, depending
> on the usage, but at least with a maximum of a few registers, say up
> to 3, kept live it seems reasonable to prefer the possible
> reshuffling. I don't want to fix the maximum to some arbitrary low
> value, it should be possible to say that all 8 registers can be kept
> alive.

This reshuffling has to be done for each live register. I do not see
any way out.

[explanation snipped]

> I think this scheme works, although I found a bug in the unification
> code that I will try to fix next week.

Do not worry. Debugging a register allocator is a work that will take
you several years. It is the hardest part of any compiler, and it
costs really that much effort. Of course if you are just preparing
your thesis, a buggy register allocator will never show in the tiny
examples that you will present for your exam, so it doesn't matter.

> So I would like to ask if someone can comment on how to handle x86
> floating point registers that are live across basic blocks, and
> possibly help me with references to existing work and standard
> terminology. Thanks beforehand.

If you are interested in having a good exam, just use your schema and
prepare some toy examples. That will work.

But if you are interested in having production quality code, I would
use a peephole approach, combined with expression re-arrangement to
emit reverse polish notation (stack notation). That would be a project
that could be done with one person.

Reverse polish notation would eliminate most of the loads and stores
for simple operations, and the peephole would just mark which value
was in which register for a limited set of operations (depending of
the peephole window) and try to further reduce redundant loads and
stores. Keeping a value in a register across a whole procedure could
be really difficult, and very error prone. Besides, there is the
problem of calls. Supposing you have 4 registers alive, you have to
save 4*10 bytes and reload 4*10 bytes at each function call, i.e. an
I/O of 80 bytes/call. This is more than 2 cache lines.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: x86 global floating point register allocation

"jacob navia" <jacob@jacob.remcomp.fr>3 Sep 2002 00:09:16 -0400

"jacob navia" <jacob@jacob.remcomp.fr>
3 Sep 2002 00:09:16 -0400