Linear-scan copy propagation

nicolas_capens@hotmail.com (c0d1f1ed)
11 Nov 2003 13:48:37 -0500

          From comp.compilers

Related articles
Linear-scan copy propagation nicolas_capens@hotmail.com (2003-11-11)
| List of all articles for this month |
From: nicolas_capens@hotmail.com (c0d1f1ed)
Newsgroups: comp.compilers
Date: 11 Nov 2003 13:48:37 -0500
Organization: http://groups.google.com
Keywords: optimize, analysis
Posted-Date: 11 Nov 2003 13:48:37 EST

A few days ago I asked a question about copy propagation here but I
solved the problem myself. I don't know if it already exists but
here's my algorithm. It's useful for fast code generation like for JIT
compilers and gives very good results in practice.


Definition of the problem and goal: The purpose of copy propagation is
to eliminate redundant copy operations. They occur when after a some
computations the result is written to a second register for further
operations, but the first register is never used again. In this case
the copy was not required and we could have continued to work with the
first register. This also implies that copy propagation avoids
register spilling. The problem with a fast, linear code generator is
that we do not know in advance whether the original value in the first
register will be needed further on. Analyzing data flow after register
allocation in a second pass is complex and therefore unacceptable.


The algorithm: It is best explained with an example:


a := b
a *= c
b += d


Assuming that both a and b are further used, the copy instruction
cannot be eliminated. However, with a linear-scan algorithm we do not
have any lookahead information so we would have to assume b is not
used again. Assuming we have registers x, y, z and w this would result
in the following operations:


; allocate a to x
; allocate b to y
; eliminate copy operation:
; mov x, y
; deallocate b from y
; allocate a to y
; allocate c to z
mul y, z
; problem: b was eliminated


We do not have the value of b anywhere anymore. But there's a
solution: actual code generation has to happen in a final pass to be
able to resolve addresses, so this code is still stored in an
intermediate buffer. Hence we can 'untag' the copy operation so it
does get written:


mov x, y
mul y, z
...


Now, after the copy operation it doesn't matter if you use x for a, y
for b or x for b, y for a, since they are copies. But we already
started using y for a so now we have to use x for b:


mov x, y
mul y, z
; allocate b to x
; allocate d to w
add x, w


Now compare this to the original intermediate code to see this is
valid code. Also note that if b was not referenced any more (or
destroyed by another assignment operation), we would also have
generated valid code and have eliminated the redundant copy!


The implementation: We'll start with linear-scan register allocation.
For every register, we keep a reference to the memory location where
the corresponding variable is stored.


To be able to 'untag' the elimination of the copy operation, we need
to store an ID or pointer to this operation. It results in an elegant
implementation if we just store this pointer together with the
'propagated' register in the allocation table. In this above example
this is the destination register x. We do not have to deallocate any
register. The copy instruction associated with x is enough to know
that it can't be used before untagging the copy operation first:


encode_mov(reg dst, reg src)
{
        tag the mov instruction
        associate pointer with dst
        ...
}


But wait. The references of the registers do get changed, because we
want to attempt copy propagation and allocate a to y. So we have to
already reserve x for b in case it will be used. It is also the use of
b that will require the copy operation to be untagged. So we have to
swap the two references. This can also clearly be seen when you
compare the intermediate code with the generated code:


encode_mov(reg dst, reg src)
{
        tag the instruction
        associate pointer with dst
        swap dst and src references
}


Now let's see what happens if a variable is referenced. This is part
of the register allocation code:


reg allocate(ref var)
{
        if var already in a reg
        {
                if reg has copy instruction pointer associated
                {
                          untag copy instruction
                          clear copy instr pointer field
                }
                return reg
        }
        else
        {
                assign var to free reg or spill one
                return reg
        }
}


We still have to define the behaviour when spilling and freeing a
register. A spill means the value of a register is still needed so it
has to be written back to memory. When this register is a propagated
one (i.e. it has a pointer to a copy operation), the copy has to be
untagged before the register is actually written back to the ref it is
associated with. When freeing a register (at the end of a block or in
an assignment), the value of the register is no longer needed. If it
was a propagated register, the copy operation can be left tagged so we
have succeeded in eliminating the copy:


spill(reg reg)
{
        if reg has copy instr pointer
        {
                untag copy instruction
                clear copy instr pointer field
        }
        write reg to its ref
        clear ref field
}


free(reg reg)
{
        clear copy instr pointer field
        clear ref field
}


You can verify that at any moment the allocation table is in a valid
state, which also proves the correctness of the algorithm.


The algorithm has been successfully implemented in SoftWire, my
run-time x86 assembler (http://softwire.sourceforge.net) and it is
used intensively in swShader, my DirectX 9 shader emulator
(http:://sw-shader.sourceforge.net). In swShader, no less than 70% of
all register-register copy operations are eliminated, resulting in 10%
higher performance of the total program (40% of execution time is
spend in dynamically generated code). There is no measureable increase
in compilation time.


The only imperfection of the algorithm is that it does not prevent
redundant register spilling. For every variable it allocates a
register and it only eliminates copy operations. But note that
propagated registers are good candidates for spilling themselves. So
using the algorithm only has advantages.


I hope this little algorithm is useful to anybody. I think it's very
compact and elegant so it should be no problem to implement in every
JIT code generator. Although it might look simple to you, I did spend
several hours experimenting with it and trying to get it optimal
before I came up with this elegant solution, so I would appreciate a
little credit if you use it in your software. Thank you.


Regards,


Nicolas Capens


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.