|Questions about partial redundancy elimination baxter@Austin.slcs.slb.com (1993-04-30)|
|Re: Questions about partial redundancy elimination firstname.lastname@example.org (1993-05-01)|
|Re: Questions about partial redundancy elimination email@example.com (1993-05-03)|
|Re: Questions about partial redundancy elimination firstname.lastname@example.org (1993-05-04)|
|Re: Questions about partial redundancy elimination email@example.com (1993-05-05)|
|Questions about partial redundancy elimination firstname.lastname@example.org (1993-05-17)|
|Re: Questions about partial redundancy elimination email@example.com (1993-05-18)|
|From:||firstname.lastname@example.org (Preston Briggs)|
|Organization:||Rice University, Houston|
|Date:||Sat, 1 May 1993 23:15:46 GMT|
baxter@Austin.slcs.slb.com (Ira Baxter) writes:
>There's an interesting paper in the most recent SIGPLAN on elimination of
>It shows how to compute "partial-redundancy" information efficiently,
>removing invariants from loops and deleting redundant computations on a
>control-flow graph representations.
That's right, it's yet another paper on partial redundancy elimination!
(the only subpart of compiler science with more papers than register
allocation). I can recall 5 papers in the past year -- 1 in POPL, 2 in
PLDI, 1 in TOPLAS, and this one in SIGPLAN Notices.
>The paper contains the remark: "It is assumed that commands computing
>expressions deliver their results to psuedo-registers, and that all
>computations of the same expression always use the same psuedo-register as
>the result register."
>This seems to imply that common subexpression *detection*, (if not
>elimination), complete with verification of identical control dependences
>has already taken place, or multiple instances of the same expression
>would have no such gaurantee.
All the papers in this area seem to make the same assumptions. We've
built a couple of versions of PRE and, while we were able to make them
work, we had to make some fairly severe compromises. Basically, the front
end generates intermediate code in a special form, following a set of
rules that makes the whole mess work. Unfortunately, I can't maintain any
of the required invariants across an optimization, so I must run PRE first
and I can only run it once.
Here's our rules for intermediate code generation to support PRE:
Whenever generating an expression (say r10 + r11), see if that
expression has been generated before. If so, make the new
expression assign to the same result register as the old one.
Thus, whenever we see r10 + r11, it will always be assigned to
the same register. This is easily accomplished by maintaining
a hash table during code generation.
Further, if the operation is commutative, sort the operands
before looking it up in the hash table.
Distinguish between "expression registers" and "variable
registers". An "expression register" is computed as the
result of some expression (e.g. R12 = r10 + r11). A "variable
register" is used to hold some user-level scalar variable
and is always assigned via a copy (from another register).
PRE is performed on expressions, not variables.
Make the result of an expression have a higher register number
than its operands (so, we get r12 = r10 + r11, never r1 = r2 + r3).
Always generate the complete expression tree. Never use any
common subexpressions (though obviously you can use common
variables). This isn't particularly hard to do given
parser-driven code generation; it's much more tedious to
accomplish by hand.
>how are commutative operators handled with such a
>technique? [As an example, I'd like to move motionable subsets of a
>multioperand ADD operations].
They aren't handled well. In fact, most things aren't handled well.
About the only thing it does well on are addressing expressions. Since
the same intermediate code generator keeps churning through the high level
code, it'll keep churning out the exact same low-level intermediate code
for all sorts of addressing expressions. PRE does a fine job on this
On the other hand, it's possible I have done too simplistic a job of
implementing this stuff. I'd be interested in hearing other people's
>I find it interesting that partially-redundant *expression* computations
>are eliminated. Is there a reason that such techniques aren't used to
>eliminate redundant *effects*? (i.e., "Common Procedural Elimination"?)
All the usual problems of interprocedural analysis apply. Additionally,
it's hard to recognize pure functions in most langages. If the routine
isn't a pure function, it's hard to summarize it's side effects accurately
enough to enable us to move it around. The only paper that talks about
title="Code Motion of Control Structures in High-Level Languages",
author="Ron Cytron and Andy Lowry and F. Kenneth Zadeck",
pages="70--85" (Second reference this week!)
Very roughly, their approach is to inline a call into a loop, then try and
move all the invariant chunks of the structure out of the loop.
Back to PRE for a brief parting shot. The recent papers that try to
minimize register pressure by placing common subexpressions as late in the
code as possible don't work. They argue that it's better to make code
r10 = foo()
r11 = bar()
a lot of stuff
r12 = r10 + r11
I think it's better to make code that looks like
r10 = foo()
r11 = bar()
r12 = r10 + r11
a lot of stuff
Why? Because r10 and r11 needn't be preserved across a lot of stuff. We
lengthen the lifetime of r12, but we shorten the lifetimes of r10 and r11,
winning in the balance.
Of course, if r10 and r11 have other uses, then my version may be a little
worse than theirs. How to choose between them? Certainly an NP-hard
problem. I don't believe anyone has attacked it.
I've perhaps been harsh on PRE. However, you should understand that we
use it in our optimizer and I don't really see a way to replace it with
another technique. I had hopes for the method described in
title="Global Value Numbers and Redundant Computations",
author="Barry K. Rosen and Mark N. Wegman and F. Kenneth Zadeck",
but I've never been able to build it. I believe it's fatally flawed in
its inability to preserve ordering of array stores and loads. Perhaps it
can be modified by the addition of some form of anti-dependences, but I
haven't seen a paper on the subject.
Return to the
Search the comp.compilers archives again.