# Re: Common subexpression elimination (CSE) using value numbering?

## vidar.hokstad@gmail.com26 May 2005 23:08:57 -0400

From comp.compilers

Related articles
Common subexpression elimination (CSE) using value numbering? Hu.YuehWei@gmail.com (2005-05-26)
Re: Common subexpression elimination (CSE) using value numbering? torbenm@diku.dk (2005-05-26)
Re: Common subexpression elimination (CSE) using value numbering? vidar.hokstad@gmail.com (2005-05-26)
Re: Common subexpression elimination (CSE) using value numbering? stevenb@suse.de (Steven Bosscher) (2005-05-26)
| List of all articles for this month |

 From: vidar.hokstad@gmail.com Newsgroups: comp.compilers Date: 26 May 2005 23:08:57 -0400 Organization: http://groups.google.com References: 05-05-211 Keywords: analysis, SSA Posted-Date: 26 May 2005 23:08:56 EDT

I haven't read the papers you mention, so I'm answering based on my
understanding of SSA only.

Hu.YuehWei@gmail.com wrote:
> All things are perfect except one condition. I don't know how to
> handle this condition in your optimization framework. The condition I
> can't handle is as following:
>
> A = B + C
> A = D <-- this expression must define the same variable
> as the previous one
> E = B + C
>
> After transfering to the SSA form:
>
> A0 = B0 + C0
> A1 = D0
> E0 = B0 + C0
>
> After applying the value numbering, I can say that A0 and E0 have the
> same value. After applying AVAIL common subexpression elimination, I
> can replace the last expression with "E0 = A0". Thus, the whole codes
> become this one:
>
> A0 = B0 + C0
> A1 = D0
> E0 = A0 <--
>
> However, because both "A0" and "A1" have the same memory address, so
> that I will get a wrong value when I execute the last expression after
> the second expression.

Why does A0 and A1 have the same memory address?

One of the important things to realise with SSA is that depending on
what you do to the SSA form you can not automatically assume that you
can combine the values back together in a single variable.

In your compiler, as long as the program is in SSA form, you must treat
all your variables (i.e. A0,A1, B0,C0,D0,E0) as separate. It is only
when you convert the program back from SSA form (for instance when
generating code) that you can make the decision of which of these
should be in the same memory locations.

You can do that by determining the liveness of each variable, and can
also do reordering of expressions to minimize the number of variables.
In your example, for instance, once you have reached the step "E0 = A0"
that means that E0 literally is the same as A0 - it can never take on
another value, and can never have had another value, so you can reorder
like this:

A0 = B0 + C0
E0 = A0
A1 = D0

Or you can go one better, and systematically replace all occurances of
E0 in the rest of the program with A0, and remove the last expression:

A0 = B0 + C0
A1 = D0

Note that you really ought to ensure you do this, as assignments of the
form E0 = A0 or A1 = D0 are completely pointless to retain in SSA form
unless you're forced to by specific language semantics (i.e. if the
variables should live in a location that might change in between
assignments due to another thread etc.

If you do the above step for the expresssion E0 = A0 only, then when
you generate code, A0 and A1 will indeed need to be allocation to
entirely separate variables that needs to be stored in separate
locations (provided they are both used later in the program, and the
expressions can't be reordered so that A1 = D0 can happen after the
last use of A0.

Think of allocating a piece of memory to a variable from SSA form as
almost the same as register allocation - all the same algorithms can be
a small number of available memory locations, and each time you would
spill onto the stack with the register allocation algorithm you instead
have the option of allocating another variable location (whether on the
stack for local variables, or on the heap).

That way variables that started out as one may end up at multiple
locations in the generated code, but that doesn't matter - you should
end up with no more addresses total, and often quite a bit less as
variables that humans keep separate for readability can be combined
into the same part of memory, even if the variables aren't even of the
same type.

[snip]
> One way removes the definition of E0, and the other way assigns a
> wrong value to E0.

Based on your description of the process, I think you misunderstand,
and that what was meant was one or both of the following:
- Removal of the second occurence of the expression B0 + C0
- Removing E0 and systematically replacing every occurance of it
with A0, eliminating E0 entirely.

Hope this helps.

Vidar

Post a followup to this message