Re: Common subexpressions optimizations

Chris Clark USG <>
15 Dec 1997 21:59:01 -0500

          From comp.compilers

Related articles
Common subexpressions optimizations (LMK shell) (1997-12-10)
Re: Common subexpressions optimizations (Chris Clark USG) (1997-12-15)
Re: Common subexpressions optimizations (cliffc) (1997-12-23)
| List of all articles for this month |

From: Chris Clark USG <>
Newsgroups: comp.compilers
Date: 15 Dec 1997 21:59:01 -0500
Organization: Digital Equipment Corporation - Marlboro, MA
References: 97-12-076
Keywords: optimize, analysis

LMK shell <> wrote:
> I'm trying to detect common subexpressions and put them in an unused
> register or on the stack. I'm looking for some efficient algorithm to
> detect them -- pattern matching doesn't seem too fast.

I presume you are trying to detect "local" common sub-expressions,
those within a basic block (a linear region of code), and you are
trying to match expression trees from the top down using pattern
matching and this is what you are finding slow.

If so, one solution which I have seen widely used is "value
numbering", or at least that's the name I know it by. In value
numbering, you give each distinct value (which we'll define in a
moment) a unique number and if two expressions have the same value
number, they are the same expression.

Each distinct constant is given a distinct value number. Each
distinct variable is given a distinct value number. Morever, if the
you are compiling an imperative language where variables can change
value, each time a variable is referenced such that it might have
changed from the last reference, it is given a unique value number.
For example, if there is an assignment statement (or procedure call)
which can effect the variables value, the variable, the variable is
assumed to have been changed by that assignment. Anyway, at this
point, you should have value numbers for all of the leaves of your
expression trees--in other words if you have something other than
variables and constants (function results perhaps?) that are leaves of
an expression you given them also appropriate value numbers.

Next, you build up value numbers for the rest of your expression trees
working from the leaves to the roots. Consider one such expression,
say an add of two things. You first check to see if you have already
seen an expression which adds the same two value numbers, and if so
you give this expression the same value number as the previous
occurance. Otherwise, you give this expression a new unique value
number. One way to do the look up is to keep a hash table of the
expressions you have already assigned value numbers to, for example by
adding the value numbers of the operands and a numeric value to
represent the operator, and using that to start your search.

If you find a compiler book which discusses optimization, you will
probably find a more thorough explanation of the process, but
hopefully that will give you a start. In addition, if you are serious
about value numbering, you will want to worry about "canonicalizing"
your expressions, so that a+b gets the same value number as b+a,
provided they mean the same thing in your language. (I'm pretty
certain that the problem of matching a+(b+c) with (a+b) versus (b+c)
is hard (NP-complete?) in the general case, although there are
hueristics to use.)

If you want "global" common subexpressions (across basic blocks), you
will need data-flow analysis. However, most global methods assume
that local common sub-expressions have already been resolved, so
you'll need a method like this anyway.

Hope this helps,
-Chris Clark
Compiler Resources, Inc. email:
3 Proctor St.
Hopkinton, MA 01748 phone: (508) 435-5016
USA 24hr fax: (508) 435-4847

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.