Common subexpression analysis

mernst@theory.lcs.mit.edu (Michael Ernst)
Wed, 3 Jun 1992 21:49:49 GMT

          From comp.compilers

Related articles
Common subexpression analysis mernst@theory.lcs.mit.edu (1992-06-03)
Re: Common subexpression analysis msharp@cs.ulowell.edu (1992-06-04)
Re: Common subexpression analysis preston@dawn.cs.rice.edu (1992-06-04)
Re: Common subexpression analysis f88ho@efd.lth.se (1992-07-14)
| List of all articles for this month |

Newsgroups: comp.compilers
From: mernst@theory.lcs.mit.edu (Michael Ernst)
Keywords: optimize, question
Organization: Compilers Central
Date: Wed, 3 Jun 1992 21:49:49 GMT

Is the topic of common subexpression analysis considered a mature field?
A search through the ACM and INSPEC collections revealed almost no work in
recent years; the topic seems to have died off after the mid-70s. Much of
the work is architecture-specific or discusses computing available
expressions rather than identification of common subexpressions per se.
Or am I missing something? Is there literature on computing common
subexpressions on three-address machines with a finite, non-unity number
of registers?


What is the current state of the art in common subexpression elimination?
Most authors (and GCC) take an expression forest (or three-address code)
representing the computations of a basic block and walk from the bottom
up, looking for equivalent subexpressions to identify, so the forest
becomes a multiply-rooted dag.


This completely ignores the question of organizing the original expression
tree so as to ensure that the maximum number of common subexpressions can
be found. For instance, consider the two expressions


    x = a + b + c + d
    y = b + c + d + e


No matter how these are parsed into binary trees (left-associative,
right-associative, minimum-height tree), the standard methods won't find
any common subexpressions. This particular example could be corrected by
using (for associative operators) nodes with more than two children and/or
by sorting such operands according to some metric. It's easy to think of
cases in which the latter isn't optimal; does the former always do the
trick?


Do any real compilers make an effort to expose common subexpressions in
this way, or are such techniques mentioned in the literature?


I am interested in learning of any work relevant to CSE.


I will summarize replies received by email, and will post a bibliography if
there is sufficient interest.
-Michael Ernst
mernst@theory.lcs.mit.edu
[It is my vague recollection that even the ancient Ritchie PDP-11 C compiler
sorted associative lists like the one above so it could collapse all the
constants. -John]
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.