Re: How to decide on the meet operator in Data-flow Analysis?

Patryk Zadarnowski <pat@jantar.org>
Mon, 15 Mar 2010 20:08:33 +1100

          From comp.compilers

Related articles
How to decide on the meet operator in Data-flow Analysis? vivekb1985@gmail.com (Vivek B) (2010-03-14)
Re: How to decide on the meet operator in Data-flow Analysis? pat@jantar.org (Patryk Zadarnowski) (2010-03-15)
| List of all articles for this month |

From: Patryk Zadarnowski <pat@jantar.org>
Newsgroups: comp.compilers
Date: Mon, 15 Mar 2010 20:08:33 +1100
Organization: Compilers Central
References: 10-03-030
Keywords: analysis
Posted-Date: 15 Mar 2010 21:55:24 EDT

On 15/03/2010, at 4:18 AM, Vivek B wrote:


> Dear All,
>
> In an iterative data flow analysis the solution obtained will be less
> than or equal to the IDEAL solution
>
> If I am trying to solve the problem in two different approaches -
> using two different meet operators (^)
>
> 1. where meet operator is U (union)
> 2. where the meet operator is (intersection)
>
> then the meet over path solutions, MOP1 and MOP2, obtained are both
> less than or equal to IDEAL. (being conservative)
>
> MOP1 <= IDEAL
>
> MOP2 <= IDEAL
>
> How can I know which among MOP1 and MOP2 gives a superior result?
>
> Is there a formal way to decide which meet operator has to be used ?




I'm afraid you're looking at this the wrong way.


The union and intersection operations don't give you a pair of
different approximations to the same problem (your ideal solution),
but, rather, the same solution (namely: the least fixpoint, obtained
by typical iterative data flow algorithms) to a pair of two completely
different problems.


If you look at your data flow problem as a system of simultaneous
(recursive) set equations, then the set intersection operator in your
analysis will deliver the least solution, in the sense that it will
assign to each variable in the system the smallest possible set that
satisfies the constraints of the entire systems. Dually, using the
union operator will deliver the maximal solution, i.e., the largest
sets that satisfy your program's constraints.


Bear in mind that the maximal solution (i.e., the one delivered by the
union operator) often does not exist, so that your data flow analysis
will often (usually?) diverge, unless you perform the analysis over a
finite abstract domain (that is, unless you choose your approximations
out of a small finite set of possible values.)


Another way of thinking about it is this: in most cases, intersection
will give you "must satisfy" information about your data points, while
union will give you "may satisfy" criteria. So that, for example, if
you use your iterative data flow analysis to determine which pointers
address which objects, union will probably tell you which pointer MAY
point to a given object, while intersection will determine which
pointers always point to a given object in memory.


As an example, take sparse conditional constant propagation of Wegman-
Zadeck [1]. Although it is generally described as abstract
interpretation over a lattice-based approximation of run-time values,
you can equally well think of it as an iterative MOP solution to the
data-flow problem, in which the approximate value of each variable in
the program is described as a set of constants that the variable may
contain at any given execution path through the program. Their meet
operation is essentially isomorphic to set intersection, if we take
the initial value assigned to each variable (BOTTOM in
lattice-theoretic terminology) to represent the universal set, and the
ultimate "don't know" (or TOP) value to represent an empty set. The
algorithm begins by assigning the universal set to each variable, then
interesting that with singleton sets containing just a single constant
(arriving at a singleton sets, represented in the middle of
Wegman-Zadeck's lattice.) Finally, when two such singleton sets are
intersected, you'll get the same set back if and only if the sets are
identical to begin with (i.e., if the variable is assigned the same
constant value on both execution paths.) Otherwise, you'll get the
intersection of two non-overlapping singleton, i.e., an empty set, or
TOP. Note that the algorithm gives you "MUST" information.


The algorithm terminates because intersection is a monotonic non-
increasing function (in other words, it never adds new elements to the
initial approximation.) If you keep removing elements, you're
guaranteed to eventually reach either an empty set (TOP) or else a
point when your approximations stop changing.


Related value-range analysis [2] algorithms substitute the union
operator for intersection and, although superficially similar, have
very different set of properties. For one, union is monotonically
non-decreasing, so that there's no guarantees the algorithm will ever
terminate. In practice, this is usually solved by giving up the
analysis when the sets reach a certain size. Further, once you're
done, the result is a set of zero or more values for each variable in
the program, so that the compiler doesn't actually know which one of
these values will be actually assumed during execution. In short,
value-range analysis gives you "MAY have this constant value"
information, while constant propagation gives you "MUST have this
value" data. Very different results and applications, even though the
ONLY difference between the two algorithms is the precise meet
function (set union or set intersection) that was applied during
analysis.


Cheers,
Patryk Zadarnowski


[1] Wegman, Mark N. and Zadeck, Frank Kenneth, "Constant propagation
with conditional branches",
in POPL '85: Proceedings of the 12th ACM SIGACT-SIGPLAN symposium on
Principles of programming languages, pages 291-299, 1985.


[2] Patterson, Jason R. C., "Accurate static branch prediction by
value range propagation",
in PLDI '95: Proceedings of the ACM SIGPLAN 1995 conference on
Programming language design and implementation, pages 67-78, 1995.



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.