Wed, 12 Sep 90 11:28:58 -0500

Related articles |
---|

"Tokenizing" Analyses (successors to Jones-Muchnick's 1982 paper) pfeiffer@herve.cs.wisc.edu (1990-09-12) |

Newsgroups: | comp.compilers |

From: | pfeiffer@herve.cs.wisc.edu (Phil Pfeiffer) |

Keywords: | optimize |

Organization: | Compilers Central |

Date: | Wed, 12 Sep 90 11:28:58 -0500 |

In an earlier posting, Kwang-Keun Yi writes:

*> [In] "A Flexible Approach to Interprocedural Data flow Analysis*

*> and Programs with Recursive Data Structures" by Neil Jones and Steven*

*> Muchnick [POPL '82], they mentioned in the conclusion section that*

*>*

*> "Two areas of investigation that appear fruitful are to consider how*

*> the computational complexity of the analyzer varies with the choice*

*> of token set and the the feasibility of building an analyzer with*

*> the choice of token set as a parameter."*

*>*

*> Can somebody help me to find papers on the investigations? *

This is a *big* question. There have been a series of papers since 1982 that

amount to *specific* extensions of JM82; some of the authors have addressed

complexity in their papers as well. Here's my stab at providing you with

references and a (partial, incomplete) survey.

Extensions to JM82 typically vary according to the proposed labeling

technique and the strategy used to approximate arbitrarily large data

structures (I'm using the word "label" in place of the word "token" because I

like it better.) Jones and Muchnick, as I understand the paper, suggest

labeling every structure x in a store s with a value such as [k] to show that

x was created by the statement "at" program point [k]. Another labeling

scheme that has been proposed tags x with the value ([k],n) to show that x

was the nth object created at [k]. Also, more elaborate labeling schemes

have been proposed that take a program's nesting structure into account --

e.g., if s is labeled ([k],m,n), then s was the nth object created during the

mth iteration of statment [k]'s enclosing loop. This strategy is mentioned

in a footnote in P. Hudak's "A semantic model of reference counting and its

abstraction" in _Abstract Interpretation of Declarative Languages_ (pp.

45-62)). A variant of this is mentioned in a forthcoming paper by Francois

Bodin (you should be able to get a copy of this from Dennis Gannon, who's

also at CSRD) and in a 1989 Sigplan paper by Susan Horwitz, Phil Pfeiffer,

and Tom Reps. The HPR89 paper also describes several other strategies for

using labels to track a program's execution history. See also the

Chase-Wegman-Zadeck paper in SIGPLAN 90, and the papers referenced in its

related works section (more on CWZ90 below).

Two different approximation mechanisms are discussed in JM82: one for stores,

and a second for the stack. Many of the analyses that I've seen, however,

simply "flatten" the stack: i.e., assume that a procedure can return to any

of the procedures that can call it. More complex approximations to the stack

than this are usually avoided because approximations to stores that contain

dynamically-allocated objects are already quite expensive to compute. Both

of the reports that I know of that *do* construct non-trivial approximations

to the stack were "forced into it" by of the language under consideration.

The one, Uwe Pleban's 1981 thesis (U.Kansas), describes a preexecution

analysis for a dialect of Scheme that supports downward closures. The other,

Jan Stransky's 1988 thesis (University de Paris-Sud, Centre d'Orsay, disser-

tation in French) describes a preexecution analysis for the (dynamically-

scoped language) Lisp.

Different authors have given different estimates of the respective

computational complexities of their analyses. One of the best references on

the complexity of labeled analyses is a paper in SIGPLAN 1990 by Chase,

Wegman, and Zadeck. CWZ90 deals specifically with the problem of computing

an approximation to the set of stores that a program can generate, in the

presence of allocation primitives (e.g., cons) and procedures. The authors

also give a clever algorithm that runs much faster than the standard,

"simple" store-building algorithm when a program doesn't create "too many

edges". CWZ90 also quotes some complexity results from other people's theses

that you might find helpful. (Their characterization of Ruggieri's work,

however, is not quite right; Ruggieri, in the final chapter of her thesis,

discusses implementation techniques (such as union-find) that can be used to

improve the running-time of her analysis -- i.e., the one cited in the

report). I'd also recommend Hendren's work on dag-like and tree-like stores

(there's a reference a paper by her in the Chase-Wegman-Zadeck paper; her

thesis is available from Cornell as technical report 90-1114. Hendren's work

is actually based on matrix manipulation, but it is equivalent to yet another

variation on store graphs if you stare at it, cross-eyed, for a suitably long

time).

Stransky actually built an analyzer, and experimented with different

parameters for his analyzer. Stransky's thesis has some empirical

observations on how well the analyzer ran for different labeling techniques

and different "orders" of stack approximations. A six-word summation of

Stransky's experience with his analyzer: "effective, but very slow and

clumsy". This is one of the threads that runs through all of the literature

on pointer analysis.

The SIGPLAN paper that Susan Horwitz, Tom Reps, and I wrote also describes a

technique for proving that labels are being used correctly. (It turns out

that the same concern was brought up in Pleban's thesis, Stransky's thesis,

and, apparently, in Cousot's thesis).

-- Phil Pfeiffer

UUCP: ...!{harvard,seismo,topaz,akgua,allegra,usbvax}!cs.wisc.edu!pfeiffer

(608) 262-6625

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.