Re: Monotone Dataflow problems....

grunwald@foobar.cs.colorado.edu
Sun, 11 Oct 1992 16:51:20 GMT

          From comp.compilers

Related articles
Monotone Dataflow problems.... sreedhar@flo.cs.mcgill.ca (V.C. SREEDHAR) (1992-10-03)
Re: Monotone Dataflow problems.... preston@dawn.cs.rice.edu (1992-10-04)
Re: Monotone Dataflow problems.... grunwald@foobar.cs.colorado.edu (1992-10-11)
| List of all articles for this month |
Newsgroups: comp.compilers
From: grunwald@foobar.cs.colorado.edu
Organization: University of Colorado at Boulder
Date: Sun, 11 Oct 1992 16:51:20 GMT
Keywords: optimize, parallel, dataflow
References: 92-10-006 92-10-011

preston@dawn.cs.rice.edu (Preston Briggs) said:
> [for parallel programs]
> We can continue using traditional forms of analysis; however, some
> new problems must be solved (e.g., laying out data across multiple
> processors). Note that I would include such things as dependence
> analysis (developed to attack vectorization) as important for
> scalar machines.


This is not completly true. My student (Harini Srivinasan) and I have
formulated a set of dataflow equations for the reaching definitions
problem in a language with explicit parallelism method (PCF Fortran, for
lack of a better language). The canonical example is.,..


L = 1
do I = 1, N
Parallel Section
Section
Q = h(I)
L = L + 1
do J = 1, N
X = f(J, L * 4 )
Advance(EV1, J)
Wait(EV2, K)
end do
Section B
do K = 1, N
Y[K] = g(I,K)
Wait(EV1, K)
Z[K] = Y[K] + X + Q
Advance(EV2, K)
end do
End Parallel Section
end do


In this program, L is clearly an induction variable, and we can strength
reduce and hoist the L*4 expression. Likewise, by knowing the live-range
of X, we may (depending on architecture) be able to avoid forcing a
globally consistent memory write for the passed variable 'X' (i.e. pass it
via some special communication channel or just void consistency traffic).
Also, we can determine that post-storing 'Q' is A Good Thing, because
section A doesn't use it after it computes it the first time, and a post
store would mask latency for the use in Section B.


Applying conventional analysis to parallel programs omits certain
optimization options (i.e. all of the ones above). We assume
copy-in/copy-out semantics in our dataflow model -- not because CICO is a
great way to build a computer, but because it's allowed by the language
standard and greatly simplifes the compiler analysis. Using CICO, we only
need to consider thread interaction at parallel fork, merge and
synchronization points. Using e.g, strongly consistent memory semantics,
we'd need to consider interaction after each insn.


One thing needed to solve the flow equations is the precedence relations
between basic blocks (mainly for post/wait or advance synchronization);
this is difficult to compute (ie. see papers by Callahan, Subhlok &
Kennedy), although hueristics will probably work well in practice.


We've worked in the past on an SSA formulation for R.D in the presence of
synchronization ( Wolfe and Srivinasan worked on this w/o synch). Right
now, we're finishing up a better version of this that I would be better
able to explain in private email if you're interested.


If you would like copies of tech reports related to these papers, please
pick up the following via anonymous FTP or send me email and I'll ship 'em
out.


/anonymous@ftp.cs.colorado.edu:pub/cs/techreports/CU-CS-564-92.ps.Z
An Efficient Construction of Parallel Static Single Assignment
Form for Structured Parallel Programs


/anonymous@ftp.cs.colorado.edu:pub/cs/techreports/CU-CS-605-92.ps.Z
-- Basic montone dataflow system for R.D. problem


/anonymous@ftp.cs.colorado.edu:pub/cs/techreports/CU-CS-614-92.ps.Z
-- Proof of montonicity


Dirk Grunwald Asst. Prof, Univ. of Colorado at Boulder
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.