parallelizing loops on a DMA

Newsgroups: comp.compilers
From: edward@nsrc.nus.sg (Edward Walker)
Keywords: experiment, parallel, question
Organization: National University of Singapore
Date: Fri, 26 Aug 1994 06:00:10 GMT

Further to my recent post requesting time on a parallel machine to
conduct some experiments, I would like to extend my request to include
some time on a *distributed memory machine* as well.

--------------------------

The purpose of my experiments is to try to derive an optimal
parallel form for the hypothetical loop (first introduced in [1]):

    FOR i = 1, n
      FOR j = 1, m
        FOR k = 1, p
          A(i+j,3*i+j+3) = ...
          ... = A(i+j+1,i+2*j+4) ...

using the dataflow information I generate for the definitions and
references of the elements of the array A(). At the moment, I can
generate the pure dataflow (i-j axis) version of the above loop by
deriving the distance vector of the above data dependence. What I
need to find out is whether the additional overhead of synchronizing
on all the data dependencies will outweigh any benefit gained.
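To make the dependence concrete, here is a minimal Python sketch that brute-forces the (i, j) distance vectors of the loop above for small bounds. It is only an illustration, not the analysis I actually use: the function name `dependence_distances` is made up for this example, the k loop is ignored because the subscripts do not involve k, and a distance is taken (by assumption) as the reading iteration minus the writing iteration.

```python
from collections import defaultdict


def dependence_distances(n, m):
    """Enumerate (i, j) distance vectors between the write A(i+j, 3*i+j+3)
    and the read A(i+j+1, i+2*j+4), by exhaustive search.

    Hypothetical helper for illustration; distance = reader - writer.
    """
    # Map each array element to the iterations that write it.
    writers = defaultdict(list)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            writers[(i + j, 3 * i + j + 3)].append((i, j))

    # For every read, look up the iterations that write the same element.
    distances = set()
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            element = (i + j + 1, i + 2 * j + 4)
            for wi, wj in writers.get(element, []):
                distances.add((i - wi, j - wj))
    return distances


if __name__ == "__main__":
    print(sorted(dependence_distances(4, 4)))
```

For n = m = 4 this finds the vectors (-1, 0), (0, -1) and (1, -2). Equating the two subscript pairs shows that every vector has the form (d, -d-1), so the distance varies from iteration to iteration rather than being a single constant.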

So essentially, I would like to do multiple runs with varying (n,m,p)

on (possibly) different dataflow versions of the above kernel.
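As a rough idea of what such runs would measure, the sketch below counts, for a given (n, m), the number of dependence edges between (i, j) iterations and the number of dataflow levels (the longest chain of dependent iterations). These are only proxies, chosen for this example, for synchronization cost and available parallelism; the name `dataflow_levels` is hypothetical, and the k loop is assumed to run sequentially inside each (i, j) iteration.

```python
from collections import defaultdict


def dataflow_levels(n, m):
    """Return (level, edges) for the (i, j) iteration space of the loop.

    level maps each iteration to its earliest dataflow step; edges counts
    dependence edges, each of which would need a synchronization.
    Illustrative sketch only, not a compiler algorithm.
    """
    iterations = [(i, j) for i in range(1, n + 1) for j in range(1, m + 1)]

    # Each element is written by at most one (i, j) iteration of this loop.
    writer = {(i + j, 3 * i + j + 3): (i, j) for i, j in iterations}

    # An edge runs from the sequentially earlier iteration to the later one,
    # whether the dependence is a flow or an anti dependence.
    preds = defaultdict(set)
    for it in iterations:
        i, j = it
        w = writer.get((i + j + 1, i + 2 * j + 4))
        if w is not None and w != it:
            earlier, later = sorted([w, it])
            preds[later].add(earlier)

    # Lexicographic order is the sequential order, so predecessors of an
    # iteration are always assigned a level before the iteration itself.
    level = {}
    for it in iterations:
        level[it] = 1 + max((level[p] for p in preds[it]), default=0)

    edges = sum(len(p) for p in preds.values())
    return level, edges


if __name__ == "__main__":
    for n, m in [(4, 4), (16, 16), (64, 64)]:
        level, edges = dataflow_levels(n, m)
        print(n, m, "edges:", edges, "levels:", max(level.values()))
```

With p iterations of work per (i, j) point, the total work is n*m*p while the edge count does not depend on p, so larger p should make the synchronization relatively cheaper. Only measurements on a real machine can say where the break-even point lies.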

The hope (fantasy?) of course is to one day allow a parallelizing

compiler to generate the dataflow form automatically.

Many thanks again.

- edward

[1] T. H. Tzen and Lionel Ni, "Dependence Uniformization: A Loop

Parallelization Technique", IEEE Trans. Parallel and Distributed

Systems, vol. 4, no. 5, 1993, pp. 547--558.

--------------------------------------------------------------------

Edward Walker

National Supercomputing Research Centre

81, Science Park Drive

#04-03, The Chadwick

Singapore Science Park

Singapore 0511

internet: edward@minster.york.ac.uk

tel: (65)-7909-226


