Subject: parallelizing loops on a DMA
Newsgroups: comp.compilers
From: edward@nsrc.nus.sg (Edward Walker)
Keywords: experiment, parallel, question
Organization: National University of Singapore
Date: Fri, 26 Aug 1994 06:00:10 GMT
Further to my recent post requesting time on a parallel machine to
conduct some experiments: I would like to extend that request to
time on a *distributed memory machine* as well.
--------------------------
The purpose of my experiments is to try to derive an optimal
parallel form for the hypothetical loop (first introduced in [1])
FOR i = 1, n
  FOR j = 1, m
    FOR k = 1, p
      A(i+j,3*i+j+3) = ...
      ... = A(i+j+1,i+2*j+4) ...
using the dataflow information I generate for the definitions and
references of the elements of the array A(). At the moment, I can
generate the pure dataflow (i-j axis) version of the above loop by
deriving the distance vector of the above data dependence. What I
need to find out is whether the additional overhead of synchronizing
all the data dependences will outweigh any benefit gained.
So essentially, I would like to do multiple runs with varying (n,m,p)
on (possibly) different dataflow versions of the above kernel.
The hope (fantasy?) of course is to one day allow a parallelizing
compiler to generate the dataflow form automatically.
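For readers who want to see the dependence structure concretely, the
iteration pairs that touch the same element of A() can be enumerated by
brute force. The sketch below does this for small illustrative bounds
(N, M are my choice, not values from the experiments; the k loop is
omitted since neither subscript uses k). It classifies each pair as a
flow or anti dependence by sequential iteration order and prints the
distance vectors, which turn out not to be all equal; that non-uniformity
is exactly what the uniformization technique of [1] addresses.

```python
# Brute-force dependence enumeration for the kernel (illustrative sketch):
#   write: A(i+j,   3*i+j+3)
#   read:  A(i+j+1, i+2*j+4)
# N, M are small assumed bounds, not taken from the post.
N, M = 6, 6

def write_elem(i, j):
    # element written at iteration (i, j)
    return (i + j, 3*i + j + 3)

def read_elem(i, j):
    # element read at iteration (i, j)
    return (i + j + 1, i + 2*j + 4)

deps = []
for i1 in range(1, N + 1):
    for j1 in range(1, M + 1):
        for i2 in range(1, N + 1):
            for j2 in range(1, M + 1):
                if write_elem(i1, j1) == read_elem(i2, j2):
                    # distance vector from the write iteration to the
                    # read iteration of the same element
                    dist = (i2 - i1, j2 - j1)
                    # write before read in sequential order: flow;
                    # otherwise the read comes first: anti
                    kind = "flow" if (i1, j1) < (i2, j2) else "anti"
                    deps.append(((i1, j1), (i2, j2), dist, kind))

for w, r, dist, kind in deps:
    print(f"{kind}: write{w} / read{r}, distance {dist}")

# The set of distinct distance vectors shows the dependence is
# non-uniform, which motivates dependence uniformization.
print(sorted({d for _, _, d, _ in deps}))
```

For N = M = 6 this finds nine conflicting iteration pairs with six
distinct distance vectors, so no single uniform distance vector
describes the dependence.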
Many thanks again.
- edward
[1] T. H. Tzen and Lionel Ni, "Dependence Uniformization: A Loop
Parallelization Technique", IEEE Trans. Parallel and Distributed
Systems, vol. 4, no. 5, 1993, pp. 547--558.
--------------------------------------------------------------------
Edward Walker
National Supercomputing Research Centre
81, Science Park Drive
#04-03, The Chadwick
Singapore Science Park
Singapore 0511
internet: edward@minster.york.ac.uk
tel: (65)-7909-226
--