# Re: Irregular Data Access Pattern Loops on Parallel Machines

## preston@dawn.cs.rice.edu (Preston Briggs)Wed, 13 Oct 1993 03:04:30 GMT

From comp.compilers


Newsgroups: comp.compilers
From: preston@dawn.cs.rice.edu (Preston Briggs)
Keywords: parallel
Organization: Rice University, Houston
References: 93-10-031
Date: Wed, 13 Oct 1993 03:04:30 GMT

>I'm looking for references to articles dealing with the problem of
>compiling loops with irregular data access patterns on parallel
>computers for minimum communication/maximum locality. [e.g.]
>
>DO I=1,N
> A(B(I)) = f
>ENDDO
>
>where B is not known at compile time.

This is a bit out of my area, but I haven't seen any other replies, so
I'll take a crack at it.

There are probably some papers mentioning the problem; however, it's
generally quite difficult and I know of no good compile-time approach.

In the past, there's been some discussion around here about doing
dependence analysis on such things if we could determine that B was a
"permutation vector"; that is, if we could be certain that

B[i] != B[j] when i != j

Such things tend to arise when using Gaussian elimination with partial
pivoting, for example. One approach to the detection of a permutation
vector might be to notice initialization of the form

do i = 1, n
B[i] = i
end

followed exclusively by references to B or swap operations between
elements of B. This sort of analysis seems a little out of reach for
most Fortran compilers. Alternatively, a directive of some sort might
be used to assert the permutation property.
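When compile-time detection fails, a compiler could also fall back to an
emitted runtime test: scan B once before the loop, and run the loop in
parallel only if B turns out to be a permutation. Here is a minimal
sketch of such a check (in Python for clarity; the function name and
0-based indexing are my own, not from any particular compiler):

```python
def is_permutation(b, n):
    """Return True if b[0:n] is a permutation of 0..n-1.

    If this O(n) check passes, each iteration of a loop like
    A(B(I)) = f writes a distinct element of A, so the iterations
    carry no output dependence and may run in parallel.
    """
    seen = [False] * n
    for x in b:
        if not (0 <= x < n) or seen[x]:
            return False  # out of range or duplicate index: not safe
        seen[x] = True
    return True

print(is_permutation([2, 0, 1], 3))  # distinct indices -> parallel-safe
print(is_permutation([1, 1, 2], 3))  # duplicate index -> possible conflict
```

The check itself is trivially parallelizable too, and its cost is
amortized if the same B is reused across many loops.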

The common current approach is to use some sort of "inspector/executor"
code at run time. The idea is to first examine the elements of B to see
which elements of A will be referenced or defined. Then, given that
knowledge, plan how to block up the communication into efficient chunks.
A similar approach can be used when blocking sparse matrix operations for
cache. I don't know that people are able to insert all this
inspector/executor code at compile time, but Joel Saltz has done a fair
amount of work on runtime libraries to support irregular computations of
this sort on distributed-memory machines. Perhaps someone else will know
the references. I believe the package is called PARTI.
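To make the two phases concrete, here is a rough sketch of the
inspector/executor split for A(B(I)) = f on a distributed-memory
machine. This is only an illustration of the idea; the function names,
the `owner` mapping, and `send_batch` are all hypothetical and bear no
relation to PARTI's actual interface:

```python
def inspector(b, owner, my_rank):
    """Inspector phase: scan the index vector B once and record, per
    remote processor, which elements of A this loop will write, so the
    communication can be batched into one message per processor."""
    sends = {}  # destination rank -> list of (iteration, global index)
    local = []  # iterations whose target element of A lives locally
    for i, gidx in enumerate(b):
        dest = owner(gidx)  # which processor owns A(gidx)
        if dest == my_rank:
            local.append(i)
        else:
            sends.setdefault(dest, []).append((i, gidx))
    return local, sends

def executor(a, b, f, local, sends, send_batch):
    """Executor phase: do local iterations directly, then ship each
    remote processor its updates in one pre-planned chunk."""
    for i in local:
        a[b[i]] = f(i)  # purely local work, no communication
    for dest, updates in sends.items():
        # one batched message per destination instead of one per element
        send_batch(dest, [(gidx, f(i)) for i, gidx in updates])
```

The payoff is that the inspector's cost is paid once, while the
communication schedule it produces can be reused every time the loop
executes with the same B.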

Preston Briggs
--
