Looking for Program Transformation Examples

tseng@rice.edu (Chau-Wen Tseng)
Fri, 24 Apr 1992 03:49:42 GMT

          From comp.compilers

Related articles
Looking for Program Transformation Examples tseng@rice.edu (1992-04-24)
| List of all articles for this month |

Newsgroups: comp.compilers
From: tseng@rice.edu (Chau-Wen Tseng)
Keywords: optimize, question
Organization: Rice University
Date: Fri, 24 Apr 1992 03:49:42 GMT

Hello netters,

I am looking for example program kernels that can benefit from
the following compiler optimizations.

(1) Unimodular Loop Transformations

        Unimodular loop transformations use matrix calculations to guide
        some combination of loop interchange, skewing, and reversal to
        parallelize perfectly nested loops. A simple example
        is Successive-Over-Relaxation (SOR),

            do j = 2,N-1
                do i = 2,N-1
                    A(i,j) = F(...A(i-1,j),A(i,j-1),A(i+1,j),A(i,j+1)...)

        where loop skew and interchange can parallelize the inner loop:

            do i = 4,2*(N-1)
                doall j = max(2,i-N-1),min(N-1,i-2)
                    A(i-j,j) = F(...A(i-j-1,j),A(i-j,j-1),A(i-j+1,j),A(i-j,j+1)...)

        Another example comes from tridiagonal solves, where loop
        interchange can be used to move a parallel loop outwards.

            do j
                doall i
                    A(i,j) = F(...A(i,j-1)...)

        However, I can't seem to find much else. Are there common
        computations that can utilize unimodular loop transformations?

(2) Loop Distribution

        Loop distribution creates multiple loops from a single loop. It is
        pretty essential for vectorizing compilers. The most common use
        for loop distribution is to separate a sequential reduction from a
        parallel loop. For instance,

            do j
                A(j) = F(...)
                t = t + A(j)

        can be distributed to yield the following loops:

            doall j
                A(j) = F(...)
            do j
                t = t + A(j)

        I'm having a difficult time discovering other cases where loop
        distribution can be used to increase parallelism.

I would appreciate pointers to computations that can benefit from
either of these techniques. I'll summarize responses, if I get any :-)


Chau-Wen Tseng
Grad Student
Rice University

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.