Re: Loop ROLLING (Brian Bliss)
Fri, 1 Nov 91 18:56:36 GMT

          From comp.compilers


Newsgroups: comp.compilers
From: (Brian Bliss)
Keywords: Fortran, optimize
Organization: UIUC Center for Supercomputing Research and Development
References: 91-10-103 91-11-003
Date: Fri, 1 Nov 91 18:56:36 GMT

In article 91-11-003, "John D. McCalpin" <> writes:
|> >> On 25 Oct 91 15:47:13 GMT, (Rick Gorton) said:
|> Rick> I'm looking for information on loop ROLLING, and no-one I've
|> Rick> talked to can recall seeing anything specific about this topic.
|> I seem to recall that the Pacific Sierra VAST family of Fortran
|> source-to-source preprocessors are able to re-roll loops to help the
|> vectorizers see that a "normal" vector construct is present. This memory
|> is from working on the ETA-10G, so I suspect that their current products
|> have even more functionality in this regard.
|> The preprocessor from Kuck and Associates that is used by the IBM XLF
|> version 2 compiler will re-roll and then unroll simple do loops. I have
|> not tried it on terribly complicated loops, though. Since HP also uses a
|> Kuck and Associates preprocessor, their compiler should be able to do this
|> as well.

I've used both the VAST optimizer and KAP (Kuck & Associates program) on
the Alliant FX/Series of computers, and done some comparisons between
them. KAP was consistently able to re-roll unrolled loops, and I can't
remember VAST ever doing so. In general, KAP was superior to VAST in
almost every way, except that the VAST optimizer emits Alliant-specific
intrinsic functions at the back end to calculate loop bounds & stride.
These intrinsics are faster than the vanilla Fortran expressions output
by KAP, so in cases of simple optimizations where KAP & VAST did the same
thing, the VAST code would run faster; in routines where KAP's superior
analysis/optimizations caused it to differ from VAST, the KAP-optimized
code was usually faster. The VAST optimizer itself also runs much faster
than KAP, which is not a minor point during code development.
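
For anyone who hasn't seen the transformation, here is a minimal sketch in
C (not Fortran, and not the output of either tool) of what "re-rolling"
means. The function names and the unroll factor of 4 are assumptions made
up for illustration. The first routine is the hand-unrolled form a
programmer might write for a scalar machine; the second is the unit-stride
loop a re-roller would try to recover so that a vectorizer recognizes it.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-unrolled by 4 (hypothetical example). To keep the sketch short it
   assumes n is a multiple of 4, which the assert checks. */
void saxpy_unrolled(size_t n, float a, const float *x, float *y)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
    }
}

/* Re-rolled form: one statement per iteration, stride 1. This is the
   "normal" vector construct that is easy to recognize. */
void saxpy_rolled(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Both routines compute the same result whenever n is a multiple of 4. The
re-roller's job is to notice that the four statements in the unrolled body
differ only by a constant offset in the subscript.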

IMHO, both optimizers have a LONG way to go. Doing something that should
be transparent to the optimizer, such as assigning a value to a temporary
variable and substituting it into the following code, can cause the output
to differ drastically. These optimizers work on the philosophy that the
original source code should be disturbed as little as possible while
adding the appropriate vector/concurrent constructs, which leads to a
quite ad hoc style of optimization: you iterate through a list of possible
optimizations (many of which can only be applied in very specific
situations) until either a timer expires or no more can be applied.
The order in which the different optimizations are tried greatly affects
the resultant code. If instead the source were broken down into triads,
some common scalar optimizations applied, and the concurrent
optimizations applied after that, the output would be much less sensitive
to (seemingly) insignificant changes in the input.
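
To make the triad argument concrete, here is a rough sketch in C. It
lowers two spellings of the same computation, `x = (a + b) * c` and
`t = a + b; x = t * c`, to a triad list and shows that they come out
identical. The scalar optimization used is a simple local value numbering,
chosen here purely for illustration; the data layout and every function
name are assumptions, and nothing below describes how KAP or VAST work
internally.

```c
#include <assert.h>

#define MAX_TRIADS 32

/* One triad: result = lhs op rhs. Operands are indices of earlier triads.
   op == 'v' marks a leaf (an input variable); its name is kept in lhs. */
struct triad { char op; int lhs; int rhs; };

struct table {
    struct triad t[MAX_TRIADS];
    int n;
    int bound[26];  /* bound[c - 'a']: triad held by variable c, or -1 */
};

void table_init(struct table *tb)
{
    tb->n = 0;
    for (int i = 0; i < 26; i++)
        tb->bound[i] = -1;
}

/* Return the index of an identical existing triad, or append a new one. */
int intern(struct table *tb, char op, int lhs, int rhs)
{
    for (int i = 0; i < tb->n; i++)
        if (tb->t[i].op == op && tb->t[i].lhs == lhs && tb->t[i].rhs == rhs)
            return i;
    assert(tb->n < MAX_TRIADS);
    tb->t[tb->n] = (struct triad){ op, lhs, rhs };
    return tb->n++;
}

/* Reading a variable yields the triad it is bound to, so a temporary
   vanishes at this point; an unassigned variable becomes a leaf. */
int use(struct table *tb, char name)
{
    int v = tb->bound[name - 'a'];
    return v >= 0 ? v : intern(tb, 'v', name, 0);
}

void assign(struct table *tb, char name, int value)
{
    tb->bound[name - 'a'] = value;
}

/* Lowers: x = (a + b) * c */
int lower_inline(struct table *tb)
{
    int a = use(tb, 'a');
    int b = use(tb, 'b');
    int sum = intern(tb, '+', a, b);
    int c = use(tb, 'c');
    assign(tb, 'x', intern(tb, '*', sum, c));
    return use(tb, 'x');
}

/* Lowers: t = a + b;  x = t * c */
int lower_with_temp(struct table *tb)
{
    int a = use(tb, 'a');
    int b = use(tb, 'b');
    assign(tb, 't', intern(tb, '+', a, b));
    int t = use(tb, 't');
    int c = use(tb, 'c');
    assign(tb, 'x', intern(tb, '*', t, c));
    return use(tb, 'x');
}

/* Field-by-field comparison of two triad lists; returns 1 if equal. */
int same_triads(const struct table *p, const struct table *q)
{
    if (p->n != q->n)
        return 0;
    for (int i = 0; i < p->n; i++)
        if (p->t[i].op != q->t[i].op || p->t[i].lhs != q->t[i].lhs ||
            p->t[i].rhs != q->t[i].rhs)
            return 0;
    return 1;
}
```

After running both lowerings on freshly initialized tables, `same_triads`
reports the lists as equal, so any later vector/concurrent pass sees the
same input no matter which way the source was written. A real front end
would also have to handle arrays, control flow, and aliasing, all of which
this sketch ignores.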

