|re: Compiler optimizations email@example.com (MCCALPIN(5.61+/IDA-1.2.8) id AA28847; Thu, 7 Jun 9) (1990-06-07)|
|From:||MCCALPIN <firstname.lastname@example.org> (5.61+/IDA-1.2.8) id AA28847; Thu, 7 Jun 90 08:46:40 -0400|
|Date:||Thu, 7 Jun 90 17:13:59 GMT|
In article <1990Jun7.email@example.com> James Larus writes:
>Let's make this argument more concrete. Consider the example of summing the
>elements of a vector:
> for i <- 1 to n do
> sum <- sum + A[i]
>....on a parallel computer it can [use] a parallel prefix
>algorithm that builds up a tree of additions. The problem is that this
>optimization requires addition to be associative. Integer and floating point
>addition isn't associative. However, how many programmers have carefully
>analyzed their algorithm and convinced themselves that summing the elements
>from 1 to N will produce the least error?
This code will already be modified by a number of current compilers.
I have seen the following "interpretations" of the above on existing
(1) Calculate the sum in 80-bit precision on the 8087 stack and truncate
on completion of the loop.
(2) Add the two numbers in 80-bit precision, then truncate, store, and
re-load the 'sum' variable at each iteration of the loop.
(3) Add the numbers in order in unnormalized 128-bit floating-point format.
(4) Unroll the loop to some depth (typically 4) and calculate: (MIPS?)
for i <- 1 to n by 4 do
sum <- sum + A[i] + A[i+1] + A[i+2] + A[i+3]
(5) Stripmine the loop by groups of 64 and add them separately: (Cray 1)
set temp[i]=0.0, i=1..64
for isect <- 1 to (n/64-1) do
for i <- 1 to 64 do
temp[i] <- temp[i] + A[i+(isect-1)*64]+A[i+(isect*64)]
for i <- 1 to 64 do
sum <- sum + temp[i]
So we might as well forget about the problems with non-associativity of FP
addition, since the compilers are going to do the calculations in a different
order than the one we chose anyway! Unless optimization/vectorization is
turned off, you cannot count on any two machines adding that simple loop in
the same order or with the same rounding/truncation -- even if they nominally
use the same floating point rules and precisions.
>Of course, some languages such as Fortran 90 permit a programmer to express
>his intent in a manner that gives the compiler information about when order
>matters. Unfortunately, such languages are rare and only include a few common
Fortran-90 actually lets you express *less* about the actual order of the
calculations by allowing you to specify that order does not change the
*mathematical* result. The array syntax includes the implicit assumption
that data dependencies do not exist, so the calculations can proceed in any
order. This really does nothing for the problem of associativity. It does
allow a much nicer interface to the data parallel programming paradigm.
>[I suppose there is merit to this suggestion, but the thought of an optimizer
>that second guesses the programmer and attempts to generate code for what the
>program would have said if the source language was different fills me with
>considerable unease. -John]
But that is exactly what vectorizers do, and the better ones do a pretty
good job. The bad ones, though.... (I guess I shouldn't mention any brand
names for fear of reprisals!)
On a more philosophical note, this direction is precisely the one being taken
by the proponents of higher-level programming languages/environments. In
order for computers to be really helpful extensions to the human intellect,
we will have to give them increasing autonomy in choosing exactly how to go
about solving a problem. I understand that their is an active research
program at Purdue on expert systems for partial differential equations that
is attempting to do get the computer to choose (or advise the user to choose)
the discretization strategy and algorithm to solve a specified partial
differential equation. That is a system that would probably fill John with
even more "considerable unease"!
Return to the
Search the comp.compilers archives again.