Thu, 7 Jun 90 17:13:59 GMT

Related articles |
---|

re: Compiler optimizations mccalpin@vax1.udel.edu (MCCALPIN(5.61+/IDA-1.2.8) id AA28847; Thu, 7 Jun 9) (1990-06-07) |

Newsgroups: | comp.compilers |

From: | MCCALPIN <mccalpin@vax1.udel.edu> (5.61+/IDA-1.2.8) id AA28847; Thu, 7 Jun 90 08:46:40 -0400 |

Date: | Thu, 7 Jun 90 17:13:59 GMT |

Organization: | Compilers Central |

Keywords: | code, optimize |

In article <1990Jun7.010349.2097@esegue.segue.boston.ma.us> James Larus writes:

*>Let's make this argument more concrete. Consider the example of summing the*

*>elements of a vector:*

*> for i <- 1 to n do*

*> sum <- sum + A[i]*

*>....on a parallel computer it can [use] a parallel prefix*

*>algorithm that builds up a tree of additions. The problem is that this*

*>optimization requires addition to be associative. Integer and floating point*

*>addition isn't associative. However, how many programmers have carefully*

*>analyzed their algorithm and convinced themselves that summing the elements*

*>from 1 to N will produce the least error? *

This code will already be modified by a number of current compilers.

I have seen the following "interpretations" of the above on existing

compilers:

(1) Calculate the sum in 80-bit precision on the 8087 stack and truncate

on completion of the loop.

(2) Add the two numbers in 80-bit precision, then truncate, store, and

re-load the 'sum' variable at each iteration of the loop.

(3) Add the numbers in order in unnormalized 128-bit floating-point format.

(Cyber 205)

(4) Unroll the loop to some depth (typically 4) and calculate: (MIPS?)

for i <- 1 to n by 4 do

sum <- sum + A[i] + A[i+1] + A[i+2] + A[i+3]

(5) Stripmine the loop by groups of 64 and add them separately: (Cray 1)

set temp[i]=0.0, i=1..64

for isect <- 1 to (n/64-1) do

for i <- 1 to 64 do

temp[i] <- temp[i] + A[i+(isect-1)*64]+A[i+(isect*64)]

end_for

end_for

for i <- 1 to 64 do

sum <- sum + temp[i]

end_for

So we might as well forget about the problems with non-associativity of FP

addition, since the compilers are going to do the calculations in a different

order than the one we chose anyway! Unless optimization/vectorization is

turned off, you cannot count on any two machines adding that simple loop in

the same order or with the same rounding/truncation -- even if they nominally

use the same floating point rules and precisions.

*>Of course, some languages such as Fortran 90 permit a programmer to express*

*>his intent in a manner that gives the compiler information about when order*

*>matters. Unfortunately, such languages are rare and only include a few common*

*>idioms.*

Fortran-90 actually lets you express *less* about the actual order of the

calculations by allowing you to specify that order does not change the

*mathematical* result. The array syntax includes the implicit assumption

that data dependencies do not exist, so the calculations can proceed in any

order. This really does nothing for the problem of associativity. It does

allow a much nicer interface to the data parallel programming paradigm.

*>[I suppose there is merit to this suggestion, but the thought of an optimizer*

*>that second guesses the programmer and attempts to generate code for what the*

*>program would have said if the source language was different fills me with*

*>considerable unease. -John]*

But that is exactly what vectorizers do, and the better ones do a pretty

good job. The bad ones, though.... (I guess I shouldn't mention any brand

names for fear of reprisals!)

On a more philosophical note, this direction is precisely the one being taken

by the proponents of higher-level programming languages/environments. In

order for computers to be really helpful extensions to the human intellect,

we will have to give them increasing autonomy in choosing exactly how to go

about solving a problem. I understand that their is an active research

program at Purdue on expert systems for partial differential equations that

is attempting to do get the computer to choose (or advise the user to choose)

the discretization strategy and algorithm to solve a specified partial

differential equation. That is a system that would probably fill John with

even more "considerable unease"!

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.