|Order of argument evaluation in C++, etc. email@example.com (1995-07-08)|
|Re: Order of argument evaluation in C++, etc. firstname.lastname@example.org (WIlliam B. Clodius) (1995-07-10)|
|Re: Order of argument evaluation in C++, etc. email@example.com (1995-07-12)|
|Re: Order of argument evaluation in C++, etc. firstname.lastname@example.org (1995-07-18)|
|Re: Order of argument evaluation in C++, etc. email@example.com (Stefan Monnier) (1995-07-20)|
|Re: Order of argument evaluation in C++, etc. firstname.lastname@example.org (1995-07-21)|
|Re: Order of argument evaluation in C++, etc. email@example.com (1995-07-21)|
|Re: Order of argument evaluation in C++, etc. firstname.lastname@example.org (1995-07-26)|
|[41 later articles]|
|From:||"WIlliam B. Clodius" <email@example.com>|
|Keywords:||C++, parallel, optimize|
|Date:||Mon, 10 Jul 1995 00:26:33 GMT|
firstname.lastname@example.org (Henry G. Baker) wrote:
> The following question is posed in C++, but is essentially independent
> of the language and is therefore a generic compiler question rather
> a C++-specific question.
Since you specify that this is independent of the language I have a few
comments based on Fortran and Ada. As a non-compiler writer I may be
stepping outside my area of expertise, especially give the poster of this
question, but I am going to try anyway. Most of my examples will be for
There has been a ruling by the Fortran ANSI (or perhaps ISO) committee
that it is valid for any FORTRAN 77 compiler (and also Fortran 90)
compiler to ignore any side effects in functions, but not subroutines, in
performing optimizations. This capability is rarely, if ever, fully
exercised in practice because of the existence of a few "FUNCTIONs",
e.g., random number generators, which do have side effects. But I
suspect that some do perform some subset of optimizations that neglect
certain types of side effects. They are certainly free to make that
assumption for intrinsic functions, as is I believe C, C++, and other
languages for standard library functions.
Ada 95 has the pragma PURE which can be used by the compiler to determine
that a user defined function is free of all side effects. The draft
Fortran 90 standard has two subprogram attributes, PURE and ELEMENTAL,
(ELEMENTAL being a powerfull special case of PURE), which ensures that
all FUNCTION's are free of side effects, and all SUBROUTINE's are free of
side effects except through their arguments. These attributes, and
pragmas, are verifiable by the compiler and have implications for your
examples, and perhaps for your more general concerns.
Also a sufficiently aggressive global optimizer may be able to determine
that two functions are independent of one another. This can be the case
for functions in the same file in FORTRAN, and I suspect C and C++, or
for inlined functions in C++.
In alomost any case for which independence can be determined by the
compiler,the following holds.
> Consider the following expression in C++: plus(sqr(f(a)),sqr(g(b))).
> C++ requires that the arguments to a function be completely evaluated
> (and all side-effects posted) prior to entering the function, but the
> implementation is free to evaluate the arguments in any order, and
> presumably side-effects can be used to determine this ordering for
> a particular implementation.
> Question: although it may not be required by the language standard,
> can one _practically_ depend upon a compiler finishing one argument
> completely before starting to evaluate another?
No, for the above special cases.
>For example, if the
> compiler evaluates f(a), can one be relatively certain that sqr(f(a))
> will then be evaluated before g(b) commences? In other words, would
> there ever be a reason for evaluating this expression in anything
> other than in some depth-first ordering?
Yes, for the above special cases. Also note for example Fortran doesn't
give any preference for left->right evaluation, and it seems the same is
true for C++, so that if b is already in cache or register there is an
optimization reason for evaluating g(b) before evaluating f(a) for most
single processor machines.
> The question is actually a whole range of questions, depending upon
> the datatypes involved.
> 1. The datatypes are all floating point 'double'. If f(a) and g(b)
> are inline and simple, I could conceive of reordering the expression
> in a more-or-less breadth-first ordering in order to optimize some
> parallel or pipelined FPU.
This could also happen for a simple processor if a and b are in register
> 2. The datatypes are all user-defined and very expensive (think of
> 100x100 matrices of doubles). Unless we are running on a machine with
> multiple processors, and the compiler is smart enough to figure out
> that sqr(), f(), and g() are all independent, I would be amazed if a
> compiler did anything other than depth-first order for this
But for PURE and ELEMENTAL subprograms on a multiple processor machine
those are valid optimizations. For the special case of doubles, sqr and
plus are pipelined operations on many processors and Fortran 90 is, for
instance, allowed to take advantage of this if say f() and g() yield
> 3. An additional wrinkle: the result type of sqr() is different from
> the argument types of plus(), so the compiler must insert a conversion.
> 3a. Think of sqr: int -> int, and plus: double x double -> double.
> In this case, a compiler might conceivably hold off on the conversion
> from int to double from the first argument until the second argument
> produces its (int) result. Then it might convert both ints to doubles
> and finish the plus function.
> 3b. Same question, but f(), g(), and sqr() operate on 100x100
> matrices of ints, and plus() operates on 100x100 matrices of doubles.
> The conversion is now a user-defined conversion. Would the compiler
> ever _not_ completely finish the evaluation of either conv(sqr(f(a)))
> or conv(sqr(g(b))) before even starting to embark on evaluating the
> other? If not, why not?
In Fortran 95 ELEMENTAL subrograms are vectorizable. You can define the
conversion (actually this is intrinsic in F90 for int arrays to double
arrays, but similar capabilities exist for user defined types in F95),
from a scalar to another scalar with no side effects, give the conversion
the ELEMENTAL attribute, and it is automatically defined for an array to
an array of the same shape as a one to one mapping, which is what I
suspect you mean by the above. Similarly the plus operation could be an
elemental operation. In such a case the processor could evaluate
plus(sqr(f(a(1,1)) + g(b(1,1)))) before evaluating plus(sqr(f(a(2,1)) +
g(b(2,1)))), etc. This is particularly likly on a mutiprocessor system,
but could occur on some pipelined processors.
> The reason for the question has to do with (among other things) the
> ordering of allocation/deallocation of the generated temporaries in
> the user-defined datatype situation. Would this ordering ever be
> other than LIFO? If not, why not, and under what conditions?
Certinly on multiprocessors they need not be LIFO, as a temporary
generated for one set of operations on one processor need not have the
same lifetime as a temporary generated for another set of operations on
another processor. On other machines I suspect it depends on what you
mean by allocation/deallocation. The original poster may not be
interested in the following as they are probably not of interest to
garbage collection/memory management.
(For naive readers, if any in this newsgroup, temporaries that appear
explicitly in the users code may be optimized away and handled in
register only, with no implemented allocation/deallocation. This may
occur in F90 for user defined temporary arrays, which in most other
languages would be considered a user-defined datatype situation. For
example, in non-ideal F90 code,
REAL :: A(:), B(:)
REAL ALLOCATABLE, DIMENSION(:) :: DIFF, SUM
REAL :: AVDIF(SIZE(A))
DIFF = A - B
SUM = A + B
AVDIFF = DIFF/SUM
END FUNCTION AVDIFF
The allocations and deallocations will be optimized away by some
processors, e.g., Thinking Machine's. Further, order of generation of
temporaries need not be the same as the order of appearance in the code,
depending on the degree of optimization allowed by the processor. The
processor could also use space allocated for one variable as space for
another variable once the space is no longer needed, again eliminating
a deallocation/allocation stage.)
Return to the
Search the comp.compilers archives again.