Side effects in expressions

"Tzvetan Mikov" <mikov@usa.net>
18 Aug 1999 03:40:34 -0400

          From comp.compilers

Related articles
Side effects in expressions mikov@usa.net (Tzvetan Mikov) (1999-08-18)
Re: Side effects in expressions johnmce@world.std.com (1999-08-18)
Re: Side effects in expressions johnmce@world.std.com (1999-08-21)
| List of all articles for this month |

From: "Tzvetan Mikov" <mikov@usa.net>
Newsgroups: comp.compilers
Date: 18 Aug 1999 03:40:34 -0400
Organization: Posted via Supernews, http://www.supernews.com
Keywords: optimize, comment

Hi,


I have been reading Robert Morgan's book "Building an Optimizing
Compiler". I like it a lot - in fact it is one of the best books on
the subject that I have met.


Among other things he describes a method for code generation in which
each expression is always calculated into the same temporary
value. The expressions are regarded as pure functions of their
arguments. This has some obvious advantages, but it is still not clear
are to me how expressions with side effects must be handled. I hope of
some of "comp.compilers" readers can clear the matter for me.


Here is a simple example of code generation, before we delve into the
actual problem. Compiling the following piece of C code (all variables
are locals):


        a = (b + c) * (b + c)


gives us:


        add T1 = b, c
        add T1 = b, c
        mul T2 = T1, T1
        mov a = T2


(I hope the convention for the generated code is clear). The following
temporaries are always used to calculate the expressions:


T1 = b + c
T2 = T1 * T1 = (b+c)*(b+c)
So, this expression is calculated correctly and we can even easily apply a
very simple common sub-expression elimination. What happens however if we
have sub-expressions with side effects? Examine this C code:
        int * p = ...;
        x = *(p = p + 1) + *(p = p + 1);


The temporary table has these entries:
        T1 = p + sizeof(int)
        T2 = value(*T1)
        T3 = T2 + T2
and the actual code is:
        add T1 = p, sizeof(int)
        mov p = T1
        load T2 = [T1]
        add T1 = p, sizeof(int)
        mov p = T1
        load T2 = [T1]
        add T3 = T2,T2
        mov x = T3
Of course, later it will be optimized to:
        add p = p, 2*sizeof(int)
        load T2=[p]
        shl x = T2,1


I realize that with regard to ANSI C dynamic semantics this is a correct
result, but it doesn't seem very logical. A similar problem appears when
calling functions:
        a = f() + f();
The temporary table is:
T1 = f()
T2 = T1 + T1
And the code will be:
        call T1 = f()
        call T1 = f()
        add T2 = T1, T1
        mov a = T2


This is definitely incorrect. It seems that the convention of using
the same temporaries for the same expressions must be honored only if
the expressions do not include any side effects (assignments or
function calls). This will definitely complicate the simple method of
code generation that is recommended in the book and it wasn't
mentioned at all. I assume that there is an error in my
interpretation, but I can't find it. Any help will be appreciated.


regards,
Tzvetan
[Welcome to the wonderful world of Fortran, which defines its
expressions such that it can combine any two subexpressions that are
lexically the same. Fortran functions are supposed to be true
functions, with the results depending only on the arguments. If you
have Fortran procedures with internal state, code them as subroutines.
In languages other than Fortran, you correctly note that it's wrong to
combine expressions that might have different side effects. -John]


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.