Related articles |
---|
how do a RISC compiler translate an array initialization ? Sid-Ahmed-Ali.TOUATI@inria.fr (Sid Ahmed Ali TOUATI) (1999-08-02) |
Re: how do a RISC compiler translate an array initialization ? zalman@netcom15.netcom.com (Zalman Stern) (1999-08-02) |
Re: how do a RISC compiler translate an array initialization ? toon@moene.indiv.nluug.nl (Toon Moene) (1999-08-04) |
Re: how do a RISC compiler translate an array initialization ? pmichaud@irisa.fr (1999-08-04) |
Re: how do a RISC compiler translate an array initialization ? Sid-Ahmed-Ali.TOUATI@inria.fr (Sid Ahmed Ali TOUATI) (1999-08-04) |
Re: how do a RISC compiler translate an array initialization ? Sid-Ahmed-Ali.TOUATI@inria.fr (Sid Ahmed Ali TOUATI) (1999-08-07) |
Re: measuring cached writes, was how do a RISC compiler ... gratz@ite.inf.tu-dresden.de (Achim Gratz) (1999-08-08) |
From: | Sid Ahmed Ali TOUATI <Sid-Ahmed-Ali.TOUATI@inria.fr> |
Newsgroups: | comp.compilers,comp.arch |
Date: | 2 Aug 1999 11:50:29 -0400 |
Organization: | INRIA |
Keywords: | architecture, performance, question |
Dear all,
I am experimenting with the hardware performance counters of the
UltraSPARC II (and the Pentium). These are special-purpose registers
that count specific events (cache misses, latencies, lost cycles...).
I am trying to understand the memory access behavior of a simple piece
of code such as:
      REAL X(200)
      DO I = 1, 200
         X(I) = 2.0      ! or any constant value
      END DO
I expected this to access every element of X, one per iteration, which
should produce about 50 cache misses on an UltraSPARC II. To my
surprise, the actual number of misses (counted with the dedicated
registers) was about 3 or 4, with different compilers (cc, gcc),
different optimization levels (none, -O1..O4), and different languages
(Fortran, C). So I analyzed the generated code, and there does seem to
be a memory access in each iteration. This does not explain why the
cache misses do not occur. How does the compiler translate such an
assignment of a constant to an array? I turned off all optimizations
and the result is the same. So:
1. Is there a mechanism to store a constant value into a large data
segment? The generated code seems to contain ordinary stores ("st"
instructions).
2. Does the compiler bypass the data cache when it generates an array
initialization? This is possible on some processors, such as the
UltraSPARC.
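The expected figure of 50 misses can be reproduced with a small piece of
arithmetic. The sketch below is only an illustration: the helper name
lines_touched is hypothetical, REAL is assumed to be 4 bytes, the array is
assumed to start on a line boundary, and the 16-byte allocation unit is an
assumption chosen because it gives the 50 quoted above, not a figure taken
from the processor documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of distinct cache lines covered by an
   array of n_elems elements of elem_bytes each, assuming the array
   starts exactly on a line boundary. */
size_t lines_touched(size_t n_elems, size_t elem_bytes, size_t line_bytes) {
    assert(line_bytes > 0);
    size_t total_bytes = n_elems * elem_bytes;
    /* Round up so that a partially used last line is still counted. */
    return (total_bytes + line_bytes - 1) / line_bytes;
}
```

Under these assumptions, lines_touched(200, 4, 16) gives 50, and a 32-byte
line would give 25. An array that does not start on a line boundary could
touch one line more.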
thank you
Sid Ahmed Ali Touati
Remark: when I replace the statement with x(i)=x(i)+1, a "relatively"
correct number of cache misses is reported (42 or 43). This suggests
that the compilation process is not the same for the first and second
versions of the code, even though I turned off all optimization
options.
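For readers who want to repeat the comparison in C, the two loop bodies
could be written as below. This is a minimal sketch, not the code that was
actually measured: the function names are hypothetical, and the volatile
qualifier is an assumption added here so that the compiler keeps one memory
access per iteration at any optimization level. Reading the performance
counters is platform-specific and is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Store-only loop: each iteration writes x[i] and never reads it. */
void fill_constant(volatile float *x, size_t n, float value) {
    assert(x != NULL);
    for (size_t i = 0; i < n; i++) {
        x[i] = value;
    }
}

/* Read-modify-write loop: each iteration loads x[i], then stores it. */
void increment_all(volatile float *x, size_t n) {
    assert(x != NULL);
    for (size_t i = 0; i < n; i++) {
        x[i] = x[i] + 1.0f;
    }
}
```

The only difference between the two functions is the load of x[i] in the
second one, which makes them a convenient pair for checking whether the
counter in use counts misses caused by loads, by stores, or by both.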
[Sounds like the cache lines are bigger than individual words, and these
chips may have a write-behind cache. -John]