Re: MMX/3Dnow!/SSE/SSE2 compilers
From: "jacob navia" <firstname.lastname@example.org>
Date: 27 May 2002 01:15:15 -0400
Organization: Wanadoo, l'internet avec France Telecom
References: 02-04-126 02-04-137 02-04-146 02-04-157 02-05-051 02-05-123
Posted-Date: 27 May 2002 01:14:52 EDT
> The auto-vectorization process is still a bit "enigmatic" for me. I
> haven't tried with gcc 3.1 yet. But with Intel's icc 6.0 (linux),
> I've tried to compile the following source code in many ways, but I've
> always got the same result on a Pentium 4 2.0 GHz 512K with 1 GB of RAM.
> Nevertheless, icc tells me that both loops were vectorized ?!?!
> icc test.c -o test   --> ./test : 0.56s
> with -xM (MMX)       -->        : 0.55s
> with -xW (SSE2)      -->        : 0.56s
> with -O2             -->        : 0.56s
> with -O2 -xW         -->        : 0.55s
> I've also tried with our software (a raytracer), which uses many
> floating-point operations; there is no gain there either.
I got the SAME results!!!
After MUCH work, the gains are not measurable. The SSE2 registers have too
many problems. For instance, there is no move immediate, so to put 2 in an
SSE2 register I have to:
Incrementing an SSE2 register takes at least 3 instructions:
The code bloat here is enormous: inc eax is 1 byte, the above code is 9
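
For readers who want to try this themselves, here is a minimal sketch in C with SSE2 intrinsics (not the assembly sequences discussed above) of the two operations: putting a constant in an XMM register and incrementing it. The helper names are hypothetical and chosen only for illustration. Because there is no immediate form, a compiler typically has to fetch each constant from memory or build it in a general-purpose register first; the exact instructions and byte counts depend on the compiler and flags, so inspect the generated assembly (for example with the -S option) rather than taking any figure for granted.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics; needs an x86 target with SSE2 */

/* Hypothetical helper: put 2.0 in the low lane of an XMM register.
   SSE2 has no move-immediate form, so the constant has to come from
   memory or from a general-purpose register. */
static __m128d load_two(void)
{
    return _mm_set_sd(2.0);
}

/* Hypothetical helper: add 1.0 to the low lane. The 1.0 has to be
   materialized the same way before ADDSD can use it. */
static __m128d increment_low(__m128d x)
{
    return _mm_add_sd(x, _mm_set_sd(1.0));
}

/* Hypothetical helper: add 1 to each of four packed 32-bit integers.
   The vector of ones is again a constant that must be loaded first. */
static __m128i increment_epi32(__m128i v)
{
    return _mm_add_epi32(v, _mm_set1_epi32(1));
}

/* Small self-check of the helpers above. */
static void self_check(void)
{
    assert(_mm_cvtsd_f64(increment_low(load_two())) == 3.0);
    assert(_mm_cvtsi128_si32(increment_epi32(_mm_set1_epi32(2))) == 3);
}
```

Compare the output for increment_low with a plain integer increment to see the difference in code size on your own compiler.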