Re: MMX/3Dnow!/SSE/SSE2 compilers

"jacob navia" <jacob@jacob.remcomp.fr>
27 May 2002 01:15:15 -0400

          From comp.compilers


From: "jacob navia" <jacob@jacob.remcomp.fr>
Newsgroups: comp.compilers
Date: 27 May 2002 01:15:15 -0400
Organization: Wanadoo, l'internet avec France Telecom
References: 02-04-126 02-04-137 02-04-146 02-04-157 02-05-051 02-05-123
Keywords: arithmetic, optimize
Posted-Date: 27 May 2002 01:14:52 EDT

> The auto-vectorization process is still a bit "enigmatic" for me. I
> haven't tried with gcc 3.1 yet. But with Intel's icc 6.0 (linux),
> I've tried to compile the following source code in many ways, but I've
> always got the same result on a Pentium 4 2.0 GHz 512K with 1 GB of RAM.
> Nevertheless, icc tells me that both loops were vectorized ?!?!
>
[snip]


> icc test.c -o test --> ./test : 0.56s
> with -xM (MMX) --> : 0.55s
> with -xW (SSE2) --> : 0.56s
> with -O2 --> : 0.56s
> with -O2 -xW --> : 0.55s
>
> I've also tried with our software (a raytracer), which uses many
> floating-point operations; there is no gain there either.


I got the SAME results!!!
After MUCH work, the gains are not measurable. The SSE2 registers have too
many problems. For instance, there is no move-immediate instruction, so to
put 2 in an SSE2 register I have to:
        movl $2,%eax
        movd %eax,%xmm0
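
For readers working in C rather than assembly, the same two-step load can be sketched with SSE2 intrinsics. This is only an illustration, assuming a compiler that ships <emmintrin.h> and an x86 target with SSE2 enabled; the helper names load_const_low and read_low are hypothetical and not from the post. _mm_cvtsi32_si128 is the intrinsic that corresponds to movd from a general register, though the compiler is free to pick other instructions.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86 target with SSE2 */

/* Hypothetical helper: place a 32-bit integer in the low lane of an
   XMM register. Like movd from a general register, this zeroes the
   upper 96 bits. */
static __m128i load_const_low(int value)
{
    return _mm_cvtsi32_si128(value);
}

/* Hypothetical helper: read the low 32-bit lane back into an int
   (the movd direction from XMM to a general register). */
static int read_low(__m128i v)
{
    return _mm_cvtsi128_si32(v);
}
```

Calling load_const_low(2) mirrors the movl/movd pair above: the constant still has to pass through an integer value first.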


Incrementing an SSE2 register takes at least 3 instructions:
        movd %xmm0,%eax
        inc %eax
        movd %eax,%xmm0


The code bloat here is enormous: inc %eax is 1 byte in 32-bit code, while
the sequence above is 9 bytes (4 + 1 + 4)!!!
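
The round trip described above can be written down in C with SSE2 intrinsics, next to one possible alternative that the post does not discuss: adding a vector constant with paddd (_mm_add_epi32). This is a sketch under assumptions, not a claim about what any compiler emits: it assumes <emmintrin.h> is available on an x86 target with SSE2, and the function names inc_via_gpr and inc_via_paddd are made up for illustration. Whether the second form is actually cheaper depends on where the constant 1 comes from, since it must itself be loaded into a register or read from memory.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86 target with SSE2 */

/* Hypothetical: increment the low 32-bit lane by moving it to an
   integer, adding 1, and moving it back. This mirrors the
   movd / inc / movd sequence; the upper lanes come back zeroed. */
static __m128i inc_via_gpr(__m128i v)
{
    int low = _mm_cvtsi128_si32(v);   /* XMM -> integer */
    low = low + 1;                    /* the "inc" step */
    return _mm_cvtsi32_si128(low);    /* integer -> XMM */
}

/* Hypothetical alternative: stay in the XMM registers and add a
   vector whose low lane is 1 (other lanes 0), using paddd. */
static __m128i inc_via_paddd(__m128i v)
{
    const __m128i one = _mm_cvtsi32_si128(1);
    return _mm_add_epi32(v, one);
}
```

Both functions give the same low lane for the same input; they differ in how upper lanes are treated (the first zeroes them, the second leaves them unchanged).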

