Re: MMX/3Dnow!/SSE/SSE2 compilers (Stephane Albin)
23 May 2002 01:49:38 -0400

          From comp.compilers

From: (Stephane Albin)
Newsgroups: comp.compilers
Date: 23 May 2002 01:49:38 -0400
References: 02-04-126 02-04-137 02-04-146 02-04-157 02-05-051
Keywords: arithmetic, optimize
Posted-Date: 23 May 2002 01:49:38 EDT

Allan Sandfeld Jensen wrote in message news:02-05-051...
> wrote:
> >> The upcoming GCC 3.1 (the extensions were developed for x86-64 by SuSE
> >> and AMD) and of course Intel's icc
> >
> > Is there any documentation on what GCC will be able to do in this
> > respect? I have heard many reports of automatic vectorization
> > support, and have used some (very old) patches which added automatic
> > use of MMX instructions and registers, but the performance increase
> > was minimal due to the poor code generated. Have things improved a
> > lot since then?
> Well, the biggest gain and the focus of the project was the option
> -mfpmath=sse. This replaces all x87 instructions with SSE/SSE2 ones. This
> gives a performance boost even without vectorization, because it makes
> compiler optimizations a lot easier (more RISC-like).
> I wouldn't trust gcc to vectorize anything though.

The auto-vectorization process is still a bit "enigmatic" to me. I
haven't tried gcc 3.1 yet, but with Intel's icc 6.0 (Linux) I compiled
the following source code in many different ways and always got the
same timing on a Pentium 4 2.0 GHz 512K with 1 GB of RAM.
Nevertheless, icc tells me that both loops were vectorized?!

I think that MMX/SSE code still has to be written by hand.

/* test.c */
#include <stdlib.h>
#include <stdio.h>
#include <time.h>

#define END 10000
#define NB (END*END)

#define ZERO_INT 0
#define ZERO_FLT 0.0F
#define ZERO_DBL 0.0
#define ZERO ZERO_FLT   /* must match type_val below */
typedef float type_val;

int main(void)
{
    clock_t t1, t2, diff;
    size_t i, j;
    type_val *array;

    array = malloc(NB * sizeof *array);
    if (array == NULL) {
        (void) fprintf(stderr, "No memory\n");
        return EXIT_FAILURE;
    }

    /* Initialisation avoiding effects of cache */
    for (i=0; i<END; i++)
        for (j=0; j<END; j++)
            array[i*END+j] = (type_val) (i+j);

    /* Timed loop: set every element to zero */
    t1 = clock();
    for (i=0; i<END; i++)
        for (j=0; j<END; j++)
            array[i*END+j] = ZERO;
    t2 = clock();

    /* clock() counts ticks; divide by CLOCKS_PER_SEC to get seconds */
    diff = (t2-t1);
    (void) fprintf(stdout, "Time : %g seconds\n", (double) diff / CLOCKS_PER_SEC);

    free(array);
    return 0;
}

Timings of ./test, built with "icc test.c -o test" plus the options shown:

    Options        Time
    (none)         0.56s
    -xM (MMX)      0.55s
    -xW (SSE2)     0.56s
    -O2            0.56s
    -O2 -xW        0.55s

I've also tried with our software (a raytracer), which uses many
floating-point operations; there is no gain there either.

                                Stéphane ALBIN
Laboratoire d'Images de Synthèse de St-Etienne
Centre SIMMO - Ecole des Mines de Saint-Etienne
Tel: (33) 4 77 42 01 78 - Fax: (33) 4 77 42 66 66