Re: MMX/3Dnow!/SSE/SSE2 compilers

salbin@emse.fr (Stephane Albin)
23 May 2002 01:49:38 -0400

          From comp.compilers

From: salbin@emse.fr (Stephane Albin)
Newsgroups: comp.compilers
Date: 23 May 2002 01:49:38 -0400
Organization: http://groups.google.com/
References: 02-04-126 02-04-137 02-04-146 02-04-157 02-05-051
Keywords: arithmetic, optimize
Posted-Date: 23 May 2002 01:49:38 EDT

Allan Sandfeld Jensen <snowwolf@diku.dk> wrote in message news:02-05-051...
> dave@icmfp.com wrote:
>
> >> The upcoming GCC 3.1 (the extensions were developed for x86-64 by SuSE
> >> and AMD) and of course Intel's icc
> >
> > Is there any documentation on what GCC will be able to do in this
> > respect? I have heard many reports of automatic vectorization
> > support, and have used some (very old) patches which added automatic
> > use of MMX instructions and registers, but the performance increase
> > was minimal due to the poor code generated. Have things improved a
> > lot since then?
>
> Well, the biggest gain, and the focus of the project, was the option
> -mfpmath=sse. This replaces all x87 instructions with SSE/SSE2 ones. This
> gives a performance boost even without vectorization, because it makes
> compiler optimizations a lot easier (more RISC-like).
>
> I wouldn't trust gcc to vectorize anything though.
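The distinction quoted above, between -mfpmath=sse and vectorization, can be made concrete with a small scalar kernel. The function below is a hypothetical example, not taken from the GCC documentation. Compiled with something like `gcc -O2 -msse2 -mfpmath=sse`, each float multiply and add in it would typically become a scalar SSE instruction (mulss, addss) working on one value in an XMM register, instead of an x87 instruction (fmul, fadd). A vectorizer would instead have to emit packed instructions (mulps, addps) that handle four floats at once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar kernel: dot product of two float arrays.
 * With -mfpmath=sse each iteration uses scalar SSE arithmetic on a single
 * element; nothing here processes four elements at a time. */
static float dot(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    size_t i;

    assert(n == 0 || (a != NULL && b != NULL));
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

For example, dot on {1, 2, 3, 4} and {1, 1, 1, 1} returns 10.0f.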




The auto-vectorization process is still a bit "enigmatic" to me. I
haven't tried gcc 3.1 yet, but with Intel's icc 6.0 (Linux) I've
compiled the following source code in many ways and always got the
same result on a Pentium 4 at 2.0 GHz (512 KB cache) with 1 GB of RAM.
Nevertheless, icc tells me that both loops were vectorized ?!?!


I think MMX/SSE (and similar) code still has to be written by hand.
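For comparison, this is roughly what writing the zeroing loop by hand looks like with SSE intrinsics. It is a minimal sketch assuming a compiler that provides <xmmintrin.h> and defines __SSE__ when SSE code generation is enabled (GCC does, e.g. with -msse or by default on x86-64; other compilers may need a different test). zero_floats is a hypothetical helper: it stores four floats per _mm_storeu_ps call, finishes the remaining 0 to 3 elements with a scalar loop, and falls back to plain scalar code when SSE is not available.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_setzero_ps, _mm_storeu_ps */
#endif

/* Hypothetical helper: set n floats to 0.0f, four at a time when SSE is available. */
static void zero_floats(float *dst, size_t n)
{
    size_t i = 0;

    assert(n == 0 || dst != NULL);
#if defined(__SSE__)
    {
        const __m128 zero = _mm_setzero_ps();   /* XMM register holding four 0.0f */
        for (; i + 4 <= n; i += 4)
            _mm_storeu_ps(dst + i, zero);       /* unaligned store of four floats */
    }
#endif
    for (; i < n; i++)                          /* scalar tail, or whole array without SSE */
        dst[i] = 0.0f;
}
```

If the buffer is known to be 16-byte aligned, _mm_store_ps can be used instead of the unaligned _mm_storeu_ps.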




/* test.c */
#include <stdlib.h>
#include <stdio.h>
#include <time.h>


#define END 10000
#define NB (END*END)


#define ZERO_INT 0
#define ZERO_FLT 0.0F
#define ZERO_DBL 0.0
typedef float type_val;
#define ZERO ZERO_FLT   /* zero constant matching type_val; change both together */


int main(void)
{
    clock_t t1, t2, diff;
    size_t i, j;
    type_val *array;


    array = malloc(NB * sizeof *array);
    if (array == NULL)
    {
        (void) fprintf(stderr, "No memory\n");
        exit(EXIT_FAILURE);
    }


    /* Initialization: touch every element first so that first-access effects are not timed */
    for (i=0; i<END; i++)
        for (j=0; j<END; j++)
            array[i*END+j] = i+j;


    t1 = clock();
    for (i=0; i<END; i++)
        for (j=0; j<END; j++)
            array[i*END+j] = ZERO;
    t2 = clock();


    diff = (t2-t1);
    (void) fprintf(stdout, "Time : %g seconds\n",
                   (double) diff / CLOCKS_PER_SEC);
    free(array);
    return 0;
}






Options (icc ... test.c -o test)   Time reported by ./test
(none)                             0.56 s
-xM  (MMX)                         0.55 s
-xW  (SSE2)                        0.56 s
-O2                                0.56 s
-O2 -xW                            0.55 s
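One way to put these timings in perspective is to convert them into an effective store bandwidth. The timed loop writes NB = 10^8 floats, i.e. 4 x 10^8 bytes (400 MB), so 0.56 s corresponds to roughly 714 MB/s. The helper below is a hypothetical sketch of that arithmetic. One possible explanation for the identical timings, not verified here, is that the loop is limited by memory traffic rather than by the instructions that issue the stores; in that case scalar and vectorized versions would run in about the same time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: effective store bandwidth in MB/s (1 MB = 10^6 bytes). */
static double store_bandwidth_mb_s(size_t n_elems, size_t elem_size, double seconds)
{
    assert(seconds > 0.0);
    return (double)n_elems * (double)elem_size / seconds / 1e6;
}
```

For example, store_bandwidth_mb_s(100000000, sizeof(float), 0.56) gives about 714 MB/s.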


I've also tried our own software (a raytracer), which performs many
floating-point operations, and there was no gain there either.


Stephane.
--
                                Stéphane ALBIN
Laboratoire d'Images de Synthèse de St-Etienne
Centre SIMMO - Ecole des Mines de Saint-Etienne
Tel: (33) 4 77 42 01 78 - Fax: (33) 4 77 42 66 66
e-mail: Stephane.Albin@emse.fr - www: http://www.emse.fr/~salbin/

