From: | salbin@emse.fr (Stephane Albin) |
Newsgroups: | comp.compilers |
Date: | 23 May 2002 01:49:38 -0400 |
Organization: | http://groups.google.com/ |
References: | 02-04-126 02-04-137 02-04-146 02-04-157 02-05-051 |
Keywords: | arithmetic, optimize |
Posted-Date: | 23 May 2002 01:49:38 EDT |
Allan Sandfeld Jensen <snowwolf@diku.dk> wrote in message news:02-05-051...
> dave@icmfp.com wrote:
>
> >> The upcoming GCC 3.1 (the extensions were developed for x86-64 by SuSE
> >> and AMD) and of course Intel's icc
> >
> > Is there any documentation on what GCC will be able to do in this
> > respect? I have heard many reports of automatic vectorization
> > support, and have used some (very old) patches which added automatic
> > use of MMX instructions and registers, but the performance increase
> > was minimal due to the poor code generated. Have things improved a
> > lot since then?
>
> Well, the biggest gain and the focus of the project was the option
> -mfpmath=sse. This replaces all x87 instructions with SSE/SSE2 ones. This
> gives a performance boost even without vectorization, because it makes
> compiler optimizations a lot easier (more RISC like).
>
> I wouldn't trust gcc to vectorize anything though.
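To make the quoted point concrete, here is a minimal sketch of plain scalar code that the option would affect. The function name dot is hypothetical and not from this thread. In GCC the option is spelled -mfpmath=sse and needs SSE enabled as well (for example -msse or -msse2). Compiled with something like "gcc -O2 -msse -mfpmath=sse -S", I would expect the loop body to use scalar SSE instructions (mulss/addss) instead of x87 ones, with no vectorization involved; I have not checked the generated assembly for any particular GCC version.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example (not from the thread): ordinary scalar
   floating-point code, one multiply and one add per iteration. */
float dot(const float *a, const float *b, size_t n)
{
    float sum = 0.0F;
    size_t i;

    assert(n == 0 || (a != NULL && b != NULL));
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```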
The auto-vectorization process is still a bit "enigmatic" to me. I
haven't tried gcc 3.1 yet, but with Intel's icc 6.0 (Linux) I compiled
the following source code with several option combinations and always
got the same timing on a Pentium 4 2.0 GHz (512K) with 1 GB of RAM.
Nevertheless, icc tells me that both loops were vectorized ?!?!
I think that MMX/SSE... code must still be developed by hand.
/* test.c */
#include <stdlib.h>
#include <stdio.h>
#include <time.h>

#define END 10000
#define NB (END*END)

#define ZERO_INT 0
#define ZERO_FLT 0.0F
#define ZERO_DBL 0.0

/* ZERO must match type_val: ZERO_FLT for float */
#define ZERO ZERO_FLT
typedef float type_val;

int main(void)
{
    clock_t t1, t2, diff;
    size_t i, j;
    type_val *array;

    array = malloc(NB * sizeof *array);
    if (array == NULL)
    {
        (void) fprintf(stderr, "No memory\n");
        exit(EXIT_FAILURE);
    }

    /* Initialisation, to avoid cache effects in the timed loop */
    for (i=0; i<END; i++)
        for (j=0; j<END; j++)
            array[i*END+j] = i+j;

    /* Timed loop: set every element to zero */
    t1 = clock();
    for (i=0; i<END; i++)
        for (j=0; j<END; j++)
            array[i*END+j] = ZERO;
    t2 = clock();

    diff = (t2-t1);
    (void) fprintf(stdout, "Time : %g seconds\n",
                   (double) diff/CLOCKS_PER_SEC);

    free(array);
    return 0;
}
icc test.c -o test  -->  ./test : 0.56s
  with -xM (MMX)    -->         : 0.55s
  with -xW (SSE2)   -->         : 0.56s
  with -O2          -->         : 0.56s
  with -O2 -xW      -->         : 0.55s
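For comparison, here is a rough sketch of what "by hand" could mean for the zero-fill loop, using the SSE intrinsics from <xmmintrin.h>. The helper name zero_fill is hypothetical, and I have not timed this version, so it says nothing about whether it would beat the numbers above. It falls back to a scalar loop when the compiler does not define __SSE__.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_setzero_ps, _mm_storeu_ps */
#endif

/* Hypothetical helper (not in test.c): set n floats to zero,
   four at a time with SSE when available, one at a time otherwise. */
static void zero_fill(float *a, size_t n)
{
    size_t i = 0;

    assert(n == 0 || a != NULL);
#if defined(__SSE__)
    {
        const __m128 z = _mm_setzero_ps();
        /* Unaligned stores, so no assumption about malloc alignment */
        for (; i + 4 <= n; i += 4)
            _mm_storeu_ps(a + i, z);
    }
#endif
    /* Remaining tail elements (or the whole array without SSE) */
    for (; i < n; i++)
        a[i] = 0.0F;
}
```

In test.c the timed double loop would then become a single call, zero_fill(array, NB).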
I've also tried with our software (a raytracer), which uses many
floating-point operations; there is no gain there either.
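One thing that might be worth checking, and this is only an assumption I have not tested: the timed loop in test.c does nothing but store to a very large array, so it may be limited by memory bandwidth rather than by the instructions used. A sketch of a variant that does some arithmetic on arrays small enough to stay in cache is below. The names saxpy_repeat, time_saxpy and SMALL_N are hypothetical, as is the chosen size.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define SMALL_N 4096   /* hypothetical size: 16 KB of floats per array */

/* y[i] += a * x[i], repeated reps times over the same small arrays */
static void saxpy_repeat(float *y, const float *x, float a, size_t n, int reps)
{
    int r;
    size_t i;

    assert(n == 0 || (x != NULL && y != NULL));
    for (r = 0; r < reps; r++)
        for (i = 0; i < n; i++)
            y[i] += a * x[i];
}

/* CPU seconds spent in saxpy_repeat, measured with clock() as in test.c */
static double time_saxpy(float *y, const float *x, float a, size_t n, int reps)
{
    clock_t t1 = clock();

    saxpy_repeat(y, x, a, n, reps);
    return (double) (clock() - t1) / CLOCKS_PER_SEC;
}
```

Timing time_saxpy on two SMALL_N-element arrays with a large reps count, with and without -xW, would show whether the vectorized code makes a difference once memory traffic is out of the picture.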
Stephane.
--
Stéphane ALBIN
Laboratoire d'Images de Synthèse de St-Etienne
Centre SIMMO - Ecole des Mines de Saint-Etienne
Tel: (33) 4 77 42 01 78 - Fax: (33) 4 77 42 66 66
e-mail: Stephane.Albin@emse.fr - www: http://www.emse.fr/~salbin/