Re: MMX/3Dnow!/SSE/SSE2 compilers (John Dallman)
29 Apr 2002 01:53:53 -0400

          From comp.compilers


From: (John Dallman)
Newsgroups: comp.compilers
Date: 29 Apr 2002 01:53:53 -0400
Organization: By appointment only
References: 02-04-126
Keywords: code
Posted-Date: 29 Apr 2002 01:53:53 EDT

(Curtis and Disa) wrote:

> What compilers support any of the MMX/3Dnow!/SSE/SSE2 instruction sets
> (and optimize code for them)? Do you know of any published
> comparisons of such compilers?

Have an off-the-cuff review:

I have built (and my employers are shipping) a commercial product with
Intel's C/C++ compiler, version 5.0.1, targeting SSE2
unconditionally. The compiler offers a
test-at-run-time-and-select-alternative-code-paths option, but I
didn't want to make the binary larger, or lose any speed at all.

For a C library that does a lot of floating-point work, but no large
matrix crunches, I get about 30% better throughput than the generic
x86 build, which is compiled with MS VC++v6, but the Intel-compiled
DLL is about 50% bigger. This performance figure is definitely an
average - nothing gets significantly slower for me, but some
operations have up to 60% better throughput.

Most of the gain in this case seems to come from using SSE2 registers
and operations for floating point, evading the limits to speculative
execution with the x87 register stack, and also avoiding the problem
that the Pentium 4 is slower than one would expect for some x87
operations. For this set of code, MMX, SSE and 3DNow! were never going
to be useful, since they don't handle doubles, and I never tried them.

Quite a bit of the library's work is with 3-D points, consisting of

typedef struct { double x, y, z; } vector;

Intel's compiler currently doesn't try to turn:

vector a, b; /* initialise them */
vector c;
c.x = a.x + b.x;
c.y = a.y + b.y;
c.z = a.z + b.z;

into two paired loads, a paired add and a paired store for x and y,
followed by the same sequence unpaired for z. I believe that it only
tries to use paired double-precision operations (or quadruple
single-precision operations) if it's unrolling a loop.
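
For illustration, here is roughly what that sequence looks like when
written by hand with the SSE2 intrinsics from <emmintrin.h>. This is
only a sketch, not output from Intel's compiler: the function name
vector_add and the macro VEC_USE_SSE2 are made up for the example, and
it assumes x and y sit next to each other in memory with no padding.
Unaligned loads and stores are used because the struct is only
guaranteed 8-byte alignment.

```c
#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>          /* SSE2 intrinsics */
#define VEC_USE_SSE2 1          /* hypothetical macro, for this sketch only */
#endif

typedef struct { double x, y, z; } vector;

/* c = a + b. Hypothetical helper, not part of any library. */
void vector_add(vector *c, const vector *a, const vector *b)
{
#ifdef VEC_USE_SSE2
    /* x and y: two paired loads, one paired add, one paired store. */
    __m128d axy = _mm_loadu_pd(&a->x);
    __m128d bxy = _mm_loadu_pd(&b->x);
    _mm_storeu_pd(&c->x, _mm_add_pd(axy, bxy));

    /* z: the same sequence, unpaired (scalar double in the low lane). */
    __m128d az = _mm_load_sd(&a->z);
    __m128d bz = _mm_load_sd(&b->z);
    _mm_store_sd(&c->z, _mm_add_sd(az, bz));
#else
    /* Plain C fallback when not compiling for SSE2. */
    c->x = a->x + b->x;
    c->y = a->y + b->y;
    c->z = a->z + b->z;
#endif
}
```

Both loads happen before the store, so calling it with c pointing at
the same struct as a or b is safe.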

Intel supply a math library with the 5.0.1 compiler which is
call-compatible with the MS VC++v6 library, but tests the CPU when
it's first called and can do some operations faster in SSE2 than using
the x87 built-in instructions. Since the library I've been working on
doesn't call transcendentals very often, I didn't get much performance
gain out of it, and avoided using it so as not to have to ship the
third-party math library DLL.
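
The "test the CPU on first call" idea can be sketched with a function
pointer that starts out pointing at a resolver. I don't know how
Intel's library actually does it; this is just one common way. All the
names here (dot3, dot3_generic, dot3_sse2, cpu_has_sse2) are invented
for the example, the two implementations are identical placeholders,
and the CPU test assumes GCC or Clang on x86, falling back to the
generic path anywhere else.

```c
typedef double (*dot3_fn)(const double *, const double *);

/* Generic x86 path. */
static double dot3_generic(const double *a, const double *b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

/* Placeholder: a real build would compile this one for SSE2. */
static double dot3_sse2(const double *a, const double *b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

/* Assumption: GCC/Clang builtins on x86; other targets report "no". */
static int cpu_has_sse2(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    return 0;
#endif
}

static double dot3_first_call(const double *a, const double *b);

/* Starts at the resolver; rebound on the first call. */
static dot3_fn dot3_impl = dot3_first_call;

/* Runs once: test the CPU, rebind the pointer, then do the work.
   Not thread-safe as written. */
static double dot3_first_call(const double *a, const double *b)
{
    dot3_impl = cpu_has_sse2() ? dot3_sse2 : dot3_generic;
    return dot3_impl(a, b);
}

/* Public entry point: one indirect call after the first use. */
double dot3(const double *a, const double *b)
{
    return dot3_impl(a, b);
}
```

The cost after the first call is one indirect call per use, which is
the sort of overhead, along with carrying both code paths, that
run-time selection brings with it.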

The 6.0 compiler comes with a math library with SSE2 math functions
directly callable by the compiler, which it insists on using if you're
compiling for SSE2. I haven't adopted the 6.0 compiler yet.

John Dallman
                    "C++ - the FORTRAN of the early 21st century."
