Re: MMX/3Dnow!/SSE/SSE2 compilers (John Dallman)
23 May 2002 01:46:10 -0400

          From comp.compilers


From: (John Dallman)
Newsgroups: comp.compilers
Date: 23 May 2002 01:46:10 -0400
Organization: Nextra UK
References: 02-05-004
Keywords: arithmetic, architecture
Posted-Date: 23 May 2002 01:46:10 EDT

jacob navia wrote:

> > matrix crunches, I get about 30% better throughput than the generic
> > x86 build, which is compiled with MS VC++v6, but the Intel-compiled
> > DLL is about 50% bigger. This performance figure is definitely an
> > average - nothing gets significantly slower for me, but some
> > operations have up to 60% better throughput.
> The problem I see with this is that the results from SSE2 are
> different from the results the FPU obtains. Maybe you get a
> performance increase but tell me:
> 1) How did you solve the incompatibility between FPU and SSE2?

If you run the x87 registers in 64-bit mode (i.e., precision control
set to a 53-bit mantissa, IEEE double, rather than 80-bit extended
precision), the results do seem to be compatible with those from the
SSE2 registers. Since the code I work on is intended to produce highly
compatible results across many architectures, we were using 64-bit
mode anyway.

> 2) How do you maintain data in SSE2 registers across calls?
> Do you save all the SSE2 registers?

I let the compiler do it, and it manages fine. It doesn't appear to
use SSE2 registers for argument passing, although I haven't looked
hard for that. Given that the Intel compiler claims to produce object
files that can be mixed freely with MSVC6 object files, I don't see
how it could do anything other than treat the SSE2 registers as
scratch registers that aren't preserved across function calls. This
fits with the (apparent) practice of the MS compiler of making sure
that the x87 register stack is empty at the end of each basic block.

> I implemented all floating point in SSE2 in my compiler system
> (lcc-win32) but failed at the above points. I could not
> guarantee that the results of a+b would be the same and that
> would have led to incredible problems with floating point code
> since I set up the FPU to use full precision.

Well, yes. You'll hit the same problems with the Intel compiler
if you use the x87 registers at their maximum precision.
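To make the a+b point concrete, here is a minimal C sketch of the classic case (function names are hypothetical). The exact value 1e16 + 1 needs a 54-bit mantissa, so rounding the sum to IEEE double, as SSE2 scalar code does after every operation, loses the 1, while a wider intermediate keeps it. `long double` again stands in for an x87 register left at full precision, which assumes it is wider than `double` on the target.

```c
#include <float.h>

/* Hypothetical helper: (a + b) - a with the sum rounded to IEEE double
   (53-bit mantissa), as SSE2 scalar code rounds after every operation. */
static double add_sub_double(double a, double b) {
    volatile double sum = a + b; /* volatile: force the store as a double */
    return sum - a;
}

/* Hypothetical helper: the same expression with a long double sum, standing
   in for an x87 register left at full (64-bit mantissa) precision. */
static long double add_sub_wide(long double a, long double b) {
    volatile long double sum = a + b;
    return sum - a;
}
```

With a = 1e16 and b = 1.0, `add_sub_double` returns 0 (the exact sum lies halfway between the doubles 1e16 and 1e16 + 2 and rounds to the even one, 1e16), while `add_sub_wide` returns 1 wherever `long double` has more than 53 mantissa bits. The `volatile` temporaries make the rounding to the declared type explicit, which matters on x87 compilers that would otherwise keep the sum in an 80-bit register.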
John Dallman
                    "C++ - the FORTRAN of the early 21st century."
