|From:||firstname.lastname@example.org (John Dallman)|
|Date:||23 May 2002 01:46:10 -0400|
|Posted-Date:||23 May 2002 01:46:10 EDT|
email@example.com (jacob navia) wrote:
> > matrix crunches, I get about 30% better throughput than the generic
> > x86 build, which is compiled with MS VC++v6, but the Intel-compiled
> > DLL is about 50% bigger. This performance figure is definitely an
> > average - nothing gets significantly slower for me, but some
> > operations have up to 60% better throughput.
> The problem I see with this is that the results from SSE2 are
> different from the results the FPU obtains. Maybe you get a
> performance increase but tell me:
> 1) How did you solve the incompatibility between FPU and SSE2?
If you run the x87 registers in 64-bit mode (i.e., with the precision
control set to a 53-bit mantissa, which is IEEE double, as opposed to
80-bit extended precision), the results do seem to be compatible with
those from the SSE2 registers. Since the code I work on is intended to
produce highly compatible results across many architectures, we used
64-bit mode anyway.
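
For anyone who wants to try the 64-bit setting, here is a minimal sketch of
switching the x87 precision control to a 53-bit mantissa. It is not the code
from the product discussed above. The helper names x87_set_double_precision
and x87_precision_bits are made up for illustration. The MSVC branch assumes
the 32-bit runtime's _controlfp; the GCC/Clang branch assumes x86 inline
assembly and touches the control word directly. On any other target the
helpers do nothing and report -1.

```c
#if defined(_MSC_VER) && defined(_M_IX86)
#include <float.h>

/* 32-bit MSVC runtime: _controlfp changes only the bits selected by the mask. */
static void x87_set_double_precision(void) {
    _controlfp(_PC_53, _MCW_PC);
}

static int x87_precision_bits(void) {
    unsigned int pc = _controlfp(0, 0) & _MCW_PC;
    return pc == _PC_24 ? 24 : pc == _PC_53 ? 53 : 64;
}

#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))

/* Precision control is bits 8-9 of the x87 control word:
   00 = 24-bit, 10 = 53-bit, 11 = 64-bit mantissa. */
#define X87_PC_MASK   0x0300u
#define X87_PC_DOUBLE 0x0200u

static unsigned short x87_get_cw(void) {
    unsigned short cw;
    __asm__ volatile ("fnstcw %0" : "=m"(cw));
    return cw;
}

static void x87_set_double_precision(void) {
    unsigned short cw = x87_get_cw();
    cw = (unsigned short)((cw & ~X87_PC_MASK) | X87_PC_DOUBLE);
    __asm__ volatile ("fldcw %0" : : "m"(cw));
}

static int x87_precision_bits(void) {
    switch (x87_get_cw() & X87_PC_MASK) {
    case 0x0000u: return 24;
    case 0x0200u: return 53;
    default:      return 64;
    }
}

#else

/* No x87 precision control available here: nothing to set. */
static void x87_set_double_precision(void) { }
static int x87_precision_bits(void) { return -1; }

#endif
```

Calling x87_set_double_precision() once at start-up, before any floating
point work, is the usual pattern. The setting only narrows the mantissa; the
x87 exponent range stays extended, so results near overflow or underflow can
still differ from SSE2.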
> 2) How do you maintain data in SSE2 registers across calls?
> Do you save all the SSE2 registers?
I let the compiler do it, and it manages fine. It doesn't appear to
use SSE2 registers for argument passing, although I haven't looked
hard for that. Given that the Intel compiler claims to produce object
files that can be mixed freely with MSVC6 object files, I don't see
how it can do other than treat SSE2 registers as scratch registers
that aren't maintained across function calls. This sits well with the
(apparent) practice of the MS compiler of making sure that the x87
register stack is empty when it's completed each basic block.
> I implemented all floating point in SSE2 in my compiler system
> (lcc-win32) but failed at the above points. I could not
> guarantee that the results of a+b would be the same and that
> would have led to incredible problems with floating point code
> since I set up the FPU to use all precision.
Well, yes. You'll hit the same problems with the Intel compiler
if you use the x87 registers at their maximum precision.
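
To make the a+b problem concrete, here is a small sketch of one way the two
paths can disagree. It assumes that long double is the 80-bit x87 format (true
for GCC on x86, not for MSVC, where long double is the same as double), that
the x87 is left at full precision, and that plain double arithmetic is rounded
straight to double, as with SSE2. The function names are made up for
illustration.

```c
#include <float.h>

/* Sum as the compiler evaluates plain double arithmetic. With SSE2
   (FLT_EVAL_METHOD == 0) the exact sum is rounded once, to 53 bits. */
static double add_as_double(double a, double b) {
    volatile double s = a + b;
    return s;
}

/* Sum formed in long double and only then narrowed to double. With the
   80-bit format the exact sum is rounded twice: first to 64 bits, then
   to 53 bits. */
static double add_via_long_double(double a, double b) {
    volatile long double wide = (long double)a + (long double)b;
    return (double)wide;
}
```

With a = 1.0 and b = 0x1p-53 + 0x1p-64, the exact sum lies just above the
midpoint between 1.0 and 1.0 + DBL_EPSILON. Under the assumptions above,
add_as_double should round it up to 1.0 + DBL_EPSILON, while
add_via_long_double should first round to 1.0 + 0x1p-53 and then, on the
tie, down to 1.0. With the x87 precision control at 53 bits the second
rounding step goes away.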
John Dallman firstname.lastname@example.org
"C++ - the FORTRAN of the early 21st century."