Re: MMX/3Dnow!/SSE/SSE2 compilers

Andrew Richards <>
24 Apr 2002 22:25:43 -0400

          From comp.compilers


From: Andrew Richards <>
Newsgroups: comp.compilers
Date: 24 Apr 2002 22:25:43 -0400
Organization: Compilers Central
References: 02-04-126 02-04-137
Keywords: architecture, performance
Posted-Date: 24 Apr 2002 22:25:43 EDT

Andrew Richards wrote:
> Hi,
> Curtis and Disa wrote:
>>What compilers support any of the MMX/3Dnow!/SSE/SSE2 instruction sets
>>(and optimize code for them)? Do you know of any published
>>comparisons of such compilers?
> Ours do! :-) We have some comparisons, but they are mainly based on
> image and sound processing. We sell to game developers, mostly.
> [I was under the impression that people usually use these vectorish
> instructions via libraries more than via in-line code. -John]

If you're a game developer, or you write graphics or sound processing
software, then most of your code will use vectors of some kind.
Also, programmers doing this kind of work usually want to do something
new and different (because we are in the business of entertainment),
so you tend to get lots of developers writing vector-processing code
that needs very high levels of performance.

If your library routine is just "rotate point", and the processor
architecture can do this operation in about 8 cycles, then the
overhead of calling the routine is larger than the cost of the
operation itself. If you want to process 20 million points per second
(not a particularly high figure these days) and your processor runs at
a few hundred megahertz, then you have only on the order of 10-50
cycles per point (ignoring any other processing you may want to do).
That's very low when you consider that points need to be (in the
simplest case) rotated, projected and lit. A function that passes and
returns vectors needs at least 10 cycles just for parameter passing
and returning.
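
To make the arithmetic above concrete, here is a minimal sketch in C.
The function names are hypothetical and only illustrate the budget
calculation; the 10-cycle call overhead is the rough figure quoted
above, not a measurement.

```c
/* Cycles available per point: clock rate divided by point rate.
   clock_hz is in Hz, points_per_sec in points per second. */
static double cycles_per_point(double clock_hz, double points_per_sec)
{
    return clock_hz / points_per_sec;
}

/* Fraction of the per-point budget consumed by call overhead alone
   (parameter passing and returning), before any useful work is done. */
static double call_overhead_fraction(double budget_cycles,
                                     double overhead_cycles)
{
    return overhead_cycles / budget_cycles;
}
```

For example, cycles_per_point(200e6, 20e6) gives 10 cycles and
cycles_per_point(1e9, 20e6) gives 50, which is the 10-50 range
mentioned above (taking 200 MHz and 1 GHz as assumed end points). With
a 10-cycle budget, a call that costs 10 cycles uses all of it.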

You also want a high level of maintainability in entertainment code.
People will say "that doesn't look quite right, what if you changed
the lighting calculation there?". That is very hard to do if you have
lots of assembly code, especially if you are supporting multiple
platforms.
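
As an illustration of the kind of source this argues for, here is a
small sketch in plain C. It is not taken from any particular compiler
or product: the vec3 type and the function names are hypothetical, the
rotation is restricted to the z axis for brevity, and whether a given
compiler maps the loop to MMX/3DNow!/SSE instructions is not
guaranteed. The point is only that the rotation and the lighting are
small inline functions, so there is no call overhead per point and the
lighting can be changed in one place.

```c
/* Hypothetical 3-component point/vector type. */
typedef struct { float x, y, z; } vec3;

/* Rotate p about the z axis; c and s are the precomputed cosine and
   sine of the angle, supplied by the caller. */
static inline vec3 rotate_z(vec3 p, float c, float s)
{
    vec3 r = { c * p.x - s * p.y, s * p.x + c * p.y, p.z };
    return r;
}

/* Simple diffuse term: max(0, n . l). Changing the lighting model
   means editing only this function. */
static inline float diffuse(vec3 n, vec3 l)
{
    float d = n.x * l.x + n.y * l.y + n.z * l.z;
    return d > 0.0f ? d : 0.0f;
}

/* Rotate and light a batch of points. Because the helpers are inline,
   the per-point work is a straight-line loop body with no calls. */
static void transform_and_light(const vec3 *pts, const vec3 *normals,
                                vec3 *out, float *lit, int n,
                                float c, float s, vec3 light)
{
    for (int i = 0; i < n; i++) {
        out[i] = rotate_z(pts[i], c, s);
        lit[i] = diffuse(rotate_z(normals[i], c, s), light);
    }
}
```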

Andrew Richards

Tel: +44 (0)20 7482 3382
140-142 Kentish Town Rd, London, NW1 9QB
