From: | bcombee@metrowerks.com (Ben Combee) |
Newsgroups: | comp.compilers |
Date: | 9 Jan 2000 22:47:47 -0500 |
Organization: | Metrowerks |
References: | 00-01-011 |
Keywords: | code |
Ramkishor wrote:
> Are there any compilers, which can use MMX instructions(Any SIMD
> instructions like 3DNow from AMD or VIS from SUN etc.) in the code
> generated by them?
The CodeWarrior x86 compiler has used MMX and 3DNow! instructions for
doing vector computations since the Pro 3 release. It cannot always
detect when they are legal and useful, but for simple loops where the
compiler can determine there is no aliasing between array references,
it will use the wider SIMD instructions.
For example, the code
    short int a[50], b[50], c[50];

    void foo(void)
    {
        int i;
        for (i = 0; i < 50; i++) a[i] = b[i] + c[i];
    }
would be vectorized using the MMX PADDW instruction to add four 16-bit
integers at a time for 12 iterations, followed by standard scalar code
to handle the two leftover additions.
Here is a disassembly of this function compiled with our 2.3.2 release,
optimization level 4, targeting MMX/Pentium II.
name = _foo
offset = 0x00000000; type = 0x0020; class = 0x0002
00000000: 31 D2                 xor   edx,edx
00000002: 89 D0                 mov   eax,edx
00000004: D1 E0                 sal   eax,1h
00000006: 0F 6F 80 00 00 00 00  movq  mm0,qword ptr [eax+_b]
0000000D: 83 C2 04              add   edx,4
00000010: 0F FD 80 00 00 00 00  paddw mm0,qword ptr [eax+_c]
00000017: 0F 7F 80 00 00 00 00  movq  qword ptr [eax+_a],mm0
0000001E: 05 08 00 00 00        add   eax,8
00000023: 83 FA 2F              cmp   edx,47
00000026: 7C DE                 jl    $-32 ; --> 0x0006
00000028: 66 8B 0D 60 00 00 00  mov   cx,word ptr _b+96
0000002F: 66 03 0D 60 00 00 00  add   cx,word ptr _c+96
00000036: 66 89 0D 60 00 00 00  mov   word ptr _a+96,cx
0000003D: 66 A1 62 00 00 00     mov   ax,word ptr _b+98
00000043: 66 03 05 62 00 00 00  add   ax,word ptr _c+98
0000004A: 83 C2 02              add   edx,2
0000004D: 66 A3 62 00 00 00     mov   word ptr _a+98,ax
00000053: 0F 77                 emms
00000055: C3                    ret   near
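As a rough illustration of the shape of that listing, the loop structure
can be modeled in portable C. This is only a sketch, not compiler output:
the name foo_model, the N and LANES macros, and the returned pass count
are assumptions added here so the trip count can be checked. The inner
loop over four lanes stands in for a single 64-bit PADDW.

```c
#include <assert.h>

#define N     50  /* array length used in the post */
#define LANES 4   /* four 16-bit values per 64-bit MMX register */

short int a[N], b[N], c[N];

/* Hypothetical scalar model of the vectorized foo(); returns the number
   of passes through the main loop so the trip count can be checked. */
int foo_model(void)
{
    int i, lane, passes = 0;
    int vec_end = N - (N % LANES);  /* 48 elements fall in full groups */

    /* Main loop: each pass stands in for one movq/paddw/movq group. */
    for (i = 0; i < vec_end; i += LANES) {
        for (lane = 0; lane < LANES; lane++)
            a[i + lane] = (short)(b[i + lane] + c[i + lane]);
        passes++;
    }
    assert(i == vec_end);

    /* Scalar epilogue: the two leftover elements, indices 48 and 49. */
    for (; i < N; i++)
        a[i] = (short)(b[i] + c[i]);

    return passes;
}
```

With N = 50 the model makes 12 passes through the main loop and leaves
indices 48 and 49 to the epilogue. With 2-byte shorts those are byte
offsets 96 and 98, matching the _b+96 and _b+98 operands in the listing.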
There are limitations. I used global arrays on purpose here: we do not
yet support the ISO C 1999 keyword "restrict", which would let you tell
the compiler that the b and c arrays do not alias array a. Without that
guarantee, if you passed the arrays in as parameters, we would not
attempt the vectorization.
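For reference, a parameter-passing version written with C99 "restrict"
might look like the sketch below. The function name add_shorts and its
signature are hypothetical, and the sketch assumes a compiler that
accepts the keyword. The qualifiers only promise that the three arrays
do not overlap; whether a given compiler then vectorizes the loop is up
to that compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: dst, x and y are declared restrict, so the
   caller promises that the three arrays do not overlap. */
void add_shorts(short *restrict dst,
                const short *restrict x,
                const short *restrict y,
                size_t n)
{
    size_t i;

    assert(dst && x && y);

    /* Same element-wise 16-bit add as foo(), but on pointer parameters. */
    for (i = 0; i < n; i++)
        dst[i] = (short)(x[i] + y[i]);
}
```

Calling it with overlapping arrays would break the promise made by
"restrict", and the behavior would then be undefined.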
--
Ben Combee <bcombee@metrowerks.com> -- x86/Win32/Linux/NetWare CompilerWarrior