From: bcombee@metrowerks.com (Ben Combee)
Newsgroups: comp.compilers
Date: 19 Jan 2000 01:06:40 -0500
Organization: Metrowerks
References: 00-01-017 00-01-031
Keywords: code, optimize
Greg Lindahl wrote:
> You also didn't mention the alignment restrictions. Isn't it the case
> that the inputs need to be 128-bit aligned? So if you don't know the
> alignment at compile-time, or the alignment happens to be unfortunate,
> you're out of luck. Consider:
>
> short int a1[50], a2[50], b[50], c[50];
>
> void foo(void)
> {
>     int i;
>     for (i = 0; i < 50; i++) a1[i] = b[i] + c[i];
>     for (i = 0; i < 49; i++) a2[i] = b[i+1] + c[i];
> }
>
> In this example, I don't think you can vectorize both loops.
Intel's SSE instruction set requires 128-bit alignment for the memory
operands of its packed operations (unless the data goes through a
separate unaligned load such as MOVUPS), but neither 3DNow! nor MMX
has that restriction. You are correct that we will need pointers to
guaranteed 128-bit aligned types for implementing SSE vectorization.
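
To make the alignment concern concrete, here is a minimal sketch in C.
It is only an illustration, not anything CodeWarrior does. The helper
misalignment_16 is a hypothetical name, and the C11 _Alignas on the
arrays is an assumption added so the offsets are predictable (the
original example does not request any alignment). It shows that when b
starts on a 16-byte boundary, &b[1] necessarily sits 2 bytes past one,
so b[i+1] and c[i] cannot both be read with aligned 128-bit loads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: short is 16 bits, as on x86. */
static_assert(sizeof(short) == 2, "example assumes a 16-bit short");

/* Hypothetical helper: how many bytes p lies past the previous
   16-byte (128-bit) boundary; 0 means p is 128-bit aligned. */
static unsigned misalignment_16(const void *p)
{
    return (unsigned)((uintptr_t)p & 15u);
}

/* Assumption: alignment is forced here so the result is predictable.
   Without _Alignas the compiler is free to place the arrays on any
   2-byte boundary. */
_Alignas(16) short b[50];
_Alignas(16) short c[50];
```

With these declarations, misalignment_16(b) and misalignment_16(c) are
both 0, while misalignment_16(&b[1]) is 2.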
For your test listed above, I saw vector code generated for both loops
using the CodeWarrior Pro 5.3 compiler. However, for the second loop,
there were some rather odd sequences generated in the support code
outside the loop itself that point out some issues with the current
quality of peephole optimization in CW.
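
For readers unfamiliar with what "support code outside the loop" means
here, the following is a rough sketch in portable C of the usual shape
of a vectorized loop: a main loop that handles a fixed number of lanes
per iteration, plus scalar code for the leftover elements. This is not
the code CodeWarrior generated. The function name and the lane count of
4 (four 16-bit values in one 64-bit MMX register) are assumptions made
for illustration, and the inner lane loop merely stands in for a single
packed add.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: four 16-bit lanes, as in one 64-bit MMX register. */
#define LANES 4

/* Hypothetical helper: dst[i] = x[i] + y[i] for i in [0, n),
   split into a lane-wide main loop and a scalar epilogue. */
static void add_shorts_stripmined(short *dst, const short *x,
                                  const short *y, size_t n)
{
    size_t i = 0;
    size_t main_end = n - (n % LANES);

    assert(dst != NULL && x != NULL && y != NULL);

    /* Main loop: each iteration stands in for one packed add. */
    for (; i < main_end; i += LANES) {
        size_t lane;
        for (lane = 0; lane < LANES; lane++)
            dst[i + lane] = (short)(x[i + lane] + y[i + lane]);
    }

    /* Support code outside the main loop: leftover elements are
       handled one at a time. */
    for (; i < n; i++)
        dst[i] = (short)(x[i] + y[i]);
}
```

The second loop from the example would correspond to the call
add_shorts_stripmined(a2, b + 1, c, 49): twelve lane-wide iterations
cover 48 elements and the scalar epilogue handles the last one.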
--
Ben Combee <bcombee@metrowerks.com> -- x86/Win32/Linux/NetWare CompilerWarrior