From: andi@complang.tuwien.ac.at (Andreas Krall)
Newsgroups: comp.compilers
Date: 15 Jan 2000 14:18:59 -0500
Organization: Vienna University of Technology, Austria
References: 00-01-017 00-01-031
Keywords: architecture, optimize
lindahl@pbm.com (Greg Lindahl) writes:
> You also didn't mention the alignment restrictions. Isn't it the case
> that the inputs need to be 128-bit aligned? So if you don't know the
> alignment at compile-time, or the alignment happens to be unfortunate,
> you're out of luck. Consider:
>
> short int a1[50], a2[50], b[50], c[50];
>
> void foo(void)
> {
>     int i;
>     for (i = 0; i < 50; i++) a1[i] = b[i] + c[i];
>     for (i = 0; i < 49; i++) a2[i] = b[i+1] + c[i];
> }
>
> In this example, I don't think you can vectorize both loops.
It is possible to vectorize both loops. Our prototype compiler for
the SPARC VIS can handle this case (with a little support from the
hardware). The SPARC supports unaligned loads that need only three
instructions (2 loads and a merge). Similar code can be emitted for
processors without such support by using shifts and a logical or. The
prolog and epilog of the loop need special handling. Inside the loop
only one load is necessary, because the second value from the previous
iteration can be reused. So loop peeling is necessary; afterwards
vectorization can be applied.
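
The shift-and-or variant for the second loop can be sketched in portable
C. This is only an illustration under stated assumptions, not the code
our compiler emits: a 64-bit integer stands in for a vector register
holding four 16-bit lanes, the helpers load4, store4 and add4 are
hypothetical stand-ins for an aligned vector load, an aligned vector
store and a packed 16-bit add, and the arrays are uint16_t rather than
short int so that wraparound is well defined.

```c
#include <assert.h>
#include <stdint.h>

enum { N = 50, LANES = 4 };

/* Stand-in for an aligned vector load: lane 0 sits in the low 16 bits. */
static uint64_t load4(const uint16_t *p) {
    return (uint64_t)p[0] | (uint64_t)p[1] << 16 |
           (uint64_t)p[2] << 32 | (uint64_t)p[3] << 48;
}

/* Stand-in for an aligned vector store. */
static void store4(uint16_t *p, uint64_t v) {
    for (int l = 0; l < LANES; l++) p[l] = (uint16_t)(v >> (16 * l));
}

/* Lane-wise 16-bit add with wraparound; no carry crosses a lane. */
static uint64_t add4(uint64_t x, uint64_t y) {
    const uint64_t H = 0x8000800080008000ULL;
    return ((x & ~H) + (y & ~H)) ^ ((x ^ y) & H);
}

/* a2[i] = b[i+1] + c[i] for i = 0 .. N-2 */
void foo2(uint16_t *a2, const uint16_t *b, const uint16_t *c) {
    int i = 0;
    uint64_t prev = load4(b);            /* prolog: first aligned word of b */
    for (; i + 2 * LANES <= N; i += LANES) {
        uint64_t next = load4(b + i + LANES);      /* only load of b per iteration */
        uint64_t bu = (prev >> 16) | (next << 48); /* merge: b[i+1] .. b[i+4] */
        store4(a2 + i, add4(bu, load4(c + i)));
        prev = next;                     /* reuse the second word next time */
    }
    /* epilog: scalar code, so no aligned word is read past the end of b */
    for (; i < N - 1; i++) a2[i] = (uint16_t)(b[i + 1] + c[i]);
}

/* Compares foo2 against the original scalar loop on wrapping inputs. */
static void foo2_self_check(void) {
    uint16_t a2[N] = {0}, ref[N] = {0}, b[N], c[N];
    for (int i = 0; i < N; i++) {
        b[i] = (uint16_t)(1000 * i + 7);
        c[i] = (uint16_t)(65000 - 3 * i);
    }
    for (int i = 0; i < N - 1; i++) ref[i] = (uint16_t)(b[i + 1] + c[i]);
    foo2(a2, b, c);
    for (int i = 0; i < N; i++) assert(a2[i] == ref[i]);
}
```

With N = 50 the vector loop runs 11 times and covers a2[0] to a2[43];
the remaining five elements go through the scalar epilog, because the
next aligned word of b would extend past the end of the array.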
--
andi@complang.tuwien.ac.at Andreas Krall
http://www.complang.tuwien.ac.at/andi/ Inst. f. Computersprachen, TU Wien
tel: (+431) 58801/18511 Argentinierstr. 8/4/1851
fax: (+431) 58801/18598 A-1040 Wien AUSTRIA EUROPE