vectorization in icc

kf@iki.fi
26 Nov 2002 22:16:29 -0500

          From comp.compilers


From: kf@iki.fi
Newsgroups: comp.compilers
Date: 26 Nov 2002 22:16:29 -0500
Organization: -
Keywords: C, optimize, question
Posted-Date: 26 Nov 2002 22:16:29 EST

Hi,

I've been experimenting with the Intel C/C++ compiler for Linux, in particular with its automatic vectorization.

I have the following piece of code (all the arrays are of type char):

for (j = 0; j < 16; j++) {
    d[j] = d[j] + d[j];
    d[j] = d[j] | B[j];
    dm[j] = d[j] & mm[j];
}
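To compile the loop on its own, here is a minimal sketch that wraps it in a function. The name update_masks is hypothetical, and unsigned char is an assumption (the post only says char). It is chosen so that d[j] + d[j] wraps modulo 256, which matches the byte-wise addb/paddb in the listings.

```c
#include <stddef.h>

/* Hypothetical wrapper around the loop from the post.
 * unsigned char is assumed so that doubling a byte wraps modulo 256,
 * the same result the 8-bit addb / paddb instructions produce. */
void update_masks(unsigned char d[16], const unsigned char B[16],
                  const unsigned char mm[16], unsigned char dm[16])
{
    for (size_t j = 0; j < 16; j++) {
        d[j] = (unsigned char)(d[j] + d[j]); /* shift each byte left by one */
        d[j] = d[j] | B[j];                  /* merge in the bits of B */
        dm[j] = d[j] & mm[j];                /* masked copy into dm */
    }
}
```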

which compiles to the following when vectorization is disabled:

..B3.16: # Preds ..B3.16 ..B3.15
                movb 4656(%esp,%ecx), %al #178.23
                movb 4672(%esp,%ecx), %dl #181.23
                addb %al, %al #178.23
                orb (%edi,%ecx), %al #179.23
                movb %al, 4656(%esp,%ecx) #179.4
                andb %dl, %al #181.23
                movb %al, 4688(%esp,%ecx) #181.4
                addl $1, %ecx #176.23
                cmpl $16, %ecx #176.3
                jl ..B3.16 # Prob 93% #176.3

With vectorization enabled, I get the following; the loop is eliminated by using SSE2 instructions:

                paddb %xmm1, %xmm1 #177.14
                por 80(%esp,%ecx,8), %xmm1 #178.14
                movdqa %xmm1, %xmm3 #180.14
                lea 1(%edi), %eax #183.44
                addl $1, %esi #169.21
                pand %xmm0, %xmm3 #180.14
                movdqa %xmm3, 4720(%esp) #180.4
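The vectorized listing corresponds roughly to the hand-written SSE2 version below. This is a sketch, not icc's actual output. The name update_masks_sse2 is hypothetical. It assumes an x86 target with SSE2 and the standard emmintrin.h intrinsics. It also uses unaligned loads and stores, whereas icc's code (movdqa, and por with a memory operand) relies on 16-byte-aligned stack slots.

```c
#include <emmintrin.h>  /* SSE2 intrinsics, x86/x86-64 only */

/* Hypothetical SSE2 version of the loop: all 16 byte lanes are handled
 * by one paddb, one por and one pand, as in the vectorized listing. */
void update_masks_sse2(unsigned char d[16], const unsigned char B[16],
                       const unsigned char mm[16], unsigned char dm[16])
{
    /* Unaligned loads, so the arrays need no special alignment here. */
    __m128i vd  = _mm_loadu_si128((const __m128i *)d);
    __m128i vb  = _mm_loadu_si128((const __m128i *)B);
    __m128i vmm = _mm_loadu_si128((const __m128i *)mm);

    vd = _mm_add_epi8(vd, vd);            /* paddb: d[j] + d[j], wraps per byte */
    vd = _mm_or_si128(vd, vb);            /* por:   d[j] | B[j] */
    __m128i vdm = _mm_and_si128(vd, vmm); /* pand:  d[j] & mm[j] */

    _mm_storeu_si128((__m128i *)d, vd);   /* write back d */
    _mm_storeu_si128((__m128i *)dm, vdm); /* write dm */
}
```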

Both versions produce correct results, but the vectorized code is significantly slower! I certainly expected it to be much faster.

What's going on? Are the SSE2 instructions really that slow compared to the standard integer instructions? If so, what's the point of vectorization anyway?

Thanks.
