Related articles:
- vectorization in icc kf@iki.fi (2002-11-26)
- Re: vectorization in icc skral@mips.complang.tuwien.ac.at (Kral Stefan) (2002-12-01)
- vectorization in icc aart.bik@intel.com (Bik, Aart) (2002-12-03)
- Re: vectorization in icc kfredrik@saippua.cs.Helsinki.FI (Kimmo Fredriksson) (2002-12-07)
- vectorization in icc aart.bik@intel.com (Bik, Aart) (2002-12-07)
- Re: vectorization in icc terryg@qwest.net (Terry Greyzck) (2002-12-11)
- Re: vectorization in icc kf@iki.fi (2002-12-11)
- Re: vectorization in icc kf@iki.fi (2002-12-11)
- Re: vectorization in icc kf@iki.fi (2002-12-11)
- Re: vectorization in icc nmm1@cus.cam.ac.uk (2002-12-13)
From: "Terry Greyzck" <terryg@qwest.net>
Newsgroups: comp.compilers
Date: 11 Dec 2002 22:18:40 -0500
Organization: Compilers Central
References: 02-12-049
Keywords: parallel
Posted-Date: 11 Dec 2002 22:18:40 EST
"Bik, Aart" <aart.bik@intel.com> wrote:
>This problem, however, can be easily avoided in your application by adhering
>to one of the golden rules of effective SIMD vectorization: use the smallest
>possible data type. In the counting loop, an int data type is mixed with a
>char data type which is not very amenable to vectorization (which is probably
>why you used the #pragma novector). A simple inspection of this loop shows
>that the local counter in one complete loop execution can never exceed 16.
>Hence, a char counter (which matches the data type of the dm array nicely) can
>be used during the loop, after which the result is added back into the full
>int counter. This results in a vectorizable loop, as shown below.
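To make the quoted technique concrete, here is a minimal sketch in C. The original loop is not reproduced in this post, so the function name count_zero_bytes and the choice to count zero bytes are hypothetical; the only assumption carried over from the quote is that each pass of the inner loop touches 16 elements, so an unsigned char counter cannot overflow before it is added into the int total.

```c
#include <stddef.h>

/* Hypothetical example: count the zero bytes in dm[0..n-1].
 * Each inner pass covers exactly 16 bytes, so the narrow counter
 * 'local' is at most 16 and cannot overflow an unsigned char. */
int count_zero_bytes(const unsigned char *dm, size_t n)
{
    int total = 0;                      /* full-width running total */
    size_t block = 0;

    for (; block + 16 <= n; block += 16) {
        unsigned char local = 0;        /* same width as the dm elements */
        for (int i = 0; i < 16; i++)
            local += (unsigned char)(dm[block + i] == 0);
        total += local;                 /* widen once per 16-byte block */
    }

    for (; block < n; block++)          /* leftover bytes, if any */
        total += (dm[block] == 0);

    return total;
}
```

Whether a particular compiler actually vectorizes the inner loop depends on its version and flags; the sketch only shows the shape of the transformation.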
That advice applies to the Intel ISA. On most vector architectures, such
as Cray and NEC, 'int' vectorizes perfectly well, and you want to avoid
arrays of char (or char counters): they incur unacceptable performance
penalties, because type char vectorizes less well than type int or
long on those machines.
If you want to maintain portability, do not use the smallest possible
data type. Use whatever data type you would normally use when writing
C code; int is a perfectly acceptable counter (and extremely fast on
most vector ISAs). Vectorizable code, at least with Cray and NEC, is
easy to write and generally does not require adjustments to the data
type, and mixing types of different sizes is not a real concern.
For good vectorizable code:
- Use loops whose trip count can be determined at compile time, or at
  run time before the loop starts; avoid while loops and linked lists.
- For C, use the 'restrict' qualifier where possible.
- If necessary, use the 'ivdep' pragma.
- Otherwise, write the loop as you normally would. Vectorizing compilers
  have 25+ years of experience behind them and can handle almost any
  construct you can write.
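A small C99 sketch of a loop that follows these guidelines: a counted for loop whose trip count n is known on entry, and restrict-qualified pointers. The function name scale_add is made up for illustration. The ivdep pragma is left out because its spelling is compiler-specific (Intel's compilers accept #pragma ivdep, while GCC spells it #pragma GCC ivdep).

```c
#include <stddef.h>

/* Hypothetical example: y[i] += a * x[i] for i in [0, n).
 * - Counted loop: the trip count n is fixed before the loop starts.
 * - restrict: the caller promises x and y do not overlap, so the
 *   compiler need not assume a store to y[i] changes a later x[j].
 * Where independence still cannot be proven, an ivdep pragma could
 * be placed immediately before the loop on compilers that support it. */
void scale_add(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Note that restrict is a promise by the programmer, not something the compiler checks; passing overlapping arrays to such a function is undefined behavior.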
For performance reasons, such as in your example running on an Intel
processor, you may want to maintain several versions of the loop,
selected with appropriate #ifdef preprocessor directives. One version
would be the 'normal' version, and another would be the version modified
for effective Intel (or Cray or NEC) vectorization. Keeping the original
version of the loop helps others understand the code in the long term.
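One way to lay out the dual-version approach, as a hedged sketch: the macro name USE_NARROW_COUNTER and the function count_matches are hypothetical, and the tuned branch uses the narrow char-counter idea from the quoted Intel advice. Both branches compute the same result, so the plain version documents what the tuned one is supposed to do.

```c
#include <stddef.h>

/* Hypothetical example: count how many of s[0..n-1] equal c.
 * Define USE_NARROW_COUNTER (a made-up macro) to build the variant
 * tuned for short-integer SIMD; otherwise the normal version is used. */
int count_matches(const unsigned char *s, size_t n, unsigned char c)
{
    int total = 0;
#ifdef USE_NARROW_COUNTER
    /* Tuned version: an unsigned char counter per 16-byte block is at
     * most 16, so it cannot overflow; it is widened once per block. */
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        unsigned char local = 0;
        for (size_t j = 0; j < 16; j++)
            local += (unsigned char)(s[i + j] == c);
        total += local;
    }
    for (; i < n; i++)                  /* leftover bytes, if any */
        total += (s[i] == c);
#else
    /* Normal version: a plain int counter, written the usual way. */
    for (size_t i = 0; i < n; i++)
        total += (s[i] == c);
#endif
    return total;
}
```

Compiling with -DUSE_NARROW_COUNTER selects the tuned loop; without it, the normal int-counter loop is built.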
Terry Greyzck
terryg@qwest.net
http://www.greyzck.com