| Related articles |
|---|
| [2 earlier articles] |
| vectorization in icc aart.bik@intel.com (Bik, Aart) (2002-12-03) |
| Re: vectorization in icc kfredrik@saippua.cs.Helsinki.FI (Kimmo Fredriksson) (2002-12-07) |
| vectorization in icc aart.bik@intel.com (Bik, Aart) (2002-12-07) |
| Re: vectorization in icc terryg@qwest.net (Terry Greyzck) (2002-12-11) |
| Re: vectorization in icc kf@iki.fi (2002-12-11) |
| Re: vectorization in icc kf@iki.fi (2002-12-11) |
| Re: vectorization in icc kf@iki.fi (2002-12-11) |
| Re: vectorization in icc nmm1@cus.cam.ac.uk (2002-12-13) |

| From: | kf@iki.fi |
|---|---|
| Newsgroups: | comp.compilers |
| Date: | 11 Dec 2002 22:24:55 -0500 |
| Organization: | - |
| References: | 02-12-049 |
| Keywords: | parallel, performance |
| Posted-Date: | 11 Dec 2002 22:24:55 EST |
Hi again,
Things seem to change if you use values larger than 16 for the inner-loop
counter. Actually, the value is not hard-coded; I just experimented with
16 because 128/8 = 16, i.e. sixteen 8-bit elements fill one 128-bit
register. In real-world applications it can be anything. I tried
160, and the speed-up was about 7.2x, and about 2x over my hand-coded
32-bit vectorized code. So things are looking better.
kf