From: kf@iki.fi
Newsgroups: comp.compilers
Date: 11 Dec 2002 22:24:55 -0500
Organization: -
References: 02-12-049
Keywords: parallel, performance
Posted-Date: 11 Dec 2002 22:24:55 EST
Hi again,
Things seem to change if you use values larger than 16 for the inner-loop
counter. The value is not actually hard-coded; I just experimented with
16 because 128/8 = 16 (a 128-bit register holds 16 bytes). In real-world
applications it can be anything. I tried 160, and the speed-up was about
7.2x, roughly 2x faster than my hand-coded 32-bit vectorized code. So
things are looking better...
kf