Re: Cache size restrictions obsolete for unrolling?

"Harold Aptroot" <harold.aptroot@gmail.com>
Fri, 9 Jan 2009 13:51:33 +0100

          From comp.compilers


From: "Harold Aptroot" <harold.aptroot@gmail.com>
Newsgroups: comp.compilers
Date: Fri, 9 Jan 2009 13:51:33 +0100
Organization: A noiseless patient Spider
References: 09-01-010
Keywords: architecture, performance
Posted-Date: 09 Jan 2009 08:35:43 EST

"Stephan Ceram" <linuxkaffee_@_gmx.net> wrote in message
> I've made the experience that for some DSPs it's better to unroll
> loops as much as possible without taking care of the instruction
> cache. ...
>
> My feeling is that modern processors have sophisticated features (like
> prefetching, fast memories ...) that heavily help to hide/avoid
> instruction cache misses, thus they rarely occur even if a frequently
> executed loop exceeds the cache capacity. In contrast, aggressive
> unrolling reduced the expensive execution of branches (especially
> mispredicted) in the loop header and produced more optimization
> potential. In total, this pays off even at the cost of some more cache
> misses. So my first conclusion is that the commonly found restriction
> of unrolling factors to avoid too large loops not fitting in the cache
> is obsolete and does not hold for modern processors and compilers.


I have a strong feeling that it depends very much on the platform
you're targeting. It may also depend on how much data the loop itself
accesses. Data traffic obviously puts no extra pressure on the
instruction cache, but once the code no longer fits in that cache,
instruction fetches and data accesses compete for the same resources
(secondary cache, main memory). I haven't tested this at all, but it
could matter, right?
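
To make the concern concrete, here is a rough sketch in C of the
three cases I have in mind. It is a toy model, not a measurement:
the names (classify_unroll, enum footprint) are made up for
illustration, it assumes a unified secondary cache shared by code and
data, and it assumes code size grows linearly with the unroll factor.
It says nothing about prefetching or branch costs.

```c
#include <stddef.h>

/* Hypothetical three-way classification of an unrolled loop. */
enum footprint {
    FITS_ICACHE, /* unrolled body fits in the instruction cache */
    SHARES_L2,   /* code spills, but code + data still fit in L2 */
    EXCEEDS_L2   /* code + data exceed L2 and compete for main memory */
};

/* All sizes are in bytes. Assumption: unrolled code size is simply
   body_bytes * unroll; real compilers may shrink or grow the body. */
static enum footprint classify_unroll(size_t body_bytes, size_t unroll,
                                      size_t icache_bytes,
                                      size_t data_bytes, size_t l2_bytes)
{
    size_t code_bytes = body_bytes * unroll;

    if (code_bytes <= icache_bytes)
        return FITS_ICACHE;
    /* From here on, instruction fetches go to the shared level. */
    if (code_bytes + data_bytes <= l2_bytes)
        return SHARES_L2;
    return EXCEEDS_L2;
}
```

The point is only that the third case depends on the data working set
as well as on the unroll factor, so the same unrolled loop could be
harmless on one input size and costly on another.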


