Newsgroups: comp.compilers
From: markt@harlqn.co.uk (Mark Tillotson)
Organization: Harlequin Limited, Cambridge, England
Date: Thu, 26 Nov 1992 17:39:27 GMT
References: 92-11-098 92-11-117
Keywords: architecture, performance
Richard Cownie <richard@meiko.com> wrote:
> > So the relative cost of a cache miss has already risen from about 1.4
> > instructions to > 5 instructions, and the Viking clock speed is still only
> > 40MHz; the technology exists now to build processors running at 150MHz
> > (e.g. Alpha), which will take the cost of a cache miss over 20
> > instructions.
>
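The scaling in the quoted figures follows from holding the miss latency fixed in nanoseconds while the clock speeds up. Below is a small sketch of that arithmetic. The 125ns latency is a hypothetical figure (chosen so that 40MHz gives 5 cycles), and it treats "instructions" as roughly one per clock cycle, which is an assumption.

```python
def miss_penalty_cycles(latency_ns: float, clock_mhz: float) -> float:
    """Clock cycles spent waiting on a miss whose latency is fixed in ns."""
    # One cycle at clock_mhz lasts 1000 / clock_mhz ns, so the penalty is
    # latency_ns / (1000 / clock_mhz) = latency_ns * clock_mhz / 1000.
    return latency_ns * clock_mhz / 1000.0


# Hypothetical miss latency, held constant while the clock rate rises.
LATENCY_NS = 125.0

for mhz in (40, 150):
    print(f"{mhz:>3} MHz: {miss_penalty_cycles(LATENCY_NS, mhz):.2f} cycles")
```

With these assumptions the penalty grows from 5.00 to 18.75 cycles purely because the clock got faster; the memory did not get any slower.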
tchannon@black.demon.co.uk replied:
> Ignoring the problems of very large memory arrays and other secondary
> effects, the access times of dynamic RAM can, with good design, be rather faster
> than you suggest. This is what page mode and static column mode are about.
> Many accesses may be possible each with a 40ns or so cycle time and this
> is quite different from the classic mode where the cycle time would be
> 100..120ns.
However, these access times are only obtained once you have done an
initial memory transfer within the page or column concerned. When a
cache-miss occurs this clearly won't be the case, since the reference is
to something not in the cache (and hence cannot have just been accessed).
Page mode and static column mode are ideal for _filling_ a cache line, or feeding a video
stream, but the initial latency to access (and possibly translate) a
_random_ address is the problem for cache misses.
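To make the distinction concrete, here is a rough model of a line fill in which only the first word pays the full random-access time and the remaining words use the faster page-mode cycle. It uses the quoted 120ns and 40ns figures as per-word costs, which simplifies real DRAM timing, and the 8-word line size is a hypothetical choice.

```python
def line_fill_ns(words: int, random_access_ns: float, page_cycle_ns: float) -> float:
    """Time to fill a cache line when only the first word pays full latency."""
    # The first access is to the missed (arbitrary) address, so it pays the
    # full random-access time; later words hit the already-open DRAM page.
    return random_access_ns + (words - 1) * page_cycle_ns


def line_fill_all_random_ns(words: int, random_access_ns: float) -> float:
    """Time to fill the same line if every word paid a full random access."""
    return words * random_access_ns


WORDS = 8  # hypothetical number of transfers per cache line

print(line_fill_ns(WORDS, 120.0, 40.0))        # page mode: 400.0 ns
print(line_fill_all_random_ns(WORDS, 120.0))   # no page mode: 960.0 ns
```

Page mode cuts the total fill time sharply, but the first 120ns, which is what a missing load actually waits for, is unchanged.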
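Memory bandwidth is also relevant, but it's only part of the story,
since cache protocols usually fetch the most immediately required word
of the line first, so the processor's stall is dominated by the latency
of that first access rather than by the time to transfer the whole line.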
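A sketch of why a requested-word-first fill makes the stall latency-bound: it compares the wait for the word the CPU asked for under that policy and under a fill that always starts at the beginning of the line. The 120ns and 40ns figures are taken from the quoted text as per-word costs (a simplification), and the function and parameter names are hypothetical.

```python
def stall_ns(requested_index: int, random_access_ns: float,
             page_cycle_ns: float, critical_word_first: bool) -> float:
    """Time until the word the CPU actually asked for arrives."""
    if critical_word_first:
        # The missed word is fetched first, so the CPU waits only for the
        # initial random access; the rest of the line streams in afterwards.
        return random_access_ns
    # In-order fill from the start of the line: the requested word arrives
    # after the first access plus one page-mode cycle per preceding word.
    return random_access_ns + requested_index * page_cycle_ns


# Hypothetical miss on word 5 of an 8-word line.
print(stall_ns(5, 120.0, 40.0, critical_word_first=True))   # 120.0
print(stall_ns(5, 120.0, 40.0, critical_word_first=False))  # 320.0
```

Only the in-order case depends on the page-mode cycle (that is, on bandwidth); with the requested word fetched first, the stall is set entirely by the initial access latency.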
Furthermore, a 40ns cycle time is still _very slow_ compared with a cache hit
on a 150MHz machine (whose clock period is about 6.7ns, so 40ns is roughly
six cycles), and until the CPU and main memory are implemented on the
same chip, interconnect delays are also a fundamental problem.
Add to this the hardware required to keep a set of caches sharing one
memory coherent, and the picture only gets worse...
--
Mark Tillotson Harlequin Ltd.
markt@uk.co.harlqn Barrington Hall,
+44 223 872522 Barrington, Cambridge CB2 5RG