From: "Steven A. Moyer" <firstname.lastname@example.org>
Organization: University of Virginia Computer Science Department
Date: Fri, 2 Apr 1993 18:53:03 GMT
email@example.com (Stefan Monnier) writes:
>I have the impression that it could be interesting for a GC to use loads
>that don't necessarily bring the data in the cache (like the
>pipelined-float-load of the i860) in order not to completely flush the
>cache while garbage collecting (this seems most interesting when sweeping,
>but could be useful for copying collectors also).
>I haven't found any paper discussing this kind of 'optimisation'.
>Does such a paper exist ? Or is the idea just dumb ?
>Which processors have such a load ?
I'll address these questions in reverse order:
1) Currently the only microprocessor with such an instruction is the i860.
2) Utilizing a non-caching load instruction in a manner complementary to
caching is in fact a good idea and can significantly improve effective
memory bandwidth for many computations.
3) My very recently completed dissertation develops a family of compiler
optimizations, called 'access ordering', that exploit a non-caching
load instruction. These algorithms are applicable to stream-oriented
computations (loosely, vectorizable) and are derived in the context of
scientific computing. Essentially, one would like to apply blocking
techniques to cache multiply-referenced data items and then use non-caching
load instructions to reference single-visit items.
Note: the technique is also very useful for implementing the
'copy optimization' by using non-caching loads to reference
items to be copied to the contiguous memory region.
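A minimal sketch of the idea above, in C. The `NC_LOAD` macro is a stand-in for a non-caching load (on the i860 it would map to the pipelined floating-point load, `pfld`); here it falls back to an ordinary load so the sketch stays portable. The multiply-referenced coefficient table `c[]` is left to the cache, while the single-visit stream `x[]` is fetched with non-caching loads so it never displaces cached data:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical non-caching load: on the i860 this would be the
 * pipelined floating-point load (pfld); on other machines it is
 * just a plain (caching) load, so the sketch remains portable. */
#define NC_LOAD(p) (*(p))

/* Scale a single-visit stream x[] by a small, multiply-referenced
 * coefficient table c[] of length k.  c[] is reused every iteration,
 * so it is left to the cache; each x[i] is touched exactly once, so
 * it is fetched with a non-caching load and never evicts c[]. */
void scale_stream(double *y, const double *x, const double *c,
                  size_t n, size_t k)
{
    for (size_t i = 0; i < n; i++)
        y[i] = NC_LOAD(&x[i]) * c[i % k];   /* c[] stays cached */
}
```

The same pattern applies to the copy optimization mentioned in the note: the source items being gathered into the contiguous region are single-visit, so they are natural candidates for `NC_LOAD`.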
Applying non-caching loads in such a fashion requires much more than
simply replacing load instructions that reference single-visit items
with non-caching loads; one must then consider the effect of the
observed reference sequence on the other side of the cache. Access
ordering algorithms perform loop unrolling and reorder non-caching
loads to exploit the underlying memory system characteristics (e.g.,
architecture and component type).
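To make the unroll-and-reorder step concrete, here is a hedged daxpy sketch (my construction, not taken from the dissertation). The loop is unrolled by four and the non-caching loads are grouped per stream, so consecutive references fall on the same DRAM page (exploiting, say, fast-page mode) rather than alternating between the `x[]` and `y[]` streams on every access. `NC_LOAD` again stands in for the i860's pipelined load and degenerates to a plain load here:

```c
#include <assert.h>
#include <stddef.h>

#define NC_LOAD(p) (*(p))  /* stand-in for the i860 pfld; plain load here */

/* Access-ordered daxpy sketch: it is the grouping of same-stream
 * loads, not the unrolling by itself, that the memory system
 * rewards (e.g., via DRAM fast-page mode). */
void daxpy_ordered(double *y, const double *x, double a, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        /* group the four x[] loads together ... */
        double x0 = NC_LOAD(&x[i]),     x1 = NC_LOAD(&x[i + 1]);
        double x2 = NC_LOAD(&x[i + 2]), x3 = NC_LOAD(&x[i + 3]);
        /* ... then the four y[] loads ... */
        double y0 = NC_LOAD(&y[i]),     y1 = NC_LOAD(&y[i + 1]);
        double y2 = NC_LOAD(&y[i + 2]), y3 = NC_LOAD(&y[i + 3]);
        /* ... then compute and store */
        y[i]     = y0 + a * x0;  y[i + 1] = y1 + a * x1;
        y[i + 2] = y2 + a * x2;  y[i + 3] = y3 + a * x3;
    }
    for (; i < n; i++)           /* cleanup when n is not a multiple of 4 */
        y[i] += a * x[i];
}
```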
The work presented in my dissertation focuses on reordering algorithms
for a number of common architecture/component pairs; performance models
are derived for the resulting sequences of non-caching accesses.
Combining caching and non-caching accesses is discussed in a general
way, but is not formalized (hey, ya got to draw the line somewhere :-).
Papers on access ordering have not reached print yet, but if you are
interested in obtaining further information you can get two pertinent
tech reports via anonymous ftp to uvacs.cs.virginia.edu; the reports
are the compressed PostScript files:
pub/techreports/IPC-92-02.ps.Z (single module architecture)
pub/techreports/IPC-92-12.ps.Z (interleaved architecture)
Warning: these reports contain old (and not particularly well done, I'll
admit) notation; the notation in the actual dissertation has been
significantly revised, and the results are presented with a much
greater degree of formality. Furthermore, the change of notation
simplified many of the equations considerably. Therefore, if you
find this optimization useful for your work, it is recommended you
contact me to receive a copy of the complete (and *much* improved)
dissertation text.
I hope this helps in addressing your question. I'm afraid I'm not that
familiar with the state of the art in GC algorithms, but from your
description it sounds as if access ordering techniques may be helpful.
Computer Science Department
University of Virginia