Re: non-caching load and GC

"Steven A. Moyer" <sam2y@koa.cs.virginia.edu>
Fri, 2 Apr 1993 18:53:03 GMT

          From comp.compilers

Related articles
Re: non-caching load and GC sam2y@koa.cs.virginia.edu (Steven A. Moyer) (1993-04-02)
Utilization of Non-caching Access Instructions sam2y@server.cs.virginia.edu (Steven A. Moyer) (1993-04-06)
| List of all articles for this month |
Newsgroups: comp.object,comp.arch,comp.compilers
From: "Steven A. Moyer" <sam2y@koa.cs.virginia.edu>
Keywords: architecture, GC
Organization: University of Virginia Computer Science Department
References: <C4ppA5.BLx.1@cs.cmu.edu>
Date: Fri, 2 Apr 1993 18:53:03 GMT

monnier+@cs.cmu.edu (Stefan Monnier) writes:
>I have the impression that it could be interesting for a GC to use loads
>that don't necessarily bring the data in the cache (like the
>pipelined-float-load of the i860) in order not to completely flush the
>cache while garbage collecting (this seems most interesting when sweeping,
>but could be useful for copying collectors also).
>
>I haven't found any paper discussing this kind of 'optimisation'.
>
>Does such a paper exist ? Or is the idea just dumb ?
>Which processors have such a load ?


I'll address these questions in reverse order:


1) Currently the only microprocessor with such an instruction is the i860.


2) Utilizing a non-caching load instruction in a manner complementary to
      caching is in fact a good idea and can significantly improve effective
      memory bandwidth for many computations.


3) My very recently completed dissertation develops a family of compiler
      optimizations, called 'access ordering', that exploit a non-caching
      load instruction. These algorithms are applicable to stream-oriented
      computations (loosely, vectorizable) and are derived in the context of
      scientific computing. Essentially, one would like to apply blocking
      techniques to cache multiply-referenced data items and then use non-caching
      load instructions to reference single-visit items.


          Note: the technique is also very useful for implementing the
                      'copy optimization' by using non-caching loads to reference
                      items to be copied to the contiguous memory region.


      Applying non-caching loads in such a fashion requires much more than
      simply replacing load instructions that reference single-visit items
      with non-caching loads; one must then consider the effect of the
      observed reference sequence on the other side of the cache. Access
      ordering algorithms perform loop unrolling and reorder non-caching
      loads to exploit the underlying memory system characteristics (eg
      architecture and component type).


      The work presented in my dissertation focuses on reordering algorithms
      for a number of common architecture/component pairs; performance models
      are derived for the resulting sequences of non-caching accesses.
      Combining caching and non-caching accesses is discussed in a general
      way, but is not formalized (hey, ya got to draw the line somewhere :-)


      Papers on access ordering have not reached print yet, but if you are
      interested in obtaining further information you can get two pertinent
      techreports via anonymous ftp to uvacs.cs.virginia.edu; the reports
      are the compressed postscript files:


          pub/techreports/IPC-92-02.ps.Z (single module architecture)
          pub/techreports/IPC-92-12.ps.Z (interleaved architecture)


      Warning: these reports contain old (and not particularly well done, I'll
                        admit) notation; all the notation in the actual dissertation
                        has been significantly altered and matured and results have a
                        much greater degree of formality. Furthermore, by altering
                        notation many of the equations simplified significantly.


                        Therefore, if you find this optimization to be useful for your
                        work, it is recommended you contact me to receive a copy
                        of the complete (and *much* improved) dissertaton text.


I hope this helps in addressing your question. I'm afraid I'm not that
familiar with the state of the art in GC algorithms, but from your
description it sounds as if access ordering techniques may be helpful.


Steve
--
Steve Moyer
Computer Science Department
University of Virginia
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.