Re: Ada GC and a bunch of other stuff

chase@centerline.com (David Chase)
9 Feb 1996 14:26:13 -0500

          From comp.compilers

Related articles
Re: Ada GC and a bunch of other stuff hbaker@netcom.com (1996-02-03)
Re: Ada GC and a bunch of other stuff chase@centerline.com (1996-02-09)
Re: Ada GC and a bunch of other stuff ncohen@watson.ibm.com (1996-02-09)
Re: Ada GC and a bunch of other stuff dave@occl-cam.demon.co.uk (Dave Lloyd) (1996-02-13)
Re: Ada GC and a bunch of other stuff hosking@cs.purdue.edu (1996-02-13)
Re: Ada GC and a bunch of other stuff boehm@parc.xerox.com (1996-02-14)
Re: Ada GC and a bunch of other stuff dave@occl-cam.demon.co.uk (Dave Lloyd) (1996-02-16)
Re: Ada GC and a bunch of other stuff hbaker@netcom.com (1996-02-17)
[4 later articles]
| List of all articles for this month |

From: chase@centerline.com (David Chase)
Newsgroups: comp.compilers
Date: 9 Feb 1996 14:26:13 -0500
Organization: CenterLine Software
References: 96-02-031
Keywords: Ada, GC

hbaker@netcom.com (Henry Baker) writes:
> Someone mentioned the issue of the interaction of GC with compiler
> optimizations. David Chase has probably studied this more (at least
> in the last several years) than most everyone else, and has published
> papers on this (ACM PLDI, 1988??).


Thanks for the plug, but you should also take note of the work done at
U Mass by Eliot Moss, Rick Hudson, Amer Diwan, and probably others
whose names I don't know offhand. Their goal is a Modula-3 system
that does both the "usual optimizations" and precise pointer-tracking
as part of a persistent object system (I'm paraphrasing -- I'm most
familiar with the subgoal of pointer tracking in a compiler-optimized
system, and not so familiar with their main goal). It's not a pretty
problem.


Hans Boehm has also collected a few good examples of how things can go
wrong -- we've been discussing the problem for years now, and trying
to come up with "cheap" ways of curing the symptoms and/or the
disease, or convincing ourselves that it isn't worth worrying about in
practice.


The PLDI '88 paper says little about interaction between optimization
and garbage collection. All that discussion remains, unfortunately,
still a chapter in my dissertation (perhaps the only chapter still
worth reading).


And, for the sake of completeness, here are a few of the examples.
This is basically a collection of disasters that would happen if you
took a (really good) C/Fortran optimizer and combined it with an
off-the-shelf garbage collector, and had a very unlucky day. In
practice, current compilers and the conservative collectors have never
been observed to trigger these bugs "in the field", though I managed
to make it happen in 1988 "in the lab". There's a sort of
mix-and-match flavor to them -- for instance, dealing with GC and
optimization in a single-threaded non-compacting system is much saner
than it is in a multi-threaded, compacting system, and it is worse yet
in a concurrently compacting system.


1. Basic pointer arithmetic optimizations. Any decent optimizer is
likely to take expressions like A[i+j] and form some sort of
addressing subexpression involving the address of the array A and i,
j, or both. The resulting subexpression could point *anywhere*, since
there are no constraints (generally) on the values in i and j,
considered separately (the sum is constrained, of course).


More creative/aggressive optimizers might form the *difference*
between two pointers. This is "really awful" from the POV of a
garbage collector.


2. Initialization. If the collector is "precise", meaning that it is
designed with the assumption that it will find all pointers, then you
have to be careful with newly allocated memory/variables. If it
contains junk, this could cause problems for the collector. Note that
an optimizer might notice that the "initialization" of a variable is
later overwritten, and remove the initialization -- after all, it
isn't used, is it?


3. Creative code generation. (A) On modern machines, it is often
faster to use the floating point register file if you have to move
lots of data from point A to B. Don't forget to look for pointers in
the floating pointer registers. (B) On some machines (PowerPC, e.g.)
the instruction set was designed so that offsets in a particular range
are formed by first adding a big number, then subtracting a small
number, as in ((P+65536)-1000). That first intermediate expression
might lie outside the bounds of the object referenced by P. (Hans
Boehm found this one) This also complicates life for table-based
schemes (briefly alluded to in my dissertation, actually figured out
and put to proper use by the gang at U Mass), since it means that the
code generator must be clued in to the table generation games (or
else, it should use different, perhaps less efficient, idioms).


4. Undoing the programmer's good intentions. Suppose that you,
Mr./Ms. Programmer, notices that you have a reference in a
non-longer-useful variable, as you are about to make a truly recursive
call, as in:


    f(p) {
        q = <something allocating memory>
        ... q ...
        f( r );
        <no further occurrences of q>
        }


This could lead to unnecessarily retained storage. But, we know what
to do here:


    f(p) {
        q = <something allocating memory>
        ... q ...
        q := NIL; /* Stomp on last reference to that memory, so
                                  it will be collected. */
        f( r );
        <no further occurrences of q>
        }


Too bad for you that the dead code eliminator got rid of that inserted
assignment to q, isn't it? (I hate it when that happens.)


5. Something really vile involving different values assigned to the
same variable on converging flow paths (this was related to me by
Eliot Moss) -- this has a bad interaction with precise pointer tracking.


Preston Briggs and I had a wonderful idea over beer a few years ago
that is too small to fit into the margin here, and that neither of us
has had the time to properly follow up.


On the other hand, I think it is really wonderful to see that people
are finally talking seriously about GC in programming languages. My
gut feeling is that most of the perceived "problems" with GC in
various contexts can be solved through the addition of More Memory
(for instance, treadmill-style collection requires a few header words
that mark-and-sweep does not).


speaking for myself,


David Chase
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.