Re: gawk memory leak (Chris Clark USG)
7 Apr 1997 15:04:11 -0400

          From comp.compilers


From: (Chris Clark USG)
Newsgroups: comp.compilers
Date: 7 Apr 1997 15:04:11 -0400
Organization: Digital Equipment Corporation - Marlboro, MA
References: 97-03-165 97-04-020 97-04-022 97-04-040
Keywords: GC, performance, practice

(Mike Albaugh) wrote:

> . . . Unless there is something major I'm missing, I'm afraid I'll
> continue to view garbage collection in the "Not _much_ Spam in it"
> category :-)

I think I understand this sentiment, as I write and maintain quite a
few programs that I wouldn't normally have thought had much use for a
garbage collector--and many of the ones I maintain are written in
low-level languages like C and Pascal. Moreover, most of them have
very simple memory logic: most variables are simple (automatic or
static), except for one big data structure which gets built, and then,
once built, is used and discarded in its entirety. None of them have
any memory leaks; the model is just too simple.
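That "build it, use it, discard it whole" model is essentially arena
allocation. A minimal sketch in C (all the names here are my own
invention, purely illustrative):

```c
#include <stdlib.h>
#include <assert.h>

/* A minimal arena: every allocation comes from one big region, and
 * the whole structure is discarded with a single free().  No
 * per-object lifetime reasoning is ever needed. */
typedef struct {
    char  *base;
    size_t used, cap;
} Arena;

static Arena arena_new(size_t cap) {
    Arena a = { malloc(cap), 0, cap };
    return a;
}

static void *arena_alloc(Arena *a, size_t n) {
    assert(a->used + n <= a->cap);   /* fixed-size sketch: no growth */
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

static void arena_discard(Arena *a) {  /* drop everything at once */
    free(a->base);
    a->base = NULL;
    a->used = a->cap = 0;
}
```

With this discipline a leak is impossible by construction: nothing
outlives the arena.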

However, two things that I have done in the last couple of years echo
more of the sentiment expressed next.

(Max Hailperin) wrote:

> The best programmers of manual-deallocation systems typically go to
> great lengths to avoid having to do fully-general global reasoning
> about the question "am I the last". For example, they may copy
> structures that there is no logical reason to copy, so long as there
> is no necessity of sharing. For structures that need to be shared,
> they may designate an explicit "owner" who all the other users
> "check in" and "check out" with, effectively manually implementing
> reference counting. Etc. For simple situations it isn't too bad,
> but for hairy stuff this effort can easily overwhelm the effort
> actually devoted to the program's intended purpose.
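The "check in / check out with an explicit owner" discipline described
in that quote amounts to hand-rolled reference counting. A minimal
sketch in C, with invented names:

```c
#include <stdlib.h>

/* Manual "check in / check out": users take and return references
 * through an explicit count, so "am I the last?" is answered locally
 * instead of by global reasoning. */
typedef struct {
    int refs;
    int payload;
} Shared;

static Shared *shared_new(int v) {
    Shared *s = malloc(sizeof *s);
    s->refs = 1;                 /* the creating owner holds one ref */
    s->payload = v;
    return s;
}

static Shared *check_out(Shared *s) {  /* another user takes a ref */
    s->refs++;
    return s;
}

static void check_in(Shared *s) {      /* a user returns a ref */
    if (--s->refs == 0)
        free(s);                       /* "I was the last" */
}
```

The bookkeeping is trivial here; the pain Hailperin describes comes
when the check-out and check-in sites are scattered across a large,
hairy program.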

In my two examples, it wasn't that we went to great lengths to avoid
sharing; it was simply that the programs made no effort to share the
data in the first place.

The first example was the Chow optimizer. Like many bit-vector based
optimizers, it generates lots of bit strings (n per basic block, where
n is in the 5-20 range) representing its various dataflow equations.
Eventually, the C++ front end, through massive inlining, managed to
produce a program to optimize with thousands of basic blocks and
thousands of variables. The resulting bit vectors took several
gigabytes, and the optimizer would thrash for hours before overflowing
the disk set aside for swapping.

This fit my typical paradigm for a non-garbage-collected program: the
bit vectors were generated once, and freed as soon as they weren't
needed. However, there was no attempt to share the bit vectors
between different basic blocks (and they were all mutable, often at
erratic points), nor between different bit meanings in the same block.
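The unshared representation might look like this sketch (the names and
layout are my invention, not the Chow optimizer's actual code):

```c
#include <stdlib.h>

/* One bit per variable, one vector per dataflow set, several sets per
 * basic block.  With no sharing, total storage is
 * nblocks * NSETS * ceil(nvars/8) bytes -- even though most vectors
 * start out, and many stay, all zero. */
enum { NSETS = 4 };   /* e.g. GEN, KILL, IN, OUT */

typedef struct {
    unsigned char *sets[NSETS];   /* each block owns private copies */
} Block;

static Block *blocks_new(size_t nblocks, size_t nvars) {
    size_t bytes = (nvars + 7) / 8;
    Block *b = malloc(nblocks * sizeof *b);
    for (size_t i = 0; i < nblocks; i++)
        for (int s = 0; s < NSETS; s++)
            b[i].sets[s] = calloc(bytes, 1);  /* no sharing anywhere */
    return b;
}
```

Every vector being private is what makes mutation safe without any
"am I the last?" reasoning--and also what makes the storage blow up.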

After doing some measurements, we decided that sharing the bit vectors
was the only hope. An experiment with sharing only the trivial (all
zero) bit vectors showed the approach was viable, by allowing the
optimizer to get through a few more equations. Of course, sharing the
bit vectors (especially as they were still being mutated) required
implementing a garbage collecting strategy. Thus, the program which
had no garbage evolved into a program which generated garbage (and
collected it) and yet was more efficient.
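Sharing even just the all-zero vector, while the vectors stay mutable,
forces a copy-on-write-plus-reference-count discipline, roughly like
this sketch (again purely illustrative, not the optimizer's code):

```c
#include <stdlib.h>
#include <string.h>

/* Every set starts as a pointer to one canonical all-zero vector and
 * is copied the first time it is mutated (copy-on-write).  Once
 * vectors are shared, reference counts are needed to know when one
 * becomes garbage. */
typedef struct {
    int refs;
    size_t bytes;
    unsigned char bits[];         /* C99 flexible array member */
} BitVec;

static BitVec *zero_vec;          /* the one shared zero vector */

static BitVec *vec_zero(size_t bytes) {
    if (!zero_vec) {
        zero_vec = calloc(1, sizeof *zero_vec + bytes);
        zero_vec->bytes = bytes;
        zero_vec->refs = 1;       /* permanent reference */
    }
    zero_vec->refs++;
    return zero_vec;
}

/* Before mutating, make the vector private if it is shared. */
static BitVec *vec_make_private(BitVec *v) {
    if (v->refs == 1)
        return v;
    BitVec *copy = malloc(sizeof *copy + v->bytes);
    copy->refs = 1;
    copy->bytes = v->bytes;
    memcpy(copy->bits, v->bits, v->bytes);
    v->refs--;                    /* release our share of the original */
    return copy;
}

static void vec_set(BitVec **vp, size_t bit) {
    *vp = vec_make_private(*vp);
    (*vp)->bits[bit / 8] |= 1u << (bit % 8);
}
```

The key point is the evolution the article describes: the moment
vectors are shared, the program acquires garbage (released copies) and
must collect it.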

The second example was in "Third Degree" (a Purify-like tool for DEC
systems). Third Degree also does some dataflow analysis to help it
determine which loads and stores cannot reference invalid and/or
uninitialized memory, so that it can avoid instrumenting them. In
this case, the analysis takes the form of symbolic evaluation (perhaps
that is the wrong word--what I mean is that it creates symbolic
values, i.e. unique integers, and propagates them through the data
flows, rather than doing bit manipulations).

Anyway, the original symbolic values were designed to fit in a single
long, with a couple of the bits reserved for special-case flags.
However, to handle the case where only some bits of a register were
uninitialized and some were known, we needed to add information to the
symbolic values. They no longer fit in a long, so we put them in
heap-allocated records, and at that point the memory leaks appeared.
The original code could cavalierly create temporary symbols whenever
it wanted, because, being longs, the next setting of their value would
destroy the previous temporary. Not so with heap-allocated records.
Again we wrote a garbage collector, but this time to tell us where the
leaks were, at which point we patched them at the source. The reason
we had to write a new collector to check this was that objects were
allocated and deallocated in batches, so the memory didn't leak
outright--but of course that hid a subtler kind of leak inside.
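The packed-long representation that made temporaries free to overwrite
can be sketched like this (the field layout is invented for
illustration; Third Degree's actual encoding was surely different):

```c
/* A symbolic value packed into one long: a unique id in the high
 * bits, special-case flags in the low bits.  Overwriting the variable
 * destroys the old value for free -- no leak is possible, which is
 * exactly the property that was lost on moving to heap records. */
#define FLAG_BITS 2
#define FLAG_MASK ((1L << FLAG_BITS) - 1)

static long sym_make(long id, long flags) {
    return (id << FLAG_BITS) | (flags & FLAG_MASK);
}
static long sym_id(long s)    { return s >> FLAG_BITS; }
static long sym_flags(long s) { return s & FLAG_MASK; }
```

Once such a value becomes a pointer to a heap record, every casual
overwrite of a temporary silently orphans the old record instead.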

So, in the end, I am indebted to garbage collectors. They definitely
made a couple of difficult tasks possible. However, I still like to
be picky about their application. In particular, I like being able to
say, hmmm, this looks amenable to garbage collection, apply yourself
to it. I also like being able to get feedback from my garbage
collector, so that I can fix my other bugs in my algorithms.
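A collector used for feedback rather than reclamation--tracking
allocations so it can report what was never freed--might look roughly
like this (illustrative only, not Third Degree's code):

```c
#include <stdio.h>
#include <stdlib.h>

/* A leak-reporting allocation wrapper: record every live allocation
 * with the source line that made it, and report whatever remains.
 * Bounded table for sketch purposes. */
#define MAX_TRACKED 1024

static struct { void *p; int line; } live[MAX_TRACKED];
static int nlive;

static void *t_alloc(size_t n, int line) {
    void *p = malloc(n);
    live[nlive].p = p;           /* sketch: assumes nlive < MAX_TRACKED */
    live[nlive].line = line;
    nlive++;
    return p;
}

static void t_free(void *p) {
    for (int i = 0; i < nlive; i++)
        if (live[i].p == p) {
            live[i] = live[--nlive];   /* swap-remove */
            free(p);
            return;
        }
}

static int t_report(void) {      /* returns the number of leaks */
    for (int i = 0; i < nlive; i++)
        fprintf(stderr, "leak: block allocated at line %d\n",
                live[i].line);
    return nlive;
}
```

This is the "feedback" mode of a collector: instead of reclaiming the
garbage, it points at the source line that created it so the leak can
be patched by hand.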

-Chris Clark

Hmmm--a whole article without any parsing in it. I even left out the
analogy of LALR state merging with sharing to keep it that way. Oops, I
guess I just spoiled it.
