Re: GC question

Ulf Wiger <etxuwig@cbe.ericsson.se>
Fri, 27 Jul 2007 16:30:53 +0200

From comp.compilers

Related articles
GC question phr-2007@nightsong.com (Paul Rubin) (2007-07-27)
*Re: GC question etxuwig@cbe.ericsson.se (Ulf Wiger)* (2007-07-27)**
Re: GC question gene.ressler@gmail.com (Gene) (2007-07-28)
Re: GC question bear@sonic.net (Ray Dillinger) (2007-07-28)
Re: GC question @iecc.com <phr-2007@nightsong.com (Paul@iecc.com, Rubin) (2007-07-30)
Re: GC question torbenm@app-2.diku.dk (2007-08-13)

| List of all articles for this month |

From:	Ulf Wiger <etxuwig@cbe.ericsson.se>
Newsgroups:	comp.compilers,comp.lang.functional
Date:	Fri, 27 Jul 2007 16:30:53 +0200
Organization:	Ericsson AB
References:	07-07-098
Keywords:	GC
Posted-Date:	27 Jul 2007 11:27:21 EDT

>>>>> "PR" == Paul Rubin <phr-2007@nightsong.com> writes:

    PR> Suppose you build a big list of cons cells, say a billion of
    PR> them (you're on a large machine). This is in a runtime with
    PR> traditional marking or copying gc, no generation scavenging or
    PR> region inference or anything like that. The collector runs
    PR> every N bytes of allocation for some fixed N. Yes I know that's
    PR> a dumb way to write an allocator for a big-system implementation
    PR> but it's fairly common for small ones.

    PR> It seems to me that the running time to allocate N cells is
    PR> O(N**2) because you run the collector O(N) times during the
    PR> allocation, and each collection costs O(N) on average.

    PR> I never realized this before. Is it a well-known phenemonon?

Where I come from (industrial use of Erlang), we have design rules
warning about this very thing.

    PR> Is the main answer something like generation scavenging?

<disclaimer>I have only superficial knowledge of GC techniques</..>

Not necessarily. It can actually make things worse.
If you, say, enter a loop that constructs a list one element
at a time, the garbage collector will have no way to tell
when you're going to stop. It will run out of heap, scan,
tenure data, etc., but as the heap just keeps growing, it's
not going to be able to reclaim much, if anything.

There may be different ways of going generational GC, but at
least in Erlang, the collector will keep an old heap and a new
heap, and revert to fullsweep GC if the generational scan fails
to reclaim enough data. If the fullsweep fails, it will resize
the heap. With constantly growing heap, this is likely to occur
at every GC stop. Extremely annoying.

A workaround specific to Erlang is that you can spawn a process
with a predefined minimum heap size - preferrably large enough
that GC will never kick in. It can build the data, then send it
in a big whopping message to the process that needs it. It
will be costly to copy the message, but not nearly as much as
paying for all those GC stops (which also copy).

Most of the time, you try to come up with an implementation that
doesn't build huge data structures. Again in Erlang, this is
usually achieved by exploiting the concurrency patterns in the
problem and dispatching data in small chunks to other processes.

Some seasoned GC expert will perhaps give you a good solution
to the problem. I can only tell you that you're not alone. (:

BR,
Ulf W
--
Ulf Wiger, Senior Specialist,
      / / / Architecture & Design of Carrier-Class Software
    / / / Team Leader, Software Characteristics
  / / / Ericsson AB, IMS Gateways

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: GC question

Ulf Wiger <etxuwig@cbe.ericsson.se>Fri, 27 Jul 2007 16:30:53 +0200

Ulf Wiger <etxuwig@cbe.ericsson.se>
Fri, 27 Jul 2007 16:30:53 +0200