Re: Best Ref-counting algorithms?

Christoffer Lernö <lerno@dragonascendant.com>
Sat, 18 Jul 2009 15:06:26 -0700 (PDT)

          From comp.compilers


From: Christoffer Lernö <lerno@dragonascendant.com>
Newsgroups: comp.compilers
Date: Sat, 18 Jul 2009 15:06:26 -0700 (PDT)
Organization: Compilers Central
References: 09-07-018 09-07-039 09-07-043 09-07-054 09-07-060 09-07-066
Keywords: GC
Posted-Date: 19 Jul 2009 16:42:34 EDT

On Jul 18, 4:33 am, George Neuner <gneun...@comcast.net> wrote:
> On Fri, 17 Jul 2009 11:14:27 -0700 (PDT), Christoffer Lernö
> <le...@dragonascendant.com> wrote:
>> On Jul 16, 6:35 pm, George Neuner <gneun...@comcast.net> wrote:
>>> You can only safely restrict scanning if you can segregate objects
>>> that are shared (or could be). Most programming languages don't allow
>>> for that distinction up front, so the compiler or runtime environment
>>> has to handle it behind the scenes.
>
>> You are talking about restricting scanning for a tracing GC?
>
> No, I'm talking about segregating shared objects in a ref-counter GC
> so that only the shared object heap needs to be scanned for breaking
> cycles.


So the idea is to let reference counting handle the non-shared objects,
while the shared set (which should be significantly smaller) is handled
by a tracing GC?
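
If I understand you correctly, the compiler would then emit a write
barrier on every store of a pointer into a heap object, roughly like
this (just a sketch; all the names are made up by me):

/* Object header: a plain count while unshared, plus a "shared" flag
   that is set once another heap object points at it. */
typedef struct obj {
    unsigned rc;
    int      shared;
    /* ... payload ... */
} obj_t;

void promote_to_shared_heap(obj_t *o);   /* assumed runtime service */
void inc_ref(obj_t *o);
void dec_ref(obj_t *o);

/* Storing a pointer into a heap slot may make the target shared, so
   move it to the heap that the tracing collector is responsible for. */
static void write_field(obj_t **slot, obj_t *value)
{
    if (value && !value->shared) {
        value->shared = 1;
        promote_to_shared_heap(value);
    }
    if (*slot) dec_ref(*slot);
    if (value) inc_ref(value);
    *slot = value;
}

Locals and temporaries would then stick to plain counting, and only the
shared heap would ever be traced for cycles. Is that roughly the scheme?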


>>> If you're after very high performance (a hobby of mine due to a stint
>>> in real time programming), you should look into Baker's treadmill
>>> collector. Treadmills are expensive space-wise, but they have fixed
>>> processing costs and worst case behavior that is completely
>>> predictable (a critical requirement for RT).
>
>> I saw that Treadmills was supposedly covered in Lins book mentioned
>> earlier (which I ordered), I'll do some googling to find out more too.
>> But if you have any direct links to papers I'd be happy if you would
>> share them.
>
> Baker's classic paper is at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.23.5878


Since the treadmill is a tracing GC, won't a pure version of it take
longer to complete a round the larger the heap is?
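
To make my mental model concrete, this is roughly how I picture one
increment of the treadmill (the cell layout and names are mine, and
I've ignored the allocation/free pointers):

enum { WHITE, GRAY, BLACK };      /* WHITE = condemned, BLACK = scanned */

typedef struct cell {
    struct cell *prev, *next;     /* every cell sits on one cyclic list */
    int          colour;
    struct cell *child[2];        /* pretend each object has two fields */
} cell_t;

static cell_t *scan;              /* next gray cell to process */

/* Unlink c and relink it right after the scan pointer, i.e. at the
   front of the gray segment, so it gets processed this cycle. */
static void move_to_gray(cell_t *c)
{
    c->prev->next = c->next;
    c->next->prev = c->prev;
    c->next = scan->next;
    c->prev = scan;
    scan->next->prev = c;
    scan->next = c;
}

/* One bounded increment: blacken one gray cell and gray its white
   children.  Each call is O(number of fields). */
static void treadmill_step(void)
{
    if (scan == NULL || scan->colour != GRAY)
        return;                   /* no gray cells left: cycle finished */
    for (int i = 0; i < 2; i++) {
        cell_t *ch = scan->child[i];
        if (ch && ch->colour == WHITE) {
            ch->colour = GRAY;
            move_to_gray(ch);
        }
    }
    scan->colour = BLACK;
    scan = scan->next;            /* advance around the treadmill */
}

If that picture is right, the per-step cost is fixed, but a full
revolution still visits every live cell, so the length of a cycle grows
with the live heap.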


I admit I still have a rather vague understanding of the various
tracing GCs, but don't vanilla tracing GCs necessarily take longer per
collection cycle the larger the heap is?


Especially if we have a large object tree that gets scanned repeatedly
but never needs to be collected.


My understanding is that the idea behind generational GCs is to move
such structures into older generations so they are scanned much less
often than the majority of objects, which are very short-lived.
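
As I understand it, the trick that keeps minor collections away from
the old objects is a write barrier feeding a remembered set, roughly
like this (all names made up):

#include <stddef.h>

typedef struct gcobj gcobj_t;

int in_old_gen(const gcobj_t *o);    /* assumed heap-layout queries */
int in_nursery(const gcobj_t *o);

#define REMEMBERED_MAX 4096
static gcobj_t *remembered[REMEMBERED_MAX]; /* old objs holding young ptrs */
static size_t   remembered_count;

/* Only stores of young pointers into old objects are recorded; a minor
   collection then scans the nursery roots plus this set, never the
   whole old generation. */
static void write_barrier(gcobj_t *holder, gcobj_t **slot, gcobj_t *value)
{
    if (value && in_old_gen(holder) && in_nursery(value)
        && remembered_count < REMEMBERED_MAX)
        remembered[remembered_count++] = holder;
    *slot = value;
}

So a big, stable tree only costs something when somebody actually
stores new young pointers into it.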


On the other hand, perhaps your suggestion is exactly that: use one-bit
refcounts to separate the objects, and then slowly run a treadmill GC
over the shared ones.


Still, the fact that large, unchanging objects need to be scanned is a
bit annoying, as the cost grows with the size of the heap.


The cycle collector in Bacon's paper has the same problem as long as
references to such objects keep being added and removed:


A typical case where this becomes an issue is when you have a Server
object that holds references to most other parts of the system. This
Server object creates Connection objects that are added when new
connections are made and removed when they disconnect.


If the Connection objects keep a reference back to the Server object,
we'd need to scan the whole Server object and everything it references
- which might be almost the entire heap! - every time we want to break
the circular reference to a Connection.
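
To be concrete, the part of Bacon's algorithm that worries me is the
trial-deletion phase (MarkGray), which as I read the paper does roughly
this:

enum colour { BLACK, GRAY, WHITE };

typedef struct node {
    int           rc;
    enum colour   colour;
    int           nchildren;
    struct node **children;
} node_t;

/* Trial deletion: pretend the internal references are gone and see
   what drops to zero.  The recursion visits everything reachable from
   the candidate root, so if that root sits on the Connection<->Server
   cycle it can end up walking nearly the whole object graph. */
static void mark_gray(node_t *s)
{
    if (s->colour == GRAY)
        return;
    s->colour = GRAY;
    for (int i = 0; i < s->nchildren; i++) {
        node_t *t = s->children[i];
        if (t) {
            t->rc--;          /* trial removal of the internal reference */
            mark_gray(t);
        }
    }
}

And if I've read it right, the later scan and collect phases walk the
same subgraph again.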


>>> Every GC design has weak spots and most have worst case performance
>>> that is extremely bad (treadmills are an exception to bad worst case
>>> performance, but their weak spot is fixed size allocation blocks).
>
>>> The trick is to anticipate the use - which is all but impossible if
>>> the GC is intended to support general programming - and design for
>>> either the best average case or the worst possible case. Most GC
>>> systems today are hybrids designed for the best average case.
>
>> My problem with tracing GCs is that once the heap grows in excess of a
>> few hundred MB, most exhibit noticeable pauses. I want a GC that is
>> suitable for game programming, and I'd much rather have a steady lower
>> throughput that is predictable, than one that occasionally pauses but
>> with better average performance.
>
> That's not a failing of tracing GC, it's a failing of the particular
> implementation. There are lazy tracers that have near zero impact on
> the mutator - the downside is that they need a (helluva) lot of extra
> memory to make sure the mutator doesn't starve.


But that's also a problem. Obviously any GC would need to be able to
keep up with the garbage it needs to collect.


> Generational GC solves a lot of the pause complaints because most
> collection effort is concentrated on the nursery area which is
> typically 1 to a few megabytes or on the first generation which is
> typically 10 to 20 megabytes. Older generations are collected less
> and less frequently.


Isn't there a risk that, unless you accept a pause when the older
generation really IS collected, you'd need the older-generation
collector to avoid stopping the world when it runs?
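
If so, I suppose the old-generation collector has to mark incrementally
or concurrently, which (as far as I understand the tri-colour schemes)
means yet another write barrier, roughly:

#include <stdbool.h>

typedef struct gcobj gcobj_t;

extern bool old_gen_marking;    /* set while the old-gen mark phase runs */
void shade_gray(gcobj_t *o);    /* assumed: queue o for marking if white */

/* Dijkstra-style insertion barrier: anything the mutator stores while
   the collector is marking gets shaded, so the concurrent marker never
   loses an object and the mark work can be spread over many slices. */
static void store_ptr(gcobj_t **slot, gcobj_t *value)
{
    if (old_gen_marking && value)
        shade_gray(value);
    *slot = value;
}

That spreads the old-generation pause out, but it's one more tax on
every pointer store.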


I mean, the problem with the IDE with the long pauses was basically
that it seemed to put off GC for 30 minutes or so, and then compensated
by collecting for a much longer period when it finally invoked the GC.


Although perhaps not that extreme, that's what happens in several of
the Java GCs as well. And where things REALLY break down is when a GC
decides to do a full trace after part of the heap has been paged out to
disk. I've seen processes freeze for over 30 minutes trying to do a
full GC while partially swapped out.


>> For a game, you'd want 30 or 60 FPS, meaning that the total time
>> between frames is ~33 ms or ~17 ms respectively.
>> It's far preferable that the GC steals 5 ms every frame
>> than 100 ms every 600 frames.
>> (And that would be a rather well-behaved GC - I've experienced Java
>> applications freezing for tens of seconds on my new MacBook Pro when
>> collecting around 100 MB out of 250 MB)
>
> Not that this will solve your problem, but the server version of the
> JVM has much better GC behavior.


I've noticed a bit of improvement, but I still see the occasional
hiccup of many hundreds of milliseconds.


>> So anything that has a constant cost is much preferable to one that
>> causes you to pay chunks of time for GC.
>
> Reference counting in general does _NOT_ have constant costs ... RC's
> costs depend greatly on whether linked structures are eagerly or
> lazily freed.


You're right, of course. But being able to know approximately when you
might be releasing a large number of objects is very attractive when
you are working on time-critical code.
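
For instance, I imagine the lazy variant just queues objects whose
count hits zero and reclaims a bounded number per frame, something like
this (names made up):

#include <stdlib.h>

typedef struct rcobj {
    unsigned      rc;
    struct rcobj *next_free;    /* link while waiting to be reclaimed */
    int           nchildren;
    struct rcobj *children[4];
} rcobj_t;

static rcobj_t *to_free;        /* objects whose count reached zero */

static void dec_ref(rcobj_t *o)
{
    if (o && --o->rc == 0) {
        o->next_free = to_free; /* defer: no cascading free right now */
        to_free = o;
    }
}

/* Called once per frame with a budget; cascading decrements are
   themselves deferred, so each call does a bounded amount of work. */
static void reclaim_some(int budget)
{
    while (budget-- > 0 && to_free) {
        rcobj_t *o = to_free;
        to_free = o->next_free;
        for (int i = 0; i < o->nchildren; i++)
            dec_ref(o->children[i]);
        free(o);
    }
}

The cost of dropping the last reference to a huge structure is then
spread over many frames instead of being paid all at once.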


Let's say you have code like this:


while(1)
{
      updateLogic();
      updateBuffer();
      sleep(timeToFrame());
      switchBuffer();
}


Then you want to be pretty damn sure that the GC won't kick in after
you've slept 5 ms out of 6 and then run its 10 ms collection.


Just braindumping here, but wouldn't it be worthwhile to be able to
control garbage collection? In Java you can sort of force a GC, but
that's pretty much it.


It would be nice to be able to say "start automatic GC scheduling",
"hold off on GC for now", or "run GC for 10 ms starting now".


Any ideas why such controls are absent from Java and several other
garbage-collected languages?


/C

