Re: Compiler writers will love this language

Joachim Durchholz <joachim.durchholz@web.de>
2 Jul 2003 00:36:52 -0400

          From comp.compilers

Related articles
[12 earlier articles]
Re: Compiler writers will love this language torbenm@diku.dk (2003-06-20)
Re: Compiler writers will love this language d00-mla@nada.kth.se (Mikael 'Zayenz' Lagerkvist) (2003-06-22)
Re: Compiler writers will love this language mwotton@cse.unsw.edu.au (2003-06-22)
Re: Compiler writers will love this language dot@dotat.at (Tony Finch) (2003-06-25)
Re: Compiler writers will love this language lex@cc.gatech.edu (Lex Spoon) (2003-06-25)
Re: Compiler writers will love this language firefly@diku.dk (Peter \Firefly\Lund) (2003-06-25)
Re: Compiler writers will love this language joachim.durchholz@web.de (Joachim Durchholz) (2003-07-02)
Re: Compiler writers will love this language strohm@airmail.net (John R. Strohm) (2003-07-02)
Re: Compiler writers will love this language genew@mail.ocis.net (2003-07-02)
Re: Compiler writers will love this language ericmuttta@email.com (2003-07-02)
Re: Compiler writers will love this language mwotton@cse.unsw.edu.au (2003-07-03)
Re: Compiler writers will love this language nmm1@cus.cam.ac.uk (2003-07-04)
Re: Compiler writers will love this language nmm1@cus.cam.ac.uk (2003-07-04)
[7 later articles]
| List of all articles for this month |

From: Joachim Durchholz <joachim.durchholz@web.de>
Newsgroups: comp.compilers
Date: 2 Jul 2003 00:36:52 -0400
Organization: Compilers Central
References: 03-05-211 03-06-015 03-06-054 03-06-057 03-06-078
Keywords: storage, design
Posted-Date: 02 Jul 2003 00:36:52 EDT

Eric wrote:
> mwotton@cse.unsw.edu.au (Mark Alexander Wotton) wrote
>
>>On 5 Jun 2003 23:22:01 -0400, Eric posted:
>>
>>Either that, or only released when all references to it have passed
>>out of scope. This is how many modern garbage-collected language
>>implementations work.
>
> This scheme (reference-counting) is used in VB6 for instance.


No, garbage-collected languages don't use reference counting.


  > VB7 (and
> every other .NET language?)


AFAIK, .net uses automatic garbage collection.


  > uses generational, mark-and-sweep GC (hope
> that's the right term for it).


Correct but somewhat overspecific: there are non-generational
mark-and-sweep collectors.


  > When I read its description, it sounded
> like a rather elaborate scheme and one that would need a good deal of
> implementation effort.


That's true. Writing a good garbage collector takes several
person-months or person-years.


  > I prefer plain vanilla automatic
> reference-counting. Its deterministic (hence useful in real-time
> systems),


The counterargument runs as follows:
Garbage collections becomes important exactly at the point when it's
becoming difficult to statically determine which parts of some data
structure become inaccessible.
In other words, there will be some code that manipulates pointers, where
static analysis won't reveal whether the block of memory pointed to
before the change just lost its last reference.
The problem is that the time needed to remove the link to that block
becomes statically unpredictable: it will either be the relatively cheap
decrement of a reference count, or it will be the much more
time-consuming process of adding the block to the list of free blocks,
potentially merging it with neighbouring free blocks, and updating lots
of background data structures that do heap management. Even worse:
freeing the block means decrementing the reference counts of all blocks
that the block-being-released referred to, creating an avalanche of
block releases that may affect all blocks where static analysis doesn't
determine that they stay accessible. That's not even viable for soft
real-time, let alone hard real-time!


There are techniques around this. For example, one can delay the actual
freeing of a block whenever there is time. This trades real-time latency
for some bookkeeping and noticeably more memory consumption (since
inaccessible data is not being freed at the earliest opportunity).


The next interesting point on the spectrum is already incremental
garbage collection. It's just like stop-the-world mark-and-sweep
collection, it just interleaves the execution of the garbage collection
and normal program execution. Since garbage collection is a separate
process, it can be deferred to "quiet times" (i.e. when not servicing
time-critical requests) just like delayed-freeing reference counting.


Reference counting is optimal for predicting memory usage (since no data
will be kept beyond necessity), but bad for predicting latency times.
Garbage collection is not very good for predicting memory usage, but
excellent for predicting latency times: assigning to a pointer is just
the usual write-to-memory, and garbage collection is just a background
process.


  > simple to explain and understand, and a lot easier to
> implement and get right.


That's very correct. If you don't have a working garbage collector and
don't have the budget for writing one, use reference counting.
However, most of the time you have the Boehm-Demers-Weiser collector.
It's written in C and will run on practically any hardware that can run C.


> I admit, reference-counting has its problems


Yes, there are two serious problems here:
1) Reference counting cannot handle cycles. All solutions that I know
reintroduce mark-and-sweep at some scale, and I doubt it's even possible
to avoid it.
2) Reference counting is horribly expensive. Every single assignment to
a pointer incurs two memory writes: one to decrement the reference count
of the block that it previously referred to, and one to increment the
reference count it's going to refer to after the change.
If the hardware architecture has an on-CPU data cache, things are even
worse: you dirty two memory locations that are most likely not in the
cache line. Pointer writes require /triple/ the required memory
bandwidth compared to simply assigning to the reference in memory! In an
SMP environment, this also increases the chance that two CPUs must
engage in (time-consuming) cache coherency protocols when they wish to
modify data in the same cache line.
Finally, CPUs are favoring read accesses over write accesses. Adding a
write access is always more expensive than adding a read access.


There are ways to reduce the efficiency impact of reference counting,
but the net result of optimizing reference counting will give something
just as complicated as a good incremental garbage collector...


  > and I have been toying
> around with an idea for what I call "compile-time reference-counting".
> Basically, I am looking for a set of rules, that will allow the
> compiler to figure out when to release memory for an object, *without*
> having to allocate memory space for a reference counter value.


There is research in that area available. Keywords to look for are
"liveness analysis", and/or "regional (garbage collection|liveness
analysis)". You'll also find a lot of people who are much more
knowledgeable than myself in news:comp.lang.functional.


Hope this helps,
Jo


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.