Compiling Lexical Closures to C

grom358@gmail.com
Tue, 25 Sep 2012 16:57:01 -0700 (PDT)

          From comp.compilers

Related articles
Compiling Lexical Closures to C grom358@gmail.com (2012-09-25)
Re: Compiling Lexical Closures to C cr88192@hotmail.com (BGB) (2012-09-27)
| List of all articles for this month |
From: grom358@gmail.com
Newsgroups: comp.compilers
Date: Tue, 25 Sep 2012 16:57:01 -0700 (PDT)
Organization: Compilers Central
Keywords: code, question
Posted-Date: 26 Sep 2012 11:19:38 EDT

Working on a basic compiler for a language that supports nested functions and
full closures.


I have read a few articles like
http://matt.might.net/articles/compiling-scheme-to-c/ where they using a
structure for the closure,


struct closure {
    Lambda lam;
    void *env;
};


So say I have the following:
var createSpinner = function(initVal) {
    var x = initVal;
    return [
        function() { return ++x; },
        function() { return --x; }
    ];
};


The return functions can share the environment so they both modifying the same
x variable. So can hoist the x variable into a structure on the heap and pass
pointer to it to both the increment and decrement functions returned in the
array.


// Example use in language
var spinner = createSpinner(10);
spinner[0](); // 11
spinner[0](); // 12
spinner[1](); // 11


// Pseudo compiled C
Array spinner = createSpinner(10);
spinner[0]->lam(spinner[0]->env);
spinner[0]->lam(spinner[0]->env);
spinner[1]->lam(spinner[1]->env);


Now looking at more complex example:
var createLines = function() {
    var m = 2;
    var c = 3;
    return {
        'line': function(x) { return m * x + c; },
        'lineSquare': function(x) { return x * x + c; }
    };
};


Now how to pass the free variables? I could put all free variables into an
environment. Which means memory can be wasted, because as long as a closure is
still referenced then all the free variables by the top level function are
kept in memory.


Additionally how to deal with deeper nested functions, for example:
var run = function() {
    var message = '';
    var setMessage = function(msg) {
        message = msg;
    };
    var sayHello = function() {
        var getName = function() {
                println(message ~ ', what is your name?');
                return readLine();
        };
        var name = getName();
        println(message ~ ' ' ~ name);
    };
    setMessage('hello');
    sayHello();
};


So the getName function needs reference to the message variable declared in
the run function. One solution I was thinking would be to have chained
environment for each level of nesting. Eg. getName would reference the message
variable via env->parentEnv->message. That is for each level of nesting need
to have another pointer dereference to access the free variable.


Alternatively the other solution I was thinking was free variables are
indirectly related to the environment, and each closure has its own
environment. Free variables are shared between environments by them having
reference to the same free variable. Eg.


struct env_getName {
    char *message[];
};


So each free variable is allocated separately on the heap and each free
variable is reference counted (or garbage collected) separately. Therefore
disadvantage here is have more allocations of memory. One per free variable
then two for each closure (one for the closure, and one for its environment;
though could collapse the environment into the closure structure).


Could also use hybrid approach. The compiler using different solutions based
on how much sharing there is of free variables. But this means the compiler is
getting more complex. For example, in the simple case of:


var makeCounter = function(x) {
    return function() { return ++x; };
};
var counter = makeCounter(10);
counter(); // returns 11


Could compile into:
int func_a1(int *x) {
    return ++(*x);
}


struct closure_makeCounter {
    int (*func_a1)();
    int x;
};


struct closure_makeCounter counter = /* allocate closure */;
counter.func_a1(&counter.x);


Then only have the one allocation in this simple case, and only have to
maintain the reference counts on the closure.


What other techniques/methods are there for compiling lexical closures
into C?



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.