Re: What do I need to know to write a scheme compiler?

George Neuner <gneuner2@comcast.net>
Mon, 02 Feb 2015 00:31:39 -0500

          From comp.compilers

Related articles
What do I need to know to write a scheme compiler? orchid.hybrid@gmail.com (orchidaceae phalaenopsis) (2015-01-29)
Re: What do I need to know to write a scheme compiler? gneuner2@comcast.net (George Neuner) (2015-02-02)
Re: What do I need to know to write a scheme compiler? gneuner2@comcast.net (George Neuner) (2015-02-05)
| List of all articles for this month |

From: George Neuner <gneuner2@comcast.net>
Newsgroups: comp.compilers
Date: Mon, 02 Feb 2015 00:31:39 -0500
Organization: A noiseless patient Spider
References: 15-02-001
Keywords: Scheme
Posted-Date: 02 Feb 2015 15:38:50 EST

On Thu, 29 Jan 2015 17:24:39 +0000, orchidaceae phalaenopsis
<orchid.hybrid@gmail.com> wrote:


>Hello comp.compilers,
>
>My goal is to write a self hosting compiler for scheme to x86 64
>assembly. I will be happy to leave call/cc out of it since I think it
>complicates things. What I am focusing on is lambda and tail call
>elimination.
>
>I've tried to do it using the CPS transform but this produced code
>that was far too slow to self host. I think that it introduces too
>many lambdas all of which get heap allocated, I don't know how to do
>analysis on the code to find out how which closures can be stack
>allocated.


Most of Scheme is not terribly different from compiling Pascal. The
main differences are the need to create heap closures where necessary
and to maintain a mutable namespace environment at the top level so
that set! can rebind top-level definitions.




>Could anyone give me advice on what I need to learn about implementing
>closures in order to make a compiler that's efficient enough to self
>host rather than just compiling tiny test programs? Should I stick
>with CPS and try to do optimizations based on 0CFA (I have read about
>it and it seems powerful but it looks very very difficult).


CPS isn't really needed at all, and it doesn't always mix well with
some other useful conversions: e.g., SSA or value numbering.


Of course, SSA and value numbering make optimizing mutual recursions
expensive, so if you want to do that [very few Schemes bother] you do
have to start with CPS. SSA and value numbering have no problem with
tail calls or simple tail recursion.




WRT optimizing closures, you should investigate "lambda lifting" and
"closure conversion". Lambda lifting is a way of eliminating some
nested functions. Closure conversion is a way to eliminate heap
allocation for closures that can be stack allocated instead - it
(variously) turns them into stack objects and/or extra call chain
arguments.


Heap allocated closures are simple: just lay out the environment
exactly as you would a stack frame, but then allocate it on the heap
at the point of definition instead of at the point of use. Every
function in Scheme has a definition environment [the top-level
environment if nothing else] so every "function pointer" can be a pair
of pointers - code and environment - which are passed around together
[i.e. "fat" pointers].


call/cc requires that you copy the current call stack and preserve it
for reentry. Depending on how you implement the stack it can be
expensive and tricky. Many Schemes eschew the traditional C-like
array and instead use a linked list of heap allocated frames. The
linked list stack makes call/cc (relatively) cheap to implement, but
is less performant in the typical case where call/cc is not used and
it requires GC to reclaim stack space.


It is possible use an array stack and to save/restore and switch
stacks as necessary, but that is tricky to do in native code ... it's
a lot easier to do if you implement a virtual machine. The VM code
does not have to be interpreted - e.g., you can use threaded code -
but you need the flexibility to switch stacks at will.
[I find that dual stacks - separating control and data - are
especially convenient for implementing languages having nested
functions. However, dual stacks make interfacing to native libraries
less performant if you need to do that.]


If you forget about call/cc then you can use whichever type of stack
is most convenient. call/ec and throw/catch will work either way.
However, without call/cc you can't call your language a Scheme but
rather only Scheme-like.


Hope this helps,
George


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.