|Incrementally implementing a simple ML compiler using LLVM firstname.lastname@example.org (Jon Harrop) (2007-11-26)|
|Re: Incrementally implementing a simple ML compiler using LLVM email@example.com (2007-11-27)|
|Re: Incrementally implementing a simple ML compiler using LLVM firstname.lastname@example.org (George Neuner) (2007-11-28)|
|Re: Incrementally implementing a simple ML compiler using LLVM email@example.com (=?ISO-8859-1?Q?Pertti_Kellom=E4ki?=) (2007-11-29)|
|Re: Incrementally implementing a simple ML compiler using LLVM firstname.lastname@example.org (Joachim Durchholz) (2007-11-29)|
|Re: Incrementally implementing a simple ML compiler using LLVM email@example.com (Jon Harrop) (2007-11-30)|
|From:||firstname.lastname@example.org (Torben =?iso-8859-1?Q?=C6gidius?= Mogensen)|
|Date:||Tue, 27 Nov 2007 10:43:16 +0100|
|Organization:||Department of Computer Science, University of Copenhagen|
Jon Harrop <email@example.com> writes:
> I recently tried the Low-Level Virtual Machine (LLVM) project and found that
> I can use its OCaml bindings to write native-code compilers with ease.
> I would like to use this technology to create a simple compiler for a
> language similar to OCaml but with no baggage (e.g. I have no desire
> to support 63-bit integers!).
63-bit (or 31-bit) integers are common in languages with GC, as the
last bit is used to distinguish integers from pointers. Full-size
integers carry an overhead, as they usually need to be boxed.
> However, I am completely new to this (I
> am a computational physicist/dabbler) so I'd really appreciate a
> little assistance.
> I already have a really bare-bones compiler for a first-order language
> with a single type (int) that can compile a Fibonacci program to
> x86-64 native code.
> I'm not even sure what my milestones should be but I assume I should
> add more types (e.g. function pointers), boxed values and a garbage
> collector next.
Sounds like a reasonable approach if you want an ML-like language.
But add them in such a fashion that you don't have to do them all over
when you add polymorphic types, closures, etc.
> Until now I have only been superficially aware of boxing (in the
> context of optimization). What exactly is the simplest run-time
> representation of a boxed value? Is it a pointer to a >=1 word-sized
> Is the Cheney semi-space GC a suitable starting point?
> Is there an abstract interface for using a GC from generated code
> that I could adhere to?
There are various ways of doing this, but most center around two
1. Identifying the root set, i.e., which words in registers and stack
locations that are pointers to the heap.
2. Identifying which words in a heap record are pointers to other
> Is there anything else that I should be aware of?
Lots of detail. Since you are familiar with ML, I suggest you read
Andrew Appel's "Modern Compiler Implementation in ML". But be sure to
go to his webpage (http://www.cs.princeton.edu/~appel/modern/ml/) to
get the list of misprints (there are some serious ones, even in the
I find the GC chapter quite good(lots of detail about how to interface
generated code and GC), the FPL chapter O.K. and the type inference
chapter questionable, but it should get you started.
Return to the
Search the comp.compilers archives again.