Re: Incrementally implementing a simple ML compiler using LLVM

torbenm@app-2.diku.dk (Torben =?iso-8859-1?Q?=C6gidius?= Mogensen)
Tue, 27 Nov 2007 10:43:16 +0100

          From comp.compilers

Related articles
Incrementally implementing a simple ML compiler using LLVM usenet@jdh30.plus.com (Jon Harrop) (2007-11-26)
Re: Incrementally implementing a simple ML compiler using LLVM torbenm@app-2.diku.dk (2007-11-27)
Re: Incrementally implementing a simple ML compiler using LLVM gneuner2@comcast.net (George Neuner) (2007-11-28)
Re: Incrementally implementing a simple ML compiler using LLVM pertti.kellomaki@tut.fi (=?ISO-8859-1?Q?Pertti_Kellom=E4ki?=) (2007-11-29)
Re: Incrementally implementing a simple ML compiler using LLVM jo@durchholz.org (Joachim Durchholz) (2007-11-29)
Re: Incrementally implementing a simple ML compiler using LLVM usenet@jdh30.plus.com (Jon Harrop) (2007-11-30)
| List of all articles for this month |

From: torbenm@app-2.diku.dk (Torben =?iso-8859-1?Q?=C6gidius?= Mogensen)
Newsgroups: comp.compilers
Date: Tue, 27 Nov 2007 10:43:16 +0100
Organization: Department of Computer Science, University of Copenhagen
References: 07-11-072
Keywords: ML, functional
Posted-Date: 27 Nov 2007 20:06:53 EST

Jon Harrop <usenet@jdh30.plus.com> writes:


> I recently tried the Low-Level Virtual Machine (LLVM) project and found that
> I can use its OCaml bindings to write native-code compilers with ease.
>
> I would like to use this technology to create a simple compiler for a
> language similar to OCaml but with no baggage (e.g. I have no desire
> to support 63-bit integers!).


63-bit (or 31-bit) integers are common in languages with GC, as the
last bit is used to distinguish integers from pointers. Full-size
integers carry an overhead, as they usually need to be boxed.


> However, I am completely new to this (I
> am a computational physicist/dabbler) so I'd really appreciate a
> little assistance.
>
> I already have a really bare-bones compiler for a first-order language
> with a single type (int) that can compile a Fibonacci program to
> x86-64 native code.
>
> I'm not even sure what my milestones should be but I assume I should
> add more types (e.g. function pointers), boxed values and a garbage
> collector next.


Sounds like a reasonable approach if you want an ML-like language.
But add them in such a fashion that you don't have to do them all over
when you add polymorphic types, closures, etc.


> Until now I have only been superficially aware of boxing (in the
> context of optimization). What exactly is the simplest run-time
> representation of a boxed value? Is it a pointer to a >=1 word-sized
> block?


Yes.


> Is the Cheney semi-space GC a suitable starting point?


Yes.


> Is there an abstract interface for using a GC from generated code
> that I could adhere to?


There are various ways of doing this, but most center around two
problems:


  1. Identifying the root set, i.e., which words in registers and stack
        locations that are pointers to the heap.


  2. Identifying which words in a heap record are pointers to other
        heap records.


> Is there anything else that I should be aware of?


Lots of detail. Since you are familiar with ML, I suggest you read
Andrew Appel's "Modern Compiler Implementation in ML". But be sure to
go to his webpage (http://www.cs.princeton.edu/~appel/modern/ml/) to
get the list of misprints (there are some serious ones, even in the
second printing).


I find the GC chapter quite good(lots of detail about how to interface
generated code and GC), the FPL chapter O.K. and the type inference
chapter questionable, but it should get you started.


Torben


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.