Re: Why ML/OCaml are good for writing compilers

Amit Patel <amitp@Theory.Stanford.EDU>
13 Aug 1998 22:11:11 -0400

          From comp.compilers

Related articles
Why ML/OCaml are good for writing compilers (1998-07-28)
Re: Why ML/OCaml are good for writing compilers (Tony Bass) (1998-07-30)
Re: Why ML/OCaml are good for writing compilers amitp@Theory.Stanford.EDU (Amit Patel) (1998-08-13)
Re: Why ML/OCaml are good for writing compilers (Chris F Clark) (1998-08-16)
Re: Why ML/OCaml are good for writing compilers (Valentin Bonnard) (1998-08-16)
| List of all articles for this month |

From: Amit Patel <amitp@Theory.Stanford.EDU>
Newsgroups: comp.compilers
Date: 13 Aug 1998 22:11:11 -0400
Organization: Compilers Central
References: 98-07-220
Keywords: functional, practice (Dwight VandenBerghe) writes:

> So here is an unordered list of language features that seems to me
> to make writing compilers a pleasure rather than a horrendous chore.

[many things snipped]

> So it's mostly about data structures. ML is extraordinarily
> facile at allowing you to express complex data structures and
> recursive algorithms around them. The most basic data structures
> (lists, arrays, structs, unions, property lists, hash tables,
> binary trees, queues, and so on) are sitting there in the
> language already, well-implemented and ready to go. You start
> off from a place that you would have had to build up in another
> language.

I was one of three students in my compilers class that used ML instead
of C. I didn't know ML at the time, but I had to learn it and
compilers simultaneously. Some time later, I looked at what C
compiler code looked like, and I was *shocked*. It was incredibly
ugly in comparison to what I had seen in ML. The compiler I saw was
using a tree structure with child and right pointers, and if you
wanted to access (for example) the 'else' part of an IF statement,
you'd do something like:

      if(node->type==IF) {
          Node* else_part = node->child->right->right; /* EWWWWWW */

It didn't strike me as "right" at all - how do I know that an IF node
has the right number of subnodes? Why does the node for the TEST part
have a pointer to the node for the THEN part? Where are the
guarantees that the tree is well-formed? Where do you store the data?
Why do you store data for nodes that don't require it? Perhaps some
other structure would have worked better, but I haven't seen any parse
tree representation in C or C++ that's both type safe and feels "right".

ML's data structures (tagged unions) are *perfect* for representing
abstract syntax trees, but I think pattern matching is the real win
here. When I want to write an optimization step, I want to be able to
write something like "when you see THIS pattern, do THAT":

fun opt(Mult(ConstInt(0),x)) = ConstInt(0)
                    | opt(Mult(x,ConstInt(0))) = ConstInt(0)
                    | opt(Add(ConstInt(0),x)) = x
                    | opt(Add(x,ConstInt(0))) = x
                    | opt(anything) = anything

That's _much_ easier for me to write and read than something that has
to examine the child pointers manually, test the node types, extract
the value in an integer node, test that, and so on:

      if(node->type==MULT) {
            if(node->child->type==CONST_INT && node->child->val==0)
                  return node->child;
            if(node->child->right->type==CONST_INT && node->child->right->val==0)
                  return node->child->right;
      if(node->type==ADD) {
            if(node->child->type==CONST_INT && node->child->val==0)
                  return node->child->right;
            if(node->child->right->type==CONST_INT && node->child->right->val==0)
                  return node->child;

This code seems much more cluttered -- I can't see the tree structure
in it. The more complicated the pattern is, the worse it gets when
you don't have pattern matching. :(

- Amit
Amit J Patel, Computer Science Department, Stanford University

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.