Re: Modularize compiler construction?

Stephen Horne <>
Mon, 08 Feb 2010 22:14:52 +0000

          From comp.compilers


From: Stephen Horne <>
Newsgroups: comp.compilers
Date: Mon, 08 Feb 2010 22:14:52 +0000
References: 10-01-080
Keywords: design
Posted-Date: 10 Feb 2010 11:03:34 EST

On Sat, 23 Jan 2010 17:10:38 -0600, Peng Yu <> wrote:

>It seems that the current compiler construction tools (at least in
>bison and flex) are still very primitive. Let's take the following
>example to explain what I mean.
>In the following book, let's say, section 6.1, mentioned various
>aspects of expression evaluation among many languages. If I want to
>construct a new language and its compiler by using a variety of
>features (e.g, whether to do expression arrangement or not as
>mentioned in 6.1.4) in these aspects, I don't see how to do so by
>easily composing different modules. It seems that there is a great
>semantic gap between what bison & flex offer and what compiler design

Having specialised tools for aspects of compiler development is useful
for at least two reasons...

1. You have the opportunity to swap out one tool for another.

2. You can exploit the tools even if you aren't writing a compiler
   for a general purpose language.

WRT the first point, one alternative to lex, for example, is Ragel...

This supports a more sophisticated regular grammar model than lex, and
goes a little beyond what regular grammars support (e.g. it is
possible to handle nested comments).

It is perfectly possible to use Ragel alongside yacc.

In relation to the second point, while Ragel is clearly designed
first-and-foremost for lexical analysis, that can be useful for
anything from a general purpose programming language, through
domain-specific languages, to input validation in virtually any
application. What's more, it's been adapted to tasks that aren't
lexical analysis - handling network protocols, for instance.

Although there are big names like lex and flex, yacc and bison, there
are also plenty of other tools out there, and not just for scanning
and parsing. An example is treecc, which generates (fairly simple)
code for AST nodes and multiple-dispatch operations on those nodes.
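
For anyone unsure what "multiple-dispatch operations" means here: the
function that runs is chosen by the node types of more than one
argument. Below is a hand-written Python sketch of the idea, not
treecc's input language or its generated code; the node classes and the
`rule`/`add` names are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class IntLit:
    value: int

@dataclass
class FloatLit:
    value: float

@dataclass
class StrLit:
    value: str

# Dispatch table keyed on the exact types of BOTH operands.
_RULES = {}

def rule(left_type, right_type):
    """Register a handler for one (left, right) combination of node types."""
    def register(fn):
        _RULES[(left_type, right_type)] = fn
        return fn
    return register

def add(a, b):
    """Evaluate a + b, choosing the handler from both node types."""
    fn = _RULES.get((type(a), type(b)))
    if fn is None:
        raise TypeError(
            f"cannot add {type(a).__name__} and {type(b).__name__}")
    return fn(a, b)

@rule(IntLit, IntLit)
def _add_ints(a, b):
    return IntLit(a.value + b.value)

@rule(IntLit, FloatLit)
@rule(FloatLit, IntLit)
@rule(FloatLit, FloatLit)
def _add_floats(a, b):
    return FloatLit(float(a.value) + float(b.value))

@rule(StrLit, StrLit)
def _add_strings(a, b):
    return StrLit(a.value + b.value)
```

A generator tool earns its keep by producing this kind of table and
checking that no combination has been forgotten, which the sketch only
detects at run time.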

I wrote my own AST/multiple dispatch/etc tool and the basics took me a
couple of months of spare time. And it only took that long because I
made things more complex than they really needed to be. Since then, it
has gained a lot of added extras - features for generating
AST-traversing iterators and comparison functions, for instance.
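
As a rough idea of what an AST-traversing iterator looks like when
written by hand (this is not the output of my tool; the `Num`/`BinOp`
classes and `walk` are assumptions for the example), here is a
pre-order walk as a Python generator:

```python
from dataclasses import dataclass, fields, is_dataclass

@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str
    left: object
    right: object

def walk(node):
    """Yield node, then every descendant node, in pre-order."""
    yield node
    for f in fields(node):
        child = getattr(node, f.name)
        if is_dataclass(child):       # skip plain attributes such as op/value
            yield from walk(child)

tree = BinOp("+", Num(1), BinOp("*", Num(2), Num(3)))
leaves = [n.value for n in walk(tree) if isinstance(n, Num)]
```

The comparison functions come almost for free in this sketch, since
dataclasses generate a field-by-field `==`.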

This possibly hints at why there's no obvious "lex" or "yacc" for
these things. It's easy enough to grow your own that there's probably
a great many of them out there, lurking in particular companies or
projects - if not DSLs, then certainly libraries. And don't forget
that in some languages (e.g. Lisp, Objective CAML) the line between
DSL and library is very *very* thin.

There are toolkits that integrate scanning, parsing and AST

I've not used ANTLR, but I believe it covers those three areas.

For code generation and runtimes, the main choices seem to be
libraries rather than DSLs. There's LLVM and Parrot, for instance.
LLVM is basically a compiler back-end for languages like C and C++.
Parrot is more of a language-agnostic virtual machine for scripting
languages.

And of course don't forget the Java and .NET virtual machines.

I'm a fan of LLVM, even though I haven't fully figured it out yet. It
gives you a portable intermediate bytecode language, a readable
"virtual assembler" form, and a C++ library for code generation, with
tools to translate between forms etc. Apps can use LLVM as a
JIT-compiler or interpreter, as well as for fully optimised
compilation. There's a version of GCC adapted to use an LLVM back end,
and also the "clang" compiler built from scratch to use LLVM.

The "Kaleidoscope" tutorial steps you through writing a compiler for a
simple language, and is very easy to follow. The readable "assembler"
language is also pretty easy to understand.
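
To give a flavour of that readable form, here is a small Python sketch
that prints text in the shape of LLVM assembly for an integer
expression. It does not use the LLVM library and I haven't run its
output through the LLVM tools here; `emit_function` and the tuple
format for expressions are assumptions made up for the example.

```python
import itertools

def emit_function(name, param, expr):
    """Emit LLVM-style textual IR for an i32 function of one i32 parameter.

    expr is a nested tuple: ("param",), ("const", n),
    ("add", a, b) or ("mul", a, b).
    """
    lines = [f"define i32 @{name}(i32 %{param}) {{", "entry:"]
    counter = itertools.count()

    def gen(e):
        kind = e[0]
        if kind == "param":
            return f"%{param}"
        if kind == "const":
            return str(e[1])
        if kind in ("add", "mul"):
            lhs = gen(e[1])
            rhs = gen(e[2])
            tmp = f"%t{next(counter)}"   # each result gets a fresh name
            lines.append(f"  {tmp} = {kind} i32 {lhs}, {rhs}")
            return tmp
        raise ValueError(f"unknown expression kind: {kind}")

    result = gen(expr)
    lines.append(f"  ret i32 {result}")
    lines.append("}")
    return "\n".join(lines)

# x * 2 + 1
ir_text = emit_function(
    "f", "x", ("add", ("mul", ("param",), ("const", 2)), ("const", 1)))
# ir_text is:
#   define i32 @f(i32 %x) {
#   entry:
#     %t0 = mul i32 %x, 2
#     %t1 = add i32 %t0, 1
#     ret i32 %t1
#   }
```

In real use you would build this through the C++ library rather than
by pasting strings together, but the text form is what you end up
reading when debugging.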

The "pure" language uses LLVM...

Rakudo (a Perl 6 implementation) is a slightly dubious advert for

Basically, you can write a full compiler these days using your primary
programming language for little more than "glue", if that's what you
really want to do. And if that's still too much, well...

These are source transformation languages, used to transform source
code from one language to another. The output "language" could just be
arbitrary calculated results. The guy who wrote Ragel also wrote an
extended form of TXL, and he used TXL to do it (his translator
translates ETXL to TXL).

Source transformation languages encapsulate scanning, parsing,
building and processing the AST, and generating output. A *little* bit
like XSLT, but the input doesn't need to be XML and so on.
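
For comparison, this is roughly what those stages look like when you
write them out by hand for a toy case: infix arithmetic in, prefix
(Lisp-style) notation out. It is a Python sketch under my own
assumptions (the grammar, and the names `scan`, `parse`, `emit` and
`translate`), not TXL; a source transformation language lets you state
the grammar and the rewrite and leaves the plumbing to the tool.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()])")
OPERATORS = {"+", "-", "*", "/"}

def scan(src):
    """Scanning: split the source text into tokens."""
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse(tokens):
    """Parsing: build a tuple-based AST with the usual precedence."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tok

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = take()
            node = (op, node, term())
        return node

    def term():
        node = atom()
        while peek() in ("*", "/"):
            op = take()
            node = (op, node, atom())
        return node

    def atom():
        tok = take()
        if tok == "(":
            node = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok in OPERATORS or tok == ")":
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    node = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return node

def emit(node):
    """Output generation: print the AST in prefix form."""
    if isinstance(node, str):
        return node
    op, left, right = node
    return f"({op} {emit(left)} {emit(right)})"

def translate(src):
    return emit(parse(scan(src)))
```

With this, `translate("a + 2 * (b - 1)")` gives `(+ a (* 2 (- b 1)))`.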

What more could you want?
