Maintainable compiler design? firstname.lastname@example.org (Christoffer Lernö) (2009-07-27)
Re: Maintainable compiler design? email@example.com (Michiel) (2009-07-29)
Re: Maintainable compiler design? firstname.lastname@example.org (BGB / cr88192) (2009-07-29)
Re: Maintainable compiler design? email@example.com (Martin Ward) (2009-08-06)
Re: Maintainable compiler design? firstname.lastname@example.org (James Harris) (2009-08-06)
Date: Wed, 29 Jul 2009 10:12:08 -0700 (PDT)
Posted-Date: 29 Jul 2009 23:00:21 EDT
Christoffer Lernö wrote:
> Are there any best practices to keep the compiler easy to understand
> and well organized?
Most compilers I know make several passes through the code. My first
compiler did something like this:
* Lexer + Parser to create AST
The rest done by walking the AST:
* Building the symbol table
* Resolving references to symbols
* Type checking
* Checking side-effects, access-rights, etc.
* Higher level optimization
* Intermediate code generation
Though in my new version I'm doing things a bit differently. It may be
an alternative solution for you. (I'd like some feedback myself, by
the way.)
The information I once calculated during specific passes (symbols,
scopes, resolved references, types, side-effects, access-rights,
etc.) I have now made into lazy properties of the relevant AST nodes. For
example, when I want to know the type of an expression, I'll just ask
the expression node and it will, only once, compute its own type by
querying its subexpressions, and so on.
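To illustrate, here is a minimal sketch (in Python, purely illustrative; the names and node classes are mine, not taken from any actual compiler) of such a lazy type property: the node computes its type on the first query by asking its subexpressions, then caches the result so the work happens only once.

```python
class IntLiteral:
    """A leaf node whose type is known immediately."""
    def type(self):
        return "int"

class Add:
    """An expression node that derives its type from its subexpressions."""
    def __init__(self, left, right):
        self.left = left
        self.right = right
        self._type = None          # cache: not yet computed

    def type(self):
        if self._type is None:     # compute only on the first query
            lt = self.left.type()
            rt = self.right.type()
            if lt != rt:
                raise TypeError(f"cannot add {lt} and {rt}")
            self._type = lt
        return self._type

expr = Add(IntLiteral(), Add(IntLiteral(), IntLiteral()))
print(expr.type())  # int
```

No pass ever has to "visit" this node to type it; whoever needs the type simply asks for it, and the recursion bottoms out at the leaves.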
This has several advantages for me:
* My programming language has some features that require an approach
such as this. It has function overloading, which means I need to know
the types of the actual parameters before I can be sure which specific
function is referenced. On the other hand, before I can know the type
of a function-call, I have to first resolve the function reference.
That's not easy (possible?) to do using a fixed number of compiler
passes, because the programmer can make his expression arbitrarily
complicated. I'm also planning to add the 'auto' keyword to get
automatic type deduction for variable declarations. This requires that
I know the type of the initialization value before I can add the
variable to the symbol table.
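The overloading dependency can be sketched like this (again an illustrative Python model, not real compiler code; the `overloads` table and node classes are hypothetical): to know which overload a call refers to, you must first know the types of its actual arguments, which may themselves be calls.

```python
# Hypothetical overload table: (name, parameter types) -> return type.
overloads = {
    ("abs", ("int",)): "int",
    ("abs", ("float",)): "float",
}

class Literal:
    def __init__(self, t):
        self._t = t
    def type(self):
        return self._t

class Call:
    def __init__(self, name, args):
        self.name = name
        self.args = args
    def type(self):
        # The call's type is only known after resolving the overload,
        # which in turn needs the (arbitrarily nested) argument types.
        arg_types = tuple(a.type() for a in self.args)
        return overloads[(self.name, arg_types)]

inner = Call("abs", [Literal("float")])  # resolves to the float overload
outer = Call("abs", [inner])             # needs inner's return type first
print(outer.type())  # float
```

With a fixed pass order, `outer` cannot be resolved until `inner` is; with demand-driven queries, the recursion handles any nesting depth automatically.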
* If some information is never needed, it need never be computed.
* I find it comforting to know that whenever I need a piece of
information, I can just query the relevant property, without worrying
if I've already done that pass.
Of course, this approach introduces some overhead, but I find the
benefits far outweigh that disadvantage.
It is of course possible to get a cyclic data dependency this way
(mostly because of automatic type deduction, I guess). But this is
always a mistake, either of the compiler (e.g. you forgot to report an
error when a variable is referenced before it is declared, possibly
resulting in two auto-typed variables initializing each other) or of
its user (e.g. two functions, whose textual order is irrelevant, use
each other in automatic type deduction).
To catch both kinds of mistake, the automatically computed properties
carry a 'computing' flag. The first time their value is queried, they
turn the flag on before starting the computation. If it was already
on, there is a cyclic dependency, and the compiler reports it either
to you, by throwing an exception, or to the user, by emitting an
appropriate error message.
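The flag mechanism can be sketched as follows (an assumed design following the description above, in illustrative Python; the class names are mine): an auto-typed variable takes its type from its initializer, and the 'computing' flag catches the case where two such variables initialize each other.

```python
class CyclicDependencyError(Exception):
    pass

class Var:
    """An auto-typed variable whose type comes from its initializer."""
    def __init__(self, name):
        self.name = name
        self.init = None           # initializer expression (another node)
        self._type = None
        self._computing = False    # the cycle-detection flag

    def type(self):
        if self._type is not None:
            return self._type
        if self._computing:        # re-entered while computing -> cycle
            raise CyclicDependencyError(self.name)
        self._computing = True
        try:
            self._type = self.init.type()  # 'auto': ask the initializer
        finally:
            self._computing = False
        return self._type

# Two auto-typed variables initializing each other: a cycle.
a, b = Var("a"), Var("b")
a.init, b.init = b, a
try:
    a.type()
except CyclicDependencyError as e:
    print("cyclic type dependency involving", e)
```

Whether the exception surfaces as a compiler bug report or as a user-facing diagnostic depends on which of the two mistakes produced the cycle.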
Building the AST, searching it for most errors, optimizing it and
generating code are still compiler passes in this design.
To be honest, I'm not sure if a design like this has been used before.