Re: Course on code generation

Mike Spivey <mike@comlab.ox.ac.uk>
27 Mar 2001 23:44:08 -0500

          From comp.compilers

Related articles
Course on code generation yyaari@netvision.net.il (Yaakov Yaari) (2001-03-26)
Re: Course on code generation mike@comlab.ox.ac.uk (Mike Spivey) (2001-03-27)
Re: Course on code generation iank@idiom.com (2001-03-27)
| List of all articles for this month |

From: Mike Spivey <mike@comlab.ox.ac.uk>
Newsgroups: comp.compilers
Date: 27 Mar 2001 23:44:08 -0500
Organization: Oxford University, England
References: 01-03-125
Keywords: courses
Posted-Date: 27 Mar 2001 23:44:08 EST

"Yaakov Yaari" <yyaari@netvision.net.il> writes:


> Hi,
> I am planning a course on compiler code generation.


I've been giving some thought to such a course myself, as a follow-on
to a short compilers course I've been giving for several years. The
first course uses postfix code exclusively. My feeling is that's best
in a first course, as it allows us to concentrate on implementing
various language features. I've been using a dialect of ML as the
language for writing compilers and interpreters; they are all based on
AST's as the principal representation, and ML is nice for that because
we can describe the trees as an algebraic data type. The source
language for our final compiler is something close to Pascal. The
style is to build multi-pass compilers, so that we can concentrate on
one aspect of the process at a time.


We teach our second course on programming in Oxford using Oberon-2
with a home-grown bytecode compiler that was built using exactly the
techniques described in the course, so that helps with motivation. The
first programming course uses Haskell, so our students are very
comfortable with functional programming.


For the new course, I'm working with 'operator trees' like those used
in LCC. They are nice because various back-end transformations like
CSE can be described as transformations on trees, and you can generate
credible code for a RISC by top-down pattern matching on trees; ML is
marvellous for that. Although bottom-up tree matching is more
powerful, it doesn't really seem necessary for a RISC target, and it's
a bit more indirect & therefore less appropriate for a first attempt.
RISC machines have plenty of registers, so we can get a long way
before having to worry about complicating factors like spills.


The link between the first course and the second course is not too
hard: either write a small function that takes the postfix code and
produces equivalent operator trees, or (just as easy) rewrite the
intermediate code generator to produce trees directly. Starting from
the postfix code is not too bad, and is what JIT would do.


The final result will be a compiler with the following passes:


1. Lex and parse --> AST
2. Semantic analyis, annotate the AST
3. Generate postfix code
4. Peephole optimizer for postfix code
5. Build operator trees
6. CSE --> new operator trees
7. Tree optimizations
8. Code generation --> assembly language


Eight passes seems like a lot for a simple compiler, but the
attraction is that each pass does something simple, with well-defined
(and thanks to ML, strongly typed) interfaces between each pass and
the next.


About whether all this is suitable for undergraduates: I will stick
with a first course that uses postfix code, because it's more
important at first to see how language features are implemented on the
simplest machine than to deal with the complexities of real machines.
So the code generation course is very much a second course on
compilers.


-- Mike Spivey


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.