Cm -- A toy compiler.

Yoshiyuki KONDO <cond@lsi-j.co.jp>
Mon, 23 Sep 91 23:16:17 JST

          From comp.compilers

Newsgroups: comp.compilers
From: Yoshiyuki KONDO <cond@lsi-j.co.jp>
Keywords: C, available, experiment
Organization: Compilers Central
Date: Mon, 23 Sep 91 23:16:17 JST

Last year I published a book on compiler writing (in Japanese, and available
only in Japan; anyway, this posting is not an ad for my book :-). As material
for the book, I wrote a simple one-pass toy compiler that compiles a subset
of C.


The language processed by the compiler is called Cm (C minor). Cm is a
`toy' language because it lacks some very important features without which
it is impossible (or very difficult, at least) to write practical programs.


The differences between C and Cm are as follows:


        * struct, union, and enum are not available. (Can you imagine how
            *hard* it is to write standard libraries without them?)
        * typedef is not available.
        * initializers are not available.
        * storage class specifiers are not available. (All variables defined
            inside functions are "auto" and all variables defined outside
            functions are "extern".)
        * only "char" and "int" are supported as arithmetic types.
        * local declarations do not nest. (Local variables can be declared
            only at the beginning of a function. No declarations in nested
            compound statements.)
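
As a rough illustration of the restrictions listed above, here is a small
fragment that stays within them: only "char" and "int", no initializers, no
storage class specifiers, and all local declarations at the top of the
function. It is only a sketch. The names (calls, sum_digits) are made up for
this example, and the assumption that Cm accepts "while", "if", "return", and
array indexing exactly as C does is not confirmed by the posting.

```c
/* Defined outside any function, so treated as "extern".
   No initializer is allowed, so nothing is assigned here. */
int calls;

/* Adds up the decimal digits found in the string s. */
int sum_digits(char *s)
{
    /* All locals are declared at the top of the function,
       without initializers. */
    int total;
    int i;

    calls = calls + 1;
    total = 0;
    i = 0;
    while (s[i] != 0) {
        /* No declarations are placed inside this nested block. */
        if (s[i] >= '0' && s[i] <= '9')
            total = total + (s[i] - '0');
        i = i + 1;
    }
    return total;
}
```

Because the fragment uses nothing outside ordinary C, it can also be checked
with any C compiler.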


These features were left out to keep the Cm compiler as small and as easy to
understand as possible. At the same time, I have not omitted any feature
essential to the study of compiler writing. (Struct/union is essential for
writing programs, but I think it is less essential for learning compiler
writing. Anyway, I have to keep it small, so something had to go.)


Many aspects of compiler internals are covered in Cm. Arbitrary
combinations of pointer, array, and function can be used in declarations.
Short-circuit operators (&&, ||) are expanded into combinations of tests and
conditional branches (as real compilers do). The target is not a
stack-oriented virtual machine but a `real' one -- the 80x86. (It's quite
*real*, isn't it? :-)
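
To make the expansion of && and || concrete, the sketch below writes the
same condition twice in C: once with the operators, and once by hand as
tests and conditional branches (using "goto" to stand in for jump
instructions). This is not the code Cm actually emits; the function and
label names are hypothetical and only show the general shape of such a
lowering.

```c
/* Source form: relies on the short-circuit operators. */
int select_direct(int a, int b, int c)
{
    int r;

    r = 0;
    if ((a && b) || c)
        r = 1;
    return r;
}

/* Lowered form: each operand becomes one test plus one conditional
   branch, and no boolean value for (a && b) is ever materialized. */
int select_lowered(int a, int b, int c)
{
    int r;

    r = 0;
    if (a == 0) goto try_c;     /* a is false: skip b, (a && b) fails */
    if (b != 0) goto is_true;   /* a and b both true: skip c entirely */
try_c:
    if (c == 0) goto done;      /* right side of || is false as well */
is_true:
    r = 1;
done:
    return r;
}
```

In the lowered form, b is never tested when a is zero, and c is never tested
when both a and b are nonzero, which is the evaluation order C requires
for these operators.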


The Cm compiler is written in ANSI C and Yacc and is about 4500 lines long
(pretty small, I think :-). It generates assembler source code for the 80x86,
to be assembled with MASM and linked with MS-LINK into executable (.EXE)
files. The Cm compiler itself can be compiled with MS-C, Turbo-C, or LSI-C (a
product of my employer; again, available only in Japan). It operates in
an MS-DOS environment and runs on any MS-DOS machine, including IBM PCs and
clones.


I hold the copyright on the Cm compiler (i.e., it is not in the public
domain), but it may be freely distributed by anyone. I have placed it on some
domestic commercial BBSs, but it is not currently available outside Japan. If
enough people are interested in it, I will post it to an appropriate newsgroup.


---
Yoshiyuki KONDO email: cond%lsi-j.co.jp@uunet.uu.net
LSI Japan Co. Ltd., Tokyo, Japan cond@lsi-j.co.jp (from Japan)
--

