Re: Making my first compiler

Pascal Bourguignon <>
18 Sep 2006 09:44:44 -0400

          From comp.compilers

Related articles
Making my first compiler (2006-09-16)
Re: Making my first compiler (Pascal Bourguignon) (2006-09-18)
Re: Making my first compiler (Tommy Thorn) (2006-09-18)
Re: Making my first compiler ( (2006-09-18)
Re: Making my first compiler (2006-09-18)
Re: Making my first compiler (Jeff Kenton) (2006-09-25)
Re: Making my first compiler (Peter \Firefly\Lund) (2006-09-25)
Re: Making my first compiler (Stefan Monnier) (2006-11-23)
[9 later articles]
| List of all articles for this month |

From: Pascal Bourguignon <>
Newsgroups: comp.compilers
Date: 18 Sep 2006 09:44:44 -0400
Organization: Informatimago
References: 06-09-087
Keywords: design
Posted-Date: 18 Sep 2006 09:44:44 EDT writes:

> Hi,
> I'm trying to create a pascal subset interpreter/compiler.
> Do I HAVE to create a syntax tree?

No, it needs not be explicit.

> Or can I go straight to creating Intermediate Code (Quadruples) in
> Yacc's reduce actions?

Yes, you might try that.

However, unless your language is simplistic, you'll probably want to
build the syntax tree to preprocess it befoer generating the
quadruples. In particular, if you need to do semantic analysis, like
checking the types, or other global analysis before the tuple

For example, you could have in a random language:

    var a,b,c:integer;
          var a,b,c:string;
          a:="hello ";
          c:=a+b; /*1*/
    c:=a+b; /*2*/

The tuples you generate for the statement /*1*/ might be different
from the tuple you generate for the statement /*2*/ because they work
on different type of data, even if the productions used to derive them
are exactly the same. You can make the difference only considering
the global parse tree, where you can see that for /*1*/, the variables
are lexically strings, while for /*2*/ they're lexically integers.

> Say I have a generic paramater_list (for function definitions) in BNF ,
> eg:
> paramater_list: paramater_list paramater
> In Yacc, would this be connected together as a linked-list?
> paramater_list: paramater_list paramater { $$ = make_paramater($2);
> $$->next = $1; }
> | paramater { $$ = make_paramater($1); }
> ;
> What's a basic structure for a syntax tree in C?

Whatever you want.

You can write specific struct types for each kind of node, or you can
write a generinc kind of node, and build dynamic trees of them, like
in lisp.

For example for a parameter such as:

parameter : ['in'|'out'|'inout'] name ':' type .

you could define a specific structure like:

typedef enum {mode_in,mode_out,mode_inout} mode_t;
typedef struct {
        ident_t* name;
        tipe_t* tipe;
        mode_t mode;
} parameter_t;

and so on for every kind of node.

Or you can just define:

typedef struct {
        object_t* car;
        object_t* cdr;
} const_t;

typedef enum { kind_integer,kind_string,kind_symbol,kind_cons /* ... */ } kind_t;
typedef struct {
      kind_t kind;
      union {
              int integer;
              char* string;
              const char* symbol;
              const_t* pair;
              /* ... */
    } data;
} object_t;

once for all and use them to build any kind of node.
For a parameter:


for a list of parameters:


etc... There are libraries for this kind of generic "lisp" objects in
C and C++.

> Say I want my compiler to convert Pascal code to C code, which steps
> are absolutely necessary to do this correctly?
> Im thinking:
> Build Syntax Tree -> Convert it to Three Address Code (Quadruples) ->
> then into C
> Was this correct?

No, to translate to another high level language, I wouldn't go down to
tuples. I'd stay at the level of the syntax tree. I'd process it in
several phases to transform it in such a way that a simple walk of the
tree can be used to directly generate the target code. For example,
the main difference between pascal and (standard) C, is that in pascal
you can have local functions and procedures.

So starting from a tree like:

  :name "pgm"
            :name "p1" :arguments (...)
            :locals ((var :name "v1" :type "t1")
                              (procedure :name "p2" :arguments (...) :locals (...) :body (...)))
              :body (... (call "p2" ...) ...)))
  :body (...))

you'll need to transform it like:

  :name "pgm"
        ((procedure :name "p1_p2" :arguments (... p1_p2_pars)
                                :locals (...) :body (...))
            :name "p1" :arguments (...)
            :locals ((var :name "v1" :type "t1"))
            :body (... (call "p2" ... p1_p2_pars) ...)))
  :body (...))

moving the local procedures to the toplevel, changing their names to
avoid collisions, and adding a hiden parameter to pass the lexical
scope when the local proceude makes reference to local variables of
the enclosing procedures.

Other transforms may deal with the type system, parameter modes, etc.

Once you have a "normalized" tree, you can directly generate C code
from it.

__Pascal Bourguignon__
I need a new toy.
Tail of black dog keeps good time.
Pounce! Good dog! Good dog!

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.