Re: Need advices for creating a proprietary language

Ondrej Tucny <tucny@km1.fjfi.cvut.cz>
12 Mar 1998 23:20:16 -0500

          From comp.compilers

Related articles
Need advices for creating a proprietary language mike@heliovisionsproductions.fr (Mike) (1998-03-03)
Re: Need advices for creating a proprietary language laheadle@cs.uchicago.edu (Lyn A Headley) (1998-03-06)
Re: Need advices for creating a proprietary language sandy.harris@sympatico.ca (1998-03-06)
Re: Need advices for creating a proprietary language joachim.durchholz@munich.netsurf.de (Joachim Durchholz) (1998-03-07)
Re: Need advices for creating a proprietary language mc@hack.org (MC) (1998-03-07)
Re: Need advices for creating a proprietary language markh@usai.asiainfo.com (Mark Harrison) (1998-03-07)
Re: Need advices for creating a proprietary language nixon@softlab.se (Leif Nixon) (1998-03-12)
Re: Need advices for creating a proprietary language tucny@km1.fjfi.cvut.cz (Ondrej Tucny) (1998-03-12)
Re: Need advices for creating a proprietary language ddd@hplisolx.grenoble.hp.com (Dupont de Dinechin Christophe) (1998-03-15)
Re: Need advices for creating a proprietary language byte@lmn.pub.ro (Laurentiu Badea) (1998-03-18)
Re: Need advices for creating a proprietary language kochenbu@khe.siemens.de (Andreas Kochenburger) (1998-03-30)
| List of all articles for this month |
From: Ondrej Tucny <tucny@km1.fjfi.cvut.cz>
Newsgroups: comp.compilers
Date: 12 Mar 1998 23:20:16 -0500
Organization: Compilers Central
References: 98-03-039
Keywords: design, practice

Lyn A Headley wrote
>Writing a compiler is very hard. If you have never studied the
> ....


Are you sure? I think writing a compiler is one of the easiest things,
I've ever done. I think you need about one week to practise the basic
algorithms and then you should be able to write a simple compiler. But
yes, you're true if you're writing something like Ada. I wrote a
compiler of a language may be nearsy as complex as Ada and it took me
two years.




Here's my original reply to this message. Originaly I've sent it
directly to the author, because it's pretty large.


> I'm looking on the net since 3 days, and now I know what means the words:
> parser, token recognition,error recovery, semantic analysis, formal
language
> concepts, regular expressions, context-free grammars, top_down/bottom-up
> parsing, etc...
> But I'm submerged with url from american universities (xxxxxx.edu) that
> present the content of the compiler courses !!!
> I'm beeing driving crazy.
> And all the rest is about how to use BISON,LEX,YACC, and stuff like that.
> I don't think I must have to dive in such programs for creating my
> compilator ??


I think you're the first man I've ever met on the comp.compilers
newsgroup, who's not afraid to write a compiler. I mean real
programmers don't use yacc/lex/... and do all the work from scratch. I
think tools like yacc want solve all the problems and so become
totally unusable, but stop talking about stupid tools a let's write a
compiler:


STEP 1: defining the grammar of the language - the syntax rules
---------------------------------------------------------------


Basically, you write down what each part of a source code means, like
this:


                  Procedure ::= "procedure" Identifier [ Formal parameters ]
                                                      = { Declaration } "begin" { Statement } "end"


So what means what:
        Procedure, Identifier, Formal parameters - syntactic categories
(symbols)
        ::= - just a symbol, kind of "assignment" (syntactic category
Procedure
                is ...)
        [ ] - means optional part (procedure
        { } - means repetition 0 to (nearly) infinity times
        "procedure", "begin", ... - keywords
        | - means or: a | b - there will be either a or b (not used here)




symbols are:
        1) terminal - like keyword, Identifier, number - don't have
special internal structure and are recognized by the
lexical analyzer
        2) non-terminal - like Procedure, Formal parameters, Statement - are
              represented by a sequence of another non-terminals and terminals,
              these are recognized by the syntactic analyzer


I'll define a simple expression and an assignment statement:


Expression ::= Simple expression [ Relational operator Simple expression
]
Relation operator ::= < | > | = | <> | <= | >=
Simple expression ::= Factor { Addition operator Factor }
Addition operator ::= + | -
Factor ::= Term { Multiplication operator Term }
Multiplication operator ::= * | div | mod
Term ::= Identifier | number | ( Expression )


Assignment statement ::= Identifier := Expression


Now you're ready to start coding;




STEP 2: writing procedures for lexical analysis
-----------------------------------------------


You'll new two procedures for working with the source code: getlex and
ungetlex. The first one retrieves a lexical element from the source
code, the second one will return it back "in the source". The code will
be like as follows:


type
    -- representation of lexical elements
    lexelem = enum
        lex_eof, -- end of file
        lex_id, -- identifier
        lex_num, -- integer number
        -- keywords
        lex_begin, -- keyword "begin"
        lex_end, -- keyword "end"
        ...
        -- special symbols
        lex_plul, -- "+"
        lex_minus, -- "-"
        lex_assign, -- ":="
        ...
        end enum;


var
    -- unget buffer
    lastvalid : boolean := false;
    last : lexelem;


procedure getlex return lexelem =
begin
    if lastvalid
        -- next lexical element was already read and is stored in the unget
buffer
        then
            lastvalid:=false;
            result:=last;


        -- get next lexical element from the source code
        else
            -- easy to implement, identifier starts with a letter, number with
a -- digit, spaces are ignored etc.
            -- you might find useful writing procedures getchar and ungetchar
in
            -- the same manner as (un)getlex is writen for reading the source
code
        end if;


    -- prepare for ungetlex
    last:=result;
    end getlex;


procedure ungetlex =
begin
    lastvalid:=true;
    end ungetlex;


Note: as you can see I've defined a lex_eof symbol representing the end
of file, this is quite useful, because you don't have to check for such
stupid errors.


This is all you need for comfortable work with the source code.




STEP 3: writing the compiler
----------------------------


You just turn all the syntactic rules into procedures and add semantic
rules and code generation.


First try writing a simple expression parser. All the other work is very
similar. Define a tree structure, that will represent the expression:




type
    -- "operator"
    expoper = enum
          exp_id, -- identifier (=variable)
          exp_num, -- number
          exp_plus, -- addition
          exp_minus, -- subtraction
          ...
          end enum;


    pnode = ^node;
    node = record
          exp : expoper; -- "operator" this node represents
          left : pnode; -- left subtree
          right : pnode; -- right subtree
          ...
          end node;


you'll need a
        procedure make_tree_a_left_subtree(t : ref pnode)
that will create a new node, assign t as its left subtree and return the
new node.




Finally write the code:


procedure Expression return pnode =


var
    lex : lexelem;


begin
    -- parse the first Simple expression
    result:=Simple_expression;


    -- is there a relational operator ?
    lex:=getlex;
    if lex in [lex_eq,lex_not_eq, ... ]
        -- yes
        then
            -- prepare the tree
            make_tree_a_left_subtree(result);
            result^.oper:="exp_eq, exp_not_eq, ... regarding to lex";


            -- parse the next Simple expression
            result^.right:=Simple_expression;


        -- no
        else
            -- return the lexical element back
            ungetlex;
        end if;
    end Expression;


procedure Simple_expression return pnode =


var
    lex : lexelem;


begin
    -- parse the first Factor
    result:=Factor;


    -- try reading an additional operator
    lex:=getlex;


    while lex in [lex_plus,lex_minus] loop
        -- prepare the tree
        make_tree_a_left_subtree(result);
        result^.oper:="exp_plus, exp_minus regarding to lex";


        -- parse the next Factor
        result^.right:=Factor;


        -- try next operator
        lex:=getlex;
        end loop;


    -- last element wasn't used, return back
    ungetlex;
    end Simple_expression;


..


procedure Term return pnode =


var
    lex : lexelem;


begin
    -- get next lexical element
    lex:=getlex;


    case lex of
        -- identifier
        when lex_id do result:="create a node exp_id";


        -- number
        when lex_num do result:="create a node exp_num";


        -- expression in parenthesis
        when lex_left_par do
                -- parse expression
                result:=Expression;
                -- must be followed by the right parenthesis
                lex:=getlex;
                if lex<>lex_right_par then "error: e.g. raise an exception"; end
if;


        -- no other elements are allowed here
        when others do "error: e.g.. raise an exception";
        end case;
    end Term;


The code above transforms expression a<b*(c+d+1) into a tree:


                              <
              +-------+-------+
              a *
                              +-------+-------+
                              b +
                                              +-------+-------+
                                              + 1
                              +-------+-------+
                              c d


Once you have this representation of an expression it's very easy to do
things like type compatibility checking, optimization (0*x --> 0) and
also code generation. You just evaluate the left subexpression than the
right subexpression and do the operation specified by the current node


        imm <address of a>
        load
        imm <address of b>
        load
        imm <address of c>
        load
        imm <address of d>
        load
        add
        imm 1
        add
        add
        mul
        less


I think this kind of evaluation is sometimes called "polish notation".
The instructions do the following:


        imm - stores immediate value on the stack (e.g.. a number or
address)
        load - loads variable on the stack (gets address from stack)
        add - adds two numbers stored on the stack and pushes the result
on the stack
        mul - like add, but performs multiplication
        less - compares two numbers from stack a pushes boolean result




STEP 4: writing the interpreter
-------------------------------


you just get the instructions and perform the actions like:


        instr:=getinstr;
        case instr
            when imm do push("parameter of the instruction")
            when add do
                    pop(x);
                    pop(y);
                    push(x+y);
            ...
            end case;


Note: implementing advanced features like SetAnimation: I don't
recommend you creating special purpose instructions, in future versions
it'll be hard to maintain. Better try using a construction like this:


          procedure SetAnimation (name : string) internal 1024;


and create a special internal procedure calling instruction that gets
two parameters: an index of the internal procedure and length of
parameters,
so the operation will be as follows:


          when internal do
                  internal_proc_handler("index of internal procedure",
                          "address of parameter block");
                  pop_bytes_from_stack("length of parameters");


Doing it this way you get a language/interpreter usable for all you
other products.




That's all the knowledge you need for writing a simple compiler.


>
> Any help would be welcome.
>
> Thanks a lot,
>
> Mike.
> Programmer,
> Heliovisions Productions,
> Lyon/France


Ondrej, mailto:Ondrej.Tucny@alsoft.cz
Programmer,
A&&L soft,
Prague/Czech Republic
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.