Related articles |
---|
Need advices for creating a proprietary language mike@heliovisionsproductions.fr (Mike) (1998-03-03) |
Re: Need advices for creating a proprietary language laheadle@cs.uchicago.edu (Lyn A Headley) (1998-03-06) |
Re: Need advices for creating a proprietary language sandy.harris@sympatico.ca (1998-03-06) |
Re: Need advices for creating a proprietary language joachim.durchholz@munich.netsurf.de (Joachim Durchholz) (1998-03-07) |
Re: Need advices for creating a proprietary language mc@hack.org (MC) (1998-03-07) |
Re: Need advices for creating a proprietary language markh@usai.asiainfo.com (Mark Harrison) (1998-03-07) |
Re: Need advices for creating a proprietary language nixon@softlab.se (Leif Nixon) (1998-03-12) |
Re: Need advices for creating a proprietary language tucny@km1.fjfi.cvut.cz (Ondrej Tucny) (1998-03-12) |
Re: Need advices for creating a proprietary language ddd@hplisolx.grenoble.hp.com (Dupont de Dinechin Christophe) (1998-03-15) |
Re: Need advices for creating a proprietary language byte@lmn.pub.ro (Laurentiu Badea) (1998-03-18) |
Re: Need advices for creating a proprietary language kochenbu@khe.siemens.de (Andreas Kochenburger) (1998-03-30) |
From: | Ondrej Tucny <tucny@km1.fjfi.cvut.cz> |
Newsgroups: | comp.compilers |
Date: | 12 Mar 1998 23:20:16 -0500 |
Organization: | Compilers Central |
References: | 98-03-039 |
Keywords: | design, practice |
Lyn A Headley wrote
>Writing a compiler is very hard. If you have never studied the
> ....
Are you sure? I think writing a compiler is one of the easiest things,
I've ever done. I think you need about one week to practise the basic
algorithms and then you should be able to write a simple compiler. But
yes, you're true if you're writing something like Ada. I wrote a
compiler of a language may be nearsy as complex as Ada and it took me
two years.
Here's my original reply to this message. Originaly I've sent it
directly to the author, because it's pretty large.
> I'm looking on the net since 3 days, and now I know what means the words:
> parser, token recognition,error recovery, semantic analysis, formal
language
> concepts, regular expressions, context-free grammars, top_down/bottom-up
> parsing, etc...
> But I'm submerged with url from american universities (xxxxxx.edu) that
> present the content of the compiler courses !!!
> I'm beeing driving crazy.
> And all the rest is about how to use BISON,LEX,YACC, and stuff like that.
> I don't think I must have to dive in such programs for creating my
> compilator ??
I think you're the first man I've ever met on the comp.compilers
newsgroup, who's not afraid to write a compiler. I mean real
programmers don't use yacc/lex/... and do all the work from scratch. I
think tools like yacc want solve all the problems and so become
totally unusable, but stop talking about stupid tools a let's write a
compiler:
STEP 1: defining the grammar of the language - the syntax rules
---------------------------------------------------------------
Basically, you write down what each part of a source code means, like
this:
Procedure ::= "procedure" Identifier [ Formal parameters ]
= { Declaration } "begin" { Statement } "end"
So what means what:
Procedure, Identifier, Formal parameters - syntactic categories
(symbols)
::= - just a symbol, kind of "assignment" (syntactic category
Procedure
is ...)
[ ] - means optional part (procedure
{ } - means repetition 0 to (nearly) infinity times
"procedure", "begin", ... - keywords
| - means or: a | b - there will be either a or b (not used here)
symbols are:
1) terminal - like keyword, Identifier, number - don't have
special internal structure and are recognized by the
lexical analyzer
2) non-terminal - like Procedure, Formal parameters, Statement - are
represented by a sequence of another non-terminals and terminals,
these are recognized by the syntactic analyzer
I'll define a simple expression and an assignment statement:
Expression ::= Simple expression [ Relational operator Simple expression
]
Relation operator ::= < | > | = | <> | <= | >=
Simple expression ::= Factor { Addition operator Factor }
Addition operator ::= + | -
Factor ::= Term { Multiplication operator Term }
Multiplication operator ::= * | div | mod
Term ::= Identifier | number | ( Expression )
Assignment statement ::= Identifier := Expression
Now you're ready to start coding;
STEP 2: writing procedures for lexical analysis
-----------------------------------------------
You'll new two procedures for working with the source code: getlex and
ungetlex. The first one retrieves a lexical element from the source
code, the second one will return it back "in the source". The code will
be like as follows:
type
-- representation of lexical elements
lexelem = enum
lex_eof, -- end of file
lex_id, -- identifier
lex_num, -- integer number
-- keywords
lex_begin, -- keyword "begin"
lex_end, -- keyword "end"
...
-- special symbols
lex_plul, -- "+"
lex_minus, -- "-"
lex_assign, -- ":="
...
end enum;
var
-- unget buffer
lastvalid : boolean := false;
last : lexelem;
procedure getlex return lexelem =
begin
if lastvalid
-- next lexical element was already read and is stored in the unget
buffer
then
lastvalid:=false;
result:=last;
-- get next lexical element from the source code
else
-- easy to implement, identifier starts with a letter, number with
a -- digit, spaces are ignored etc.
-- you might find useful writing procedures getchar and ungetchar
in
-- the same manner as (un)getlex is writen for reading the source
code
end if;
-- prepare for ungetlex
last:=result;
end getlex;
procedure ungetlex =
begin
lastvalid:=true;
end ungetlex;
Note: as you can see I've defined a lex_eof symbol representing the end
of file, this is quite useful, because you don't have to check for such
stupid errors.
This is all you need for comfortable work with the source code.
STEP 3: writing the compiler
----------------------------
You just turn all the syntactic rules into procedures and add semantic
rules and code generation.
First try writing a simple expression parser. All the other work is very
similar. Define a tree structure, that will represent the expression:
type
-- "operator"
expoper = enum
exp_id, -- identifier (=variable)
exp_num, -- number
exp_plus, -- addition
exp_minus, -- subtraction
...
end enum;
pnode = ^node;
node = record
exp : expoper; -- "operator" this node represents
left : pnode; -- left subtree
right : pnode; -- right subtree
...
end node;
you'll need a
procedure make_tree_a_left_subtree(t : ref pnode)
that will create a new node, assign t as its left subtree and return the
new node.
Finally write the code:
procedure Expression return pnode =
var
lex : lexelem;
begin
-- parse the first Simple expression
result:=Simple_expression;
-- is there a relational operator ?
lex:=getlex;
if lex in [lex_eq,lex_not_eq, ... ]
-- yes
then
-- prepare the tree
make_tree_a_left_subtree(result);
result^.oper:="exp_eq, exp_not_eq, ... regarding to lex";
-- parse the next Simple expression
result^.right:=Simple_expression;
-- no
else
-- return the lexical element back
ungetlex;
end if;
end Expression;
procedure Simple_expression return pnode =
var
lex : lexelem;
begin
-- parse the first Factor
result:=Factor;
-- try reading an additional operator
lex:=getlex;
while lex in [lex_plus,lex_minus] loop
-- prepare the tree
make_tree_a_left_subtree(result);
result^.oper:="exp_plus, exp_minus regarding to lex";
-- parse the next Factor
result^.right:=Factor;
-- try next operator
lex:=getlex;
end loop;
-- last element wasn't used, return back
ungetlex;
end Simple_expression;
..
procedure Term return pnode =
var
lex : lexelem;
begin
-- get next lexical element
lex:=getlex;
case lex of
-- identifier
when lex_id do result:="create a node exp_id";
-- number
when lex_num do result:="create a node exp_num";
-- expression in parenthesis
when lex_left_par do
-- parse expression
result:=Expression;
-- must be followed by the right parenthesis
lex:=getlex;
if lex<>lex_right_par then "error: e.g. raise an exception"; end
if;
-- no other elements are allowed here
when others do "error: e.g.. raise an exception";
end case;
end Term;
The code above transforms expression a<b*(c+d+1) into a tree:
<
+-------+-------+
a *
+-------+-------+
b +
+-------+-------+
+ 1
+-------+-------+
c d
Once you have this representation of an expression it's very easy to do
things like type compatibility checking, optimization (0*x --> 0) and
also code generation. You just evaluate the left subexpression than the
right subexpression and do the operation specified by the current node
imm <address of a>
load
imm <address of b>
load
imm <address of c>
load
imm <address of d>
load
add
imm 1
add
add
mul
less
I think this kind of evaluation is sometimes called "polish notation".
The instructions do the following:
imm - stores immediate value on the stack (e.g.. a number or
address)
load - loads variable on the stack (gets address from stack)
add - adds two numbers stored on the stack and pushes the result
on the stack
mul - like add, but performs multiplication
less - compares two numbers from stack a pushes boolean result
STEP 4: writing the interpreter
-------------------------------
you just get the instructions and perform the actions like:
instr:=getinstr;
case instr
when imm do push("parameter of the instruction")
when add do
pop(x);
pop(y);
push(x+y);
...
end case;
Note: implementing advanced features like SetAnimation: I don't
recommend you creating special purpose instructions, in future versions
it'll be hard to maintain. Better try using a construction like this:
procedure SetAnimation (name : string) internal 1024;
and create a special internal procedure calling instruction that gets
two parameters: an index of the internal procedure and length of
parameters,
so the operation will be as follows:
when internal do
internal_proc_handler("index of internal procedure",
"address of parameter block");
pop_bytes_from_stack("length of parameters");
Doing it this way you get a language/interpreter usable for all you
other products.
That's all the knowledge you need for writing a simple compiler.
>
> Any help would be welcome.
>
> Thanks a lot,
>
> Mike.
> Programmer,
> Heliovisions Productions,
> Lyon/France
Ondrej, mailto:Ondrej.Tucny@alsoft.cz
Programmer,
A&&L soft,
Prague/Czech Republic
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.