Re: Basic-Like PCODE Compiler and Virtual Host/Interpreter (Paolo Bonzini)
21 Apr 2004 00:45:49 -0400

          From comp.compilers

Related articles
Basic-Like PCODE Compiler and Virtual Host/Interpreter (2004-04-15)
Re: Basic-Like PCODE Compiler and Virtual Host/Interpreter (Timothy Knox) (2004-04-21)
Re: Basic-Like PCODE Compiler and Virtual Host/Interpreter (2004-04-21)
Re: Basic-Like PCODE Compiler and Virtual Host/Interpreter (Lex Spoon) (2004-04-21)
Re: Basic-Like PCODE Compiler and Virtual Host/Interpreter (2004-04-21)
Basic-Like PCODE Compiler and Virtual Host/Interpreter (2004-04-21)
Re: Basic-Like PCODE Compiler and Virtual Host/Interpreter (2004-04-28)
Re: Basic-Like PCODE Compiler and Virtual Host/Interpreter (2004-04-28)
Re: Basic-Like PCODE Compiler and Virtual Host/Interpreter (Ken Rose) (2004-04-28)
[6 later articles]
| List of all articles for this month |

From: (Paolo Bonzini)
Newsgroups: comp.compilers
Date: 21 Apr 2004 00:45:49 -0400
References: 04-04-041
Keywords: interpreter
Posted-Date: 21 Apr 2004 00:45:49 EDT (JA Peele) wrote in message news:04-04-041...
> I am looking for some hints/tips or possibly source if its available
> (even as a guide) to explain how I can complete the following tasks.
> [snip]
> What I am expecting to do is have the customer save [a script] in a
> file, for example called test.bas. The customer would have a tool
> that does syntax, compiler analysis on the code and would notify them
> of where syntax errors exist in the code.

Virtual machines and Little Languages 101

Start with lex and yacc (or flex and bison) and put together a lexical
scanner and a parser, working towards producing a recognizer for Basic
syntax. No semantic analysis, in other words

      Dim b As Integer, a As Integer
      a = Str(b)

would still be accepted, while

      Dim b As Integer, a As Integer
      a = b + + a

would be rejected. You can code this in C; otherwise if you want to
use C++, look in the archives for comp.compilers and you'll surely
find help. It helps a lot that you don't have in practice to do any
memory management in this tool, just allocate everything and have it
deleted at the end of the program.

Create classes for the symbol table and the symbols (variables,
functions, ...) and add rules to the grammar that populate the symbol

Now create classes for the syntax tree and add rules to the bison
grammar that create these classes. That's just a set of classes that
more or less remember the reductions made by the parser (that means
how the parser took the small parts and combined them to form the big
picture of a complete program).

Add more code to the rules that populated the symbol table, so that
each symbol is attached to the corresponding syntax tree: this way
you'll be able to navigate from a function usage to its definition.

Implement a Visitor pattern for the syntax tree. That means you will
use double dispatch:

class SyntaxTreeElement {
    virtual void acceptVisitor (visitor *v) = 0;

class BinOpElement : public SyntaxTreeElement {
    void accept_visitor (visitor *i) {
        i->do_visit (this);

[... similar for other subclasses of SyntaxTreeElement ...]

class Visitor {
    void visit (SyntaxTreeElement *elem) {
        elem->accept_visitor (this);
    virtual void do_visit (VariableElement *var) {
    virtual void do_visit (BinOpElement *binop) {
        this->visit (this->op1);
        this->visit (this->op2);
    virtual void do_visit (WhileLoopElement *loop) {
        this->visit (this->condition);
        this->visit (this->body);

Always call through visit, not do_visit (in case you wonder, this is
in order to handle correctly the case when the syntax tree class
hierarchy is deep, i.e. there are subclasses of subclasses of

Visitor is an abstract class, but its methods are not pure; instead
they implement the actual navigation of the tree. For leaves such as
VariableElement, the visit method will be empty.

Use one or more visitors to implement semantic analysis (type
checking, number of arguments to a function, etc.).

> If all the syntatical
> checks pass, the contents of the source file are then stored in a
> bytecode/pcode type file like test.pbc.

Do you really want bytecode/pcode? You may want instead to serialize
the objects and load them back. But if you want bytecode, go for a
stack machine. Plan (quite carefully) the bytecodes and the in-memory
layout of the virtual machine's stack, and create another visitor that
navigates post-order (i.e. leaves first) through the syntax tree for
each function (and for the main function) and creates bytecodes.
You'll have to add extra code to deal with jumps too if you use

Bytecodes are faster (up to 5-10x), but harder to write. Even Ruby
does not use bytecodes, but walks the syntax tree. If most of the
time you'll spend in the interpreter will actually be in C++ coded
system functions (it depends on what your language is for), go with
syntax tree serialization.

If you decide to serialize the syntax tree, do another small pass in
the compiler that computes the size of a function's stack frame and
allocates each variable to the slot in the stack frame. This will
considerably simplify the implementation of the interpreter.

> const MAX_MEMORY_SIZE=1024*32; // 32k
> CBasicVM* pVM = new CBasicVM(MAX_MEMORY_SIZE);
> if (pVM != NULL)
> {
> // args is a character string of arguments that are passed
> // into the host VM environment as parameters to the program
> // being executed.
> // ie: test.pbc [arg1] [arg2] [arg3] ... [argX]
> if(pVM->Run("test.pbc", args))
> printf("Virtual machine ran program ok.\n");
> else
> printf("Virtual machine detected errors during runtime.\n");

Side remark, why not

      CBasicVM* pVM = new CBasicVM(MAX_MEMORY_SIZE, "test.pbc", args);
              if (pVM->Run())
                  printf("Virtual machine ran program ok.\n");
                  printf("Virtual machine detected errors during runtime.\n");
      delete pVM;

Anyway if you settled with the bytecode approach, the interpreter
should be a big switch statement that takes bytecodes from ip and data
from sp. To implement subroutines, just save the state of the VM (the
ip) on the stack, frob the ip, and the interpreter will happily start
executing the new funciton without realizing that it has changed. To
implement internal functions (such as Print or Str$), you can have a
"primitive" bytecode whose argument is an id number for the function.

If you settled with serializing the syntax tree, you'll have to do
another visitor. This is the actual interpreter:

class Interpreter: public Visitor {
    /* Store result of evaluating each node. That's because
          Visitor returns a void. */
    Variant computedValue;
    Variant *stackFrame;

    /* Reference trick to simplify the code below. */
    Variant &eval (SyntaxTreeElement *elem) {
        this->visit (elem);
        return this->computedValue;
    void do_visit (VariableElement *var) {
        this->computedValue = this->stackFrame[var->index];
    void do_visit (BinOpElement *binop) {
        Variant val1 = this->eval (binop->op1);
        Variant val2 = this->eval (binop->op2);
        switch (binop->op) {
            case PLUS: this->computedValue = val1 + val2;
            case MINUS: this->computedValue = val1 - val2;
        abort ();

    void do_visit (WhileLoopElement *loop) {
      while (this->eval (loop->condition))
          /* Result not interesting, use this->visit. */
          this->visit (loop->body);

and so on. I'm not a C++ programmer, so I don't know much about
references and that stuff. I use Smalltalk typically, that's why I
always use this-> in front of method names. So take this code with a
grain of salt. Have fun!


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.