Re: Absolute beginner - Need some pointers (Anton Ertl)
Sun, 02 Mar 2008 17:05:10 GMT

          From comp.compilers

Related articles
Absolute beginner - Need some pointers (NickCarlson) (2008-02-27)
Re: Absolute beginner - Need some pointers (Hans-Peter Diettrich) (2008-02-28)
Re: Absolute beginner - Need some pointers (Bartc) (2008-02-29)
Re: Absolute beginner - Need some pointers (2008-03-02)
Re: Absolute beginner - Need some pointers (2008-03-03)
Re: Absolute beginner - Need some pointers (2008-03-04)
Re: Absolute beginner - Need some pointers (glen herrmannsfeldt) (2008-03-05)
Re: Absolute beginner - Need some pointers (Soeren Sandmann) (2008-03-07)

From: (Anton Ertl)
Newsgroups: comp.compilers
Date: Sun, 02 Mar 2008 17:05:10 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
References: 08-02-091
Keywords: interpreter, comment
Posted-Date: 03 Mar 2008 13:56:59 EST

NickCarlson <> writes:
>(we'll call my new language Omega for now)
>1. Write a lexical analyzer to convert the Omega code into a tree
>structure that the parser can parse.
>2. Write a parser to parse the tree structure into bytecode.
>3. Write a virtual machine that can execute the bytecode.

Good plan.

>The problem is implementing the virtual machine. From my
>understanding, it's a lot like writing an emulator, except you get to
>choose what the opcodes are. Am I right here?

More or less. You can also freely choose the encoding, which allows
stuff like threaded code for performance.
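To make the "you choose the opcodes and the encoding" point concrete, here is a minimal sketch of a stack-based VM interpreter. The opcode names (PUSH, ADD, MUL, HALT) and the encoding (a flat list in which an operand follows its opcode) are assumptions made up for this illustration; they are not taken from Omega or from Vmgen.

```python
# Hypothetical opcodes, invented for this sketch.
PUSH, ADD, MUL, HALT = range(4)


def run(code):
    """Execute a flat list of opcodes with inline operands.

    Returns the value on top of the stack when HALT is reached.
    """
    stack = []
    pc = 0  # index of the next instruction to execute
    while True:
        op = code[pc]
        pc += 1
        if op == PUSH:
            stack.append(code[pc])  # the operand is stored right after the opcode
            pc += 1
        elif op == ADD:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")


# Bytecode for (2 + 3) * 4
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]
result = run(program)  # 20
```

This sketch dispatches through one central chain of comparisons. Threaded code, by contrast, typically encodes each VM instruction as the address of the routine that implements it, so that each routine can jump straight to the next one; Python cannot express that directly, so it is not shown here.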

>Can anyone give me a few tips on how to go about doing this?

Well, one approach is to use Vmgen
<> to generate the
virtual machine interpreter. Alternatively, you can read papers about
VM interpreters (e.g., the Vmgen paper linked from the site above, the
papers that it cites, and papers that cite it) and learn a lot about
how to do it yourself.

>[Unless you plan to save the bytecode to a file and reload it later,
>it's easier just to interpret the trees. -John]

I disagree. Using a VM has several advantages over directly
interpreting the tree:

- Modularization. A VM is a natural interface that allows decoupling
    changes in the front end from changes in the interpreter. I have
    seen this at work even in VMs that were internal to a specific
    project, and where the same guy (me) maintained both parts.

    This is especially important if the language is still evolving (as
    will be the case in this project): with a tree that reflects the
    source code, every change in the language affects a much larger
    piece of code, namely both the front end and the tree interpreter.

- Less code duplication: If there are two syntactic ways to express
    the same concept, there are two kinds of trees for this concept, and
    code for interpreting these two trees, whereas with a VM interpreter
    the same VM instructions would be used for both syntaxes.

- Efficiency: VM code is simpler to interpret and therefore faster to
    execute.
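The code duplication point can be sketched as follows. Suppose, purely as an assumption for illustration, that the language has both `x = x + 1` and `x += 1`. A front end can lower both syntax trees to the same VM instructions, so the interpreter never needs to know that two spellings exist. The tuple-based node shapes and the instruction names below are hypothetical.

```python
def compile_expr(node):
    """Translate an expression tree into a list of VM instructions."""
    kind = node[0]
    if kind == "num":
        return [("PUSH", node[1])]
    if kind == "var":
        return [("LOAD", node[1])]
    if kind == "add":
        return compile_expr(node[1]) + compile_expr(node[2]) + [("ADD",)]
    raise ValueError(f"unknown expression node {kind!r}")


def compile_stmt(node):
    """Translate a statement tree into a list of VM instructions."""
    kind = node[0]
    if kind == "assign":  # x = expr
        _, name, expr = node
        return compile_expr(expr) + [("STORE", name)]
    if kind == "add_assign":  # x += expr, rewritten as x = x + expr
        _, name, expr = node
        return compile_stmt(("assign", name, ("add", ("var", name), expr)))
    raise ValueError(f"unknown statement node {kind!r}")


long_form = ("assign", "x", ("add", ("var", "x"), ("num", 1)))  # x = x + 1
short_form = ("add_assign", "x", ("num", 1))                    # x += 1

code_long = compile_stmt(long_form)
code_short = compile_stmt(short_form)
# Both forms yield LOAD x, PUSH 1, ADD, STORE x.
```

A tree interpreter working directly on the syntax tree would instead need a case for each of the two node kinds.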

One might get the modularization and code duplication advantages by
having a separate tree that does not reflect the syntax, but then you
probably need to generate a syntax tree as an intermediate step, and
any supposed simplicity of staying with a tree would go away; you
would just generate another tree as interpretable representation
rather than a linear VM. And the interpreter for that tree will be a
bit more complex (because the data structure is more complex) and
slower than a VM interpreter.
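For comparison, a tree interpreter for a small expression language might look like the sketch below. The tuple-based node layout is an assumption made for illustration. Note that every evaluation step follows child references and recurses, where a linear VM would simply advance to the next instruction.

```python
def eval_tree(node, env):
    """Evaluate an expression tree by recursive descent over its nodes."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    if kind == "add":
        return eval_tree(node[1], env) + eval_tree(node[2], env)
    if kind == "mul":
        return eval_tree(node[1], env) * eval_tree(node[2], env)
    raise ValueError(f"unknown node {kind!r}")


# Tree for (2 + x) * 4
tree = ("mul", ("add", ("num", 2), ("var", "x")), ("num", 4))
result = eval_tree(tree, {"x": 3})  # 20
```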

- anton
M. Anton Ertl

[I've found the painful part of bytecode is the flow control. I
suppose that if you do a goto-less version with codes like begin loop,
end loop, and break it would be similar to trees, but for me the trees
have the nice advantage that the block structure doesn't have to be
discovered, it's right there. -John]
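To illustrate the flow-control issue, here is a hedged sketch of how a `while` node in a tree might be lowered to bytecode with explicit jumps. The block structure that is directly visible in the tree turns into jump targets, and the forward jump out of the loop has to be patched once the end of the body is known. The node shapes and the instruction names (JUMP, JUMP_IF_ZERO, and so on) are assumptions made up for this example.

```python
def compile_expr(node, out):
    """Append the instructions for an expression tree to out."""
    kind = node[0]
    if kind == "num":
        out.append(("PUSH", node[1]))
    elif kind == "var":
        out.append(("LOAD", node[1]))
    elif kind in ("add", "sub"):
        compile_expr(node[1], out)
        compile_expr(node[2], out)
        out.append((kind.upper(),))
    else:
        raise ValueError(f"unknown expression node {kind!r}")


def compile_stmt(node, out):
    """Append the instructions for a statement tree to out."""
    kind = node[0]
    if kind == "assign":
        compile_expr(node[2], out)
        out.append(("STORE", node[1]))
    elif kind == "while":
        _, cond, body = node
        loop_start = len(out)      # target of the backward jump
        compile_expr(cond, out)
        exit_jump = len(out)
        out.append(None)           # placeholder; exit target not yet known
        for stmt in body:
            compile_stmt(stmt, out)
        out.append(("JUMP", loop_start))
        out[exit_jump] = ("JUMP_IF_ZERO", len(out))  # backpatch the exit
    else:
        raise ValueError(f"unknown statement node {kind!r}")


def run(code):
    """Execute the instruction list and return the variable environment."""
    stack, env, pc = [], {}, 0
    while pc < len(code):
        op, *args = code[pc]
        pc += 1
        if op == "PUSH":
            stack.append(args[0])
        elif op == "LOAD":
            stack.append(env[args[0]])
        elif op == "STORE":
            env[args[0]] = stack.pop()
        elif op in ("ADD", "SUB"):
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b if op == "ADD" else a - b)
        elif op == "JUMP":
            pc = args[0]
        elif op == "JUMP_IF_ZERO":
            if stack.pop() == 0:
                pc = args[0]
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return env


# n = 3; total = 0; while n: total = total + n; n = n - 1
program = [
    ("assign", "n", ("num", 3)),
    ("assign", "total", ("num", 0)),
    ("while", ("var", "n"), [
        ("assign", "total", ("add", ("var", "total"), ("var", "n"))),
        ("assign", "n", ("sub", ("var", "n"), ("num", 1))),
    ]),
]
code = []
for stmt in program:
    compile_stmt(stmt, code)
env = run(code)  # {"n": 0, "total": 6}
```

The compiler needs the placeholder-and-patch step only because the bytecode is linear; a tree interpreter would just loop over the `while` node's body.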
