|Absolute beginner - Need some pointers email@example.com (NickCarlson) (2008-02-27)|
From: firstname.lastname@example.org (Anton Ertl)
Date: Sun, 02 Mar 2008 17:05:10 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Posted-Date: 03 Mar 2008 13:56:59 EST
NickCarlson <email@example.com> writes:
>(we'll call my new language Omega for now)
>1. Write a lexical analyzer to convert the Omega code into a tree
>structure that the parser can parse.
>2. Write a parser to parse the tree structure into bytecode.
>3. Write a virtual machine that can execute the bytecode.
>The problem is implementing the virtual machine. From my
>understanding, it's a lot like writing an emulator, except you get to
>choose what the opcodes are. Am I right here?
More or less. You can also freely choose the encoding, which allows
stuff like threaded code for performance.
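To make the "choose your own opcodes and encoding" point concrete, here
is a minimal sketch of a switch-dispatched stack VM in C. The opcode
names, the one-int-per-cell encoding, and the function vm_run are all
made up for this example; they are not what Vmgen generates. Threaded
code would store the addresses of the instruction routines in the code
array instead of opcode numbers, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode set for illustration; not taken from Vmgen. */
enum opcode { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Runs a program encoded as an array of ints: each opcode occupies one
   cell, and OP_PUSH is followed by one immediate operand cell.
   Returns the value on top of the stack when OP_HALT is reached. */
int vm_run(const int *code)
{
    int stack[64];
    size_t sp = 0;            /* number of values currently on the stack */
    const int *ip = code;     /* instruction pointer */

    for (;;) {
        switch (*ip++) {
        case OP_PUSH:
            assert(sp < 64);
            stack[sp++] = *ip++;
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}
```

With this encoding, the expression (2 + 3) * 4 would be the cell
sequence OP_PUSH 2, OP_PUSH 3, OP_ADD, OP_PUSH 4, OP_MUL, OP_HALT.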
>Can anyone give me a few tips on how to go about doing this?
Well, one approach is to use Vmgen
<http://www.complang.tuwien.ac.at/anton/vmgen/> to generate the
virtual machine interpreter. Alternatively, you can read papers about
VM interpreters (e.g., the Vmgen paper linked from the site above, the
papers that it cites, and papers that cite it) and learn a lot about
how to do it yourself.
>[Unless you plan to save the bytecode to a file and reload it later,
>it's easier just to interpret the trees. -John]
I disagree. Using a VM has several advantages over directly
interpreting the tree:
- Modularization. A VM is a natural interface that allows decoupling
changes in the front end from changes in the interpreter. I have
seen this at work even in VMs that were internal to a specific
project, and where the same guy (me) maintained both parts.
This is especially important if the language is still evolving (as
will be the case in this project): With a tree that reflects the
source code, every change in the language affects a much larger piece
of code: the front end and the tree interpreter.
- Less code duplication: If there are two syntactic ways to express
the same concept, there are two kinds of trees for this concept, and
code for interpreting these two trees, whereas with a VM interpreter
the same VM instructions would be used for both syntaxes.
- Efficiency: VM code is simpler to implement and therefore faster to
interpret.
One might get the modularization and code duplication advantages by
having a separate tree that does not reflect the syntax, but then you
probably need to generate a syntax tree as an intermediate step, and
any supposed simplicity of staying with a tree would go away; you
would just generate another tree as interpretable representation
rather than a linear VM. And the interpreter for that tree will be a
bit more complex (because the data structure is more complex) and
slower than a VM interpreter.
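The code duplication point can be sketched as follows, assuming a
hypothetical language in which "-x" and "0 - x" are two syntaxes for
the same thing. The node kinds, opcodes, and the functions tree_eval
and gen are invented for this illustration. The tree interpreter needs
a case for every syntactic node kind, whereas the code generator can
lower both forms to the same VM instructions, so the VM interpreter
needs no separate negate instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree and VM definitions, invented for this sketch. */
enum node_kind { N_NUM, N_SUB, N_NEG };
enum opcode { OP_PUSH, OP_SUB };

struct node {
    enum node_kind kind;
    int value;                        /* used by N_NUM only */
    const struct node *left, *right;  /* N_SUB uses both, N_NEG uses left */
};

/* Tree interpreter: one case per syntactic node kind. */
int tree_eval(const struct node *n)
{
    switch (n->kind) {
    case N_NUM: return n->value;
    case N_SUB: return tree_eval(n->left) - tree_eval(n->right);
    case N_NEG: return -tree_eval(n->left);
    }
    assert(!"unknown node kind");
    return 0;
}

/* Code generator: appends VM code for n to code[] starting at index
   pos and returns the index after the last emitted cell. The caller
   must supply a large enough buffer. N_NEG is lowered to "0 - x". */
size_t gen(const struct node *n, int *code, size_t pos)
{
    switch (n->kind) {
    case N_NUM:
        code[pos++] = OP_PUSH;
        code[pos++] = n->value;
        return pos;
    case N_SUB:
        pos = gen(n->left, code, pos);
        pos = gen(n->right, code, pos);
        code[pos++] = OP_SUB;
        return pos;
    case N_NEG:
        code[pos++] = OP_PUSH;
        code[pos++] = 0;
        pos = gen(n->left, code, pos);
        code[pos++] = OP_SUB;
        return pos;
    }
    assert(!"unknown node kind");
    return pos;
}
```

Calling gen on the tree for "-5" and on the tree for "0 - 5" fills both
buffers with the same five cells: OP_PUSH 0, OP_PUSH 5, OP_SUB.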
M. Anton Ertl
[I've found the painful part of bytecode is the flow control. I
suppose that if you do a goto-less version with codes like begin loop,
end loop, and break it would be similar to trees, but for me the trees
have the nice advantage that the block structure doesn't have to be
discovered, it's right there. -John]
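For bytecode with jumps, the flow-control work usually comes down to
filling in jump targets, often by backpatching. Below is a minimal
sketch under stated assumptions: the instruction set, the struct code
buffer, and the functions emit, gen_sum_loop, and vm_run are all
hypothetical, and the loop is emitted by hand rather than from a real
parser. The forward jump out of the loop is emitted with a placeholder
operand and patched once the exit address is known.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction set, invented for this sketch. */
enum opcode { OP_PUSH, OP_LOAD, OP_STORE, OP_ADD, OP_JZ, OP_JMP, OP_HALT };

struct code { int cell[64]; size_t len; };

/* Appends one cell and returns its index, so that jump operands can
   be patched later. */
static size_t emit(struct code *c, int v)
{
    assert(c->len < 64);
    c->cell[c->len] = v;
    return c->len++;
}

/* Emits code for the hypothetical source loop
       while (n) { sum = sum + n; n = n + -1; }
   with n in variable slot 0 and sum in variable slot 1. */
void gen_sum_loop(struct code *c)
{
    size_t loop_start = c->len;        /* target of the backward jump */
    emit(c, OP_LOAD); emit(c, 0);
    emit(c, OP_JZ);
    size_t exit_patch = emit(c, -1);   /* forward target not known yet */

    emit(c, OP_LOAD); emit(c, 1);      /* sum = sum + n */
    emit(c, OP_LOAD); emit(c, 0);
    emit(c, OP_ADD);
    emit(c, OP_STORE); emit(c, 1);

    emit(c, OP_LOAD); emit(c, 0);      /* n = n + -1 */
    emit(c, OP_PUSH); emit(c, -1);
    emit(c, OP_ADD);
    emit(c, OP_STORE); emit(c, 0);

    emit(c, OP_JMP); emit(c, (int)loop_start);
    c->cell[exit_patch] = (int)c->len; /* backpatch the loop exit */
    emit(c, OP_HALT);
}

/* Runs the code on the given variable slots; jump operands are
   absolute cell indices. */
void vm_run(const struct code *c, int *vars)
{
    int stack[16];
    size_t sp = 0, ip = 0;

    for (;;) {
        assert(ip < c->len);
        switch (c->cell[ip++]) {
        case OP_PUSH:  assert(sp < 16); stack[sp++] = c->cell[ip++]; break;
        case OP_LOAD:  assert(sp < 16); stack[sp++] = vars[c->cell[ip++]]; break;
        case OP_STORE: assert(sp >= 1); vars[c->cell[ip++]] = stack[--sp]; break;
        case OP_ADD:   assert(sp >= 2); sp--; stack[sp - 1] += stack[sp]; break;
        case OP_JZ: {
            int target = c->cell[ip++];
            assert(sp >= 1);
            if (stack[--sp] == 0)
                ip = (size_t)target;
            break;
        }
        case OP_JMP:   ip = (size_t)c->cell[ip]; break;
        case OP_HALT:  return;
        default:       assert(!"unknown opcode"); return;
        }
    }
}
```

Running the generated code with n = 3 and sum = 0 in the variable
slots leaves sum = 6 and n = 0. In a real code generator the same
patching would be driven by the nesting of the parsed loop constructs,
which is where the block structure of the tree gets used.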