Re: Absolute beginner - Need some pointers

Hans-Peter Diettrich <>
Thu, 28 Feb 2008 11:38:48 +0100

          From comp.compilers

Related articles
Absolute beginner - Need some pointers (NickCarlson) (2008-02-27)
Re: Absolute beginner - Need some pointers (Hans-Peter Diettrich) (2008-02-28)
Re: Absolute beginner - Need some pointers (Bartc) (2008-02-29)
Re: Absolute beginner - Need some pointers (2008-03-02)
Re: Absolute beginner - Need some pointers (2008-03-03)
Re: Absolute beginner - Need some pointers (2008-03-04)
Re: Absolute beginner - Need some pointers (glen herrmannsfeldt) (2008-03-05)
Re: Absolute beginner - Need some pointers (Soeren Sandmann) (2008-03-07)
| List of all articles for this month |

From: Hans-Peter Diettrich <>
Newsgroups: comp.compilers
Date: Thu, 28 Feb 2008 11:38:48 +0100
Organization: Compilers Central
References: 08-02-091
Keywords: practice, interpreter
Posted-Date: 28 Feb 2008 10:45:21 EST

NickCarlson wrote:

Just a few notes:

> i. Write a lexical analyzer to convert to Omega code into a tree
> structure that the parser can parse.

This step usually is performed by a parser, that calls an lexer to
retrieve the tokens from the input text. The parser structures the
sequence of tokens into an tree (AST = Abstract Syntax Tree).

> 2. Write a parser to parse the tree structure into bytecode.

The code generator is not normally implemented as a parser. Details
like restructuring the AST, optimizations and storage of the binary
code, depend on your language and virtual machine.

Some semantical actions (building symbol tables and identification of
unique symbols...) can or must occur during the AST construction. In
your first tries you may implement multiple distinct passes over the
tree, each one for an specific task, and decide later to reduce the
number of passes by doing multiple jobs in a single pass.

> C. Write a virtual machine that can execute the bytecode.

An interpreter can immediately act on the preprocessed tree, without a
bytecode translation. E.g. an BASIC interpreter/emulator can have
specialized "instructions" (tree nodes) for a loop or an subroutine
call (CISC), where other more low-level machines will act on registers

> The problem is implementing the virtual machine. From my
> understanding, it's a lot like writing an emulator, except you get to
> choose what the opcodes are. Am I right here? From what I've read, I'd
> prefer to write a register-based interpreter as opposed to a stack
> based one, because the register based approach seems more intuitive to
> me.

Even a register based machine requires an stack, for subroutine calls
(recursion?) or evaluation of complex expressions. That's why most
(educational) approaches start with a stack machine, and deal with the
use of registers later (allocation, spilling, lifetime analysis...).
Debugging a stack machine can be easier, when the mapping of variables
into registers varies, for every subroutine or even within a single

> Can anyone give me a few tips on how to go about doing this? I'd like
> to start out as simple as possible. I've looked through Lua's and
> Parrot's source code, but it's more confusing that helpful at this
> point.

Each of your intended steps can be achieved in different ways. If you
have clear preferences, everything is fine. Otherwise you'll have to
decide which way to go, starting with the parser (handcrafted
recursive descent, or based on an existing parser generator), etc.

> [Unless you plan to save the bytecode to a file and reload it later,
> it's easier just to interpret the trees. -John]

Our famous mod again says it in a few words :-)


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.