Bytecode Compiler (Chris Cranford)
21 Apr 2004 00:47:42 -0400

          From comp.compilers

From: (Chris Cranford)
Newsgroups: comp.compilers
Date: 21 Apr 2004 00:47:42 -0400
Organization: TKD Software, Inc.
Keywords: design
Posted-Date: 21 Apr 2004 00:47:42 EDT

If someone asked me to develop bytecode for "2+3", it would be easy to see
that I should generate:

    push 2
    push 3
    add // push(pop(tos)+pop(tos))
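
As a minimal sketch of how a stack machine could evaluate that three-instruction stream, assuming instructions are modelled as Python tuples (the function name `run` and the tuple encoding are hypothetical, chosen only for illustration):

```python
# Hypothetical mini-evaluator for the push/push/add stream above.
def run(program):
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            # Pop both operands, then push their sum back onto the stack.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack

result = run([("push", 2), ("push", 3), ("add",)])
print(result)  # [5]
```

After the last instruction, the single value left on the stack is the result of the expression.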

But when we begin tossing in the concept of variables and strings, things
begin to get complicated and hard to follow. Let's assume the following:

    C/C++:
        void main(int argc, char* argv[])
        {
            int x = 3, y = 2;
            printf("Sum is %i\n", (x+y));
        }

    BASIC:
        Dim X as Integer = 3
        Dim Y as Integer = 2
        Print "Sum is "; (x+y)

How should I generate bytecode to reference the variables X and Y, and then
to assign the values 3 and 2 to them respectively?

And finally, when the print/printf statement is executed, there is a string
involved that has to come from somewhere. Someone mentioned in the past that
it could be encoded as part of the opcode stream; another option is to use a
symbol table and reference it from there.

Could someone help me put together a quick opcode stream that would use
variables and strings like the above to help me grasp how I should generate
opcode sequences for a virtual machine?

[Shortly afterward he added: -John]

Assuming we're dealing with a language similar to BASIC and the following
program, I have come up with this BYTECODE stream:

    Dim x as Integer = 2
    Dim y as Integer = 4
    x = y + x
    y = x + 1 * 2

    DIM Statements
    0001 ipush 00 02   // pushes 2 to TOS
    0004 istore_0      // pops TOS into local variable slot 0 (x)
    0005 ipush 00 04   // pushes 4 to TOS
    0008 istore_1      // pops TOS into local variable slot 1 (y)

    First Math Statement
    0009 iload_0       // pushes local variable slot 0 (x) to TOS
    0010 iload_1       // pushes local variable slot 1 (y) to TOS
    0011 iadd          // performs integer addition push(pop()+pop())
    0012 istore_0      // pops TOS into local variable slot 0 (x)

    Second Math Statement
    0013 ipush 00 02   // pushes 2 to TOS
    0016 ipush 00 01   // pushes 1 to TOS
    0019 imult         // integer multiplication push(pop()*pop())
    0020 iload_0       // pushes local variable slot 0 (x) to TOS
    0021 iadd          // performs integer addition push(pop()+pop())
    0022 istore_1      // pops TOS into local variable slot 1 (y)
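
To check a stream like the listing above, a small interpreter loop can help. The following is only a sketch under stated assumptions: the numeric opcode values are invented (the post assigns none), the two-byte `ipush` operand is read as unsigned big-endian, and the final store targets slot 1 because the source statement assigns to `y`. The names `execute` and `program` are hypothetical.

```python
# Opcode numbers below are invented for this sketch.
IPUSH, ILOAD_0, ILOAD_1, ISTORE_0, ISTORE_1, IADD, IMULT = range(1, 8)

def execute(code, num_locals=2):
    """Run a byte stream and return the final local variable slots."""
    local_slots = [0] * num_locals
    stack = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if op == IPUSH:
            # Two-byte operand, read as unsigned big-endian (an assumption).
            stack.append(int.from_bytes(code[pc:pc + 2], "big"))
            pc += 2
        elif op in (ILOAD_0, ILOAD_1):
            stack.append(local_slots[op - ILOAD_0])
        elif op in (ISTORE_0, ISTORE_1):
            local_slots[op - ISTORE_0] = stack.pop()
        elif op == IADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == IMULT:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode {op} at offset {pc - 1}")
    return local_slots

program = bytes([
    IPUSH, 0x00, 0x02, ISTORE_0,           # Dim x as Integer = 2
    IPUSH, 0x00, 0x04, ISTORE_1,           # Dim y as Integer = 4
    ILOAD_0, ILOAD_1, IADD, ISTORE_0,      # x = y + x
    IPUSH, 0x00, 0x02, IPUSH, 0x00, 0x01,  # operands of 1 * 2
    IMULT, ILOAD_0, IADD, ISTORE_1,        # y = x + 1 * 2
])
print(execute(program))  # [6, 8]
```

Here the local slots live in a separate list rather than on the operand stack; that is a simplification for the sketch, not a statement about how the frame has to be laid out.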

Now, in the above, I took a few things for granted that have to be done
before the virtual machine can ever begin executing any of these
statements. Something has to tell the virtual machine to advance the
stack pointer by the number of local variable storage slots that are
needed. This again could just be a simple opcode like:

    0000 advsp_2       // Advances the stack pointer by 2 slots. This
                       // basically gives me two local variable slots.

Another option would be to assume a fixed maximum number of slots for local
vars, such as the 255 that I believe the Java virtual machine uses. Are there
any pros to this? I would see this as being wasteful, especially for a case
like the above example where only a handful of variables are used: more
memory would have to be allocated for unused stack slots than necessary. It
could be the job of the compiler to look at the symbol table for each code
block and determine the number of slots needed for that frame. Then, just
use one of:

    advsp_1 // 1 variable slot
    advsp_2 // 2 variable slots
    advsp_3 // 3 variable slots
    advsp [2 byte operand] // slot count given by the 2-byte operand
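
One way the compiler side of this could look is sketched below. It assumes the symbol table for a code block is just a list of local variable names, that the opcode numbers are invented for the example, and that the 2-byte operand is big-endian; `emit_frame_setup` is a hypothetical helper name.

```python
# Invented opcode numbers for the advsp family.
ADVSP_1, ADVSP_2, ADVSP_3, ADVSP = 0x10, 0x11, 0x12, 0x13

def emit_frame_setup(symbols):
    """Return the bytes that reserve one slot per local variable."""
    count = len(symbols)
    if count == 0:
        return b""                        # nothing to reserve
    if count <= 3:
        # Short forms: the slot count is implied by the opcode itself.
        return bytes([ADVSP_1 + count - 1])
    if count > 0xFFFF:
        raise ValueError("too many locals for a 2-byte operand")
    # General form: opcode followed by a 2-byte big-endian slot count.
    return bytes([ADVSP]) + count.to_bytes(2, "big")

print(emit_frame_setup(["x", "y"]).hex())   # 11
print(emit_frame_setup(list("abcde")).hex())  # 130005
```

The short forms save two bytes per frame for small blocks, at the cost of three extra opcode numbers.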

This all makes sense to me so far. :-) Up to this point I've assumed that
I'm working only with numbers, and that these numbers are encoded in the
opcode bytestream as operands to the opcodes that use them.

Now, time for my questions. I just wanted to illustrate my understanding
thus far so that someone can correct me if I have misunderstood anything.

Let's now assume we have the following source code:

    Dim s as String = "Test"

I have several options for implementing this. I could use a special opcode
that says: the following byte stream represents a string of ASCII characters,
terminated by the first NULL byte found. Take this stream, allocate memory
for it, and copy the stream into that memory. Then push the pointer onto the
stack.

    OPCODE 54 65 73 74 00

Or, I could simplify this and permit NULL bytes in the stream by using
something like:

    OPCODE [2 byte length] 54 65 73 74
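
The two inline encodings above differ only in how the VM finds the end of the string. A rough sketch of both decoders follows, assuming `pc` points at the first byte after the opcode and that the 2-byte length is big-endian; the function names are hypothetical.

```python
def read_terminated(code, pc):
    """Read bytes up to the first zero byte; return (string, next pc)."""
    end = code.index(0, pc)          # position of the terminating zero byte
    return code[pc:end], end + 1     # skip the terminator itself

def read_counted(code, pc):
    """Read a 2-byte length, then that many bytes; return (string, next pc)."""
    length = int.from_bytes(code[pc:pc + 2], "big")
    start = pc + 2
    return code[start:start + length], start + length

print(read_terminated(bytes.fromhex("5465737400"), 0))  # (b'Test', 5)
print(read_counted(bytes.fromhex("000454657374"), 0))   # (b'Test', 6)
```

Only the counted form can carry a string that itself contains a zero byte, and it lets the VM allocate the right amount of memory before copying.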

Or another alternative approach would be a special opcode that tells the VM to
go to a special block of storage in the BYTECODE file known as the DATA BLOCK
and extract X number of bytes and store these in memory and also push the
memory pointer on the stack.

    OPCODE [2 byte length] [2 byte data block offset]

    Address # 0000 : 54657374 00000000 00000000 00000000
    Address # 0010 : 00000000 00000000 00000000 00000000   (addresses in hex)

So the associated opcode would be: [OPCODE 00 04 00 00]
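
A sketch of how the VM might resolve such an operand pair against the data block is shown below. It assumes both fields are big-endian and that the data block is held in memory as one `bytes` object; `load_from_data_block` is a hypothetical name.

```python
# The 32-byte data block from the dump above: "Test" followed by zeros.
data_block = bytes.fromhex("54657374") + bytes(28)

def load_from_data_block(operands, block):
    """Decode [2 byte length][2 byte offset] and copy the bytes out."""
    length = int.from_bytes(operands[0:2], "big")
    offset = int.from_bytes(operands[2:4], "big")
    if offset + length > len(block):
        raise ValueError("string extends past the end of the data block")
    return block[offset:offset + length]

print(load_from_data_block(bytes([0x00, 0x04, 0x00, 0x00]), data_block))  # b'Test'
```

Compared with inline strings, this keeps the instruction a fixed size and allows identical strings to share one copy in the data block.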

Now, once the memory pointer has been placed on the TOS, there would then be
an opcode that tells the VM to store the TOS in the local variable space of
the stack frame at offset 0.

    pstore_0 // Stores TOS in local variable slot 0 (s) [for pointers]
    pload_0 // Loads local variable slot 0 (s) into TOS [for pointers]

Then, I need to come up with a set of opcodes which permit working with memory
pointers, right?
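
As one possible starting point, the pointer store and load could be sketched as follows. This is an assumption-heavy illustration: the "pointer" is modelled as an index into a list of allocated objects rather than a real address, and the class and method names are hypothetical.

```python
class PointerVM:
    """Toy VM state: a heap of byte strings, an operand stack, local slots."""

    def __init__(self, num_locals):
        self.heap = []                    # allocated objects
        self.stack = []                   # operand stack
        self.local_slots = [None] * num_locals

    def push_string(self, data):
        # Allocate a copy of the string and push its "pointer" (heap index).
        self.heap.append(bytes(data))
        self.stack.append(len(self.heap) - 1)

    def pstore(self, slot):
        # pstore_N: pop the pointer on TOS into local variable slot N.
        self.local_slots[slot] = self.stack.pop()

    def pload(self, slot):
        # pload_N: push the pointer held in local variable slot N.
        self.stack.append(self.local_slots[slot])

    def deref(self, pointer):
        return self.heap[pointer]

vm = PointerVM(num_locals=1)
vm.push_string(b"Test")   # Dim s as String = "Test"
vm.pstore(0)
vm.pload(0)
print(vm.deref(vm.stack[-1]))  # b'Test'
```

Keeping pointer opcodes separate from the integer ones, as in the pstore/pload pair above, lets the VM tell which slots hold references without inspecting the values.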

Thanks in advance!
