Need Information on how to create bytecode

Oliver Hunt <ojh16@student.canterbury.ac.nz>
9 Apr 2006 17:23:01 -0400

          From comp.compilers


From: Oliver Hunt <ojh16@student.canterbury.ac.nz>
Newsgroups: comp.compilers
Date: 9 Apr 2006 17:23:01 -0400
Organization: Compilers Central
References: 06-04-048
Keywords: interpreter, design
Posted-Date: 09 Apr 2006 17:23:01 EDT

Yeah, I had that problem some time ago. The best you seem to be able
to do is look at the opcode sets of other VMs and work out what
subset you want. You also have to decide how your VM should operate:
either stack based or register based. Most VMs seem to be stack based
these days (CLR, JVM, ...), though a few are register based.


As far as your examples go, they should end up looking fairly similar
to what they'd look like on a real (non-virtual) machine, e.g.
    $someValue = 1 + 2
would become something along the lines of (on a stack based VM)
    lit 1          ; push 1 onto the top of stack (TOS)
    lit 2          ; push 2 onto the TOS
    add            ; pop the top two elements, add them, and push the result
    sto someValue  ; pop TOS and store it in the location for someValue


or (on a register VM)
    lit r1, 1          ; load 1 into register r1
    lit r2, 2          ; load 2 into register r2
    add r3, r1, r2     ; r3 <- r1 + r2
    mov someValue, r3  ; store r3 into the location for someValue
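
To make the stack based listing concrete, here is a minimal sketch of an interpreter loop that could run it. The opcode numbering, the flat int encoding, the halt opcode, and the idea that someValue lives in variable slot 0 are all assumptions made for illustration, not anything a particular VM prescribes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode numbering; the names follow the listing above.
   OP_HALT is an addition so the loop knows when to stop. */
enum { OP_LIT, OP_ADD, OP_STO, OP_HALT };

#define STACK_MAX 64
#define NUM_VARS  8

/* Run a flat int-encoded program; vars[] stands in for named storage. */
void run_stack_vm(const int *code, int *vars)
{
    int stack[STACK_MAX];
    int sp = 0;                  /* index of the next free stack slot */
    size_t pc = 0;               /* index of the next code word */

    for (;;) {
        switch (code[pc++]) {
        case OP_LIT:             /* lit n: push the literal n */
            assert(sp < STACK_MAX);
            stack[sp++] = code[pc++];
            break;
        case OP_ADD: {           /* add: pop two values, push their sum */
            assert(sp >= 2);
            int rhs = stack[--sp];
            int lhs = stack[--sp];
            stack[sp++] = lhs + rhs;
            break;
        }
        case OP_STO:             /* sto v: pop TOS into variable slot v */
            assert(sp >= 1);
            vars[code[pc++]] = stack[--sp];
            break;
        case OP_HALT:
            return;
        default:
            assert(!"unknown opcode");
            return;
        }
    }
}

/* $someValue = 1 + 2, with someValue assigned to slot 0 (an assumption). */
int example_some_value(void)
{
    static const int program[] = {
        OP_LIT, 1,
        OP_LIT, 2,
        OP_ADD,
        OP_STO, 0,
        OP_HALT
    };
    int vars[NUM_VARS] = {0};
    run_stack_vm(program, vars);
    return vars[0];
}
```

Calling example_some_value() runs the four instructions and returns whatever ended up in slot 0. Operands sit inline after their opcode here, which keeps the encoding compact at the cost of variable-length instructions.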


and your more complex example:
$i = true;
if($i)
          $otherValue = someFunction(arg0, arg1);
else
          $otherValue = "Not true";


would become something similar to this (stack based). The labels are
symbolic; your compiler would need to turn them into actual addresses.
    lit 1            ; most VMs I've seen don't actually recognise bool as
                     ; being distinct from an int
    sto i            ; store TOS to i
    lod i            ; load value of i to TOS
    jpz else_label   ; pop TOS, if value is 0 jump to else_label
    lod arg1         ; the exact semantics for passing args is up to you
    lod arg0
    call someFunction
    sto otherValue   ; assuming your calling sequence results in the return
                     ; value being on the stack, but there are other options
                     ; your calling sequence may require you to explicitly
                     ; clear the args off the stack, or your VM may do it
                     ; as part of its call sequence
    jmp end
else_label:          ; start of else block
    newstring "Not true" ; the CLR has a dedicated string literal op (ldstr)
                     ; and the JVM loads strings with its constant op (ldc),
                     ; both separate from object creation; the resulting
                     ; pointer is placed on TOS
    sto otherValue
end:
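
One common way to turn the symbolic labels into addresses is a two pass scheme: first work out where each label lands, then patch every jump that names it. The sketch below assumes fixed-size instructions so that an "address" is just an instruction index; the asm_line_t type and the function names are hypothetical and only the jump targets are modelled, not the other operands.

```c
#include <assert.h>
#include <string.h>

/* One line of symbolic assembly: either a label definition
   (label != NULL) or an instruction (op != NULL). */
typedef struct {
    const char *label;   /* name defined on this line, or NULL */
    const char *op;      /* mnemonic, or NULL on a label line */
    const char *target;  /* symbolic jump target, or NULL */
} asm_line_t;

/* Pass 1: the address of a label is the number of instructions that
   precede it, since label lines occupy no space in the output. */
int label_address(const asm_line_t *lines, int n, const char *name)
{
    int addr = 0;
    for (int i = 0; i < n; i++) {
        if (lines[i].label != NULL) {
            if (strcmp(lines[i].label, name) == 0)
                return addr;
        } else {
            addr++;
        }
    }
    return -1;           /* label not defined anywhere */
}

/* Pass 2: for each instruction write its resolved target into
   targets[], or -1 if it has none. Returns the instruction count,
   or -1 if some jump names an undefined label. */
int resolve_labels(const asm_line_t *lines, int n, int *targets)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (lines[i].label != NULL)
            continue;
        int t = -1;
        if (lines[i].target != NULL) {
            t = label_address(lines, n, lines[i].target);
            if (t < 0)
                return -1;
        }
        targets[count++] = t;
    }
    return count;
}

/* The if/else listing above, reduced to mnemonics and jump targets. */
int example_resolve(int *targets)
{
    static const asm_line_t listing[] = {
        { NULL, "lit", NULL },
        { NULL, "sto", NULL },
        { NULL, "lod", NULL },
        { NULL, "jpz", "else_label" },
        { NULL, "lod", NULL },
        { NULL, "lod", NULL },
        { NULL, "call", NULL },
        { NULL, "sto", NULL },
        { NULL, "jmp", "end" },
        { "else_label", NULL, NULL },
        { NULL, "newstring", NULL },
        { NULL, "sto", NULL },
        { "end", NULL, NULL },
    };
    int n = (int)(sizeof listing / sizeof listing[0]);
    return resolve_labels(listing, n, targets);
}
```

Looking up each label by rescanning the listing is quadratic, which is fine for a sketch; a real assembler would more likely fill a symbol table in the first pass.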


The register VM version is similar, although function args and return
values are likely to be passed through registers.
A register based VM is much more complicated to implement, but it can be
faster than a stack machine, especially on machines that actually have
registers. Most register based approaches I've seen assume an infinite
number of registers and rely on the JIT to perform register allocation,
which is exceedingly non-trivial.
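
For comparison, here is a rough sketch of what a purely interpreted register VM loop might look like for the 1 + 2 example. It sidesteps register allocation entirely by giving the VM a small fixed register file (16 here, an arbitrary choice) instead of the infinite one a JIT-oriented design would assume. The three-operand instruction layout and all of the names are assumptions for illustration.

```c
#include <assert.h>

/* Hypothetical opcodes; R_HALT is an addition so the loop can stop. */
enum { R_LIT, R_ADD, R_MOV, R_HALT };

/* Fixed-size instruction: the meaning of a, b, c depends on op. */
typedef struct {
    int op;
    int a, b, c;
} reg_insn_t;

#define NUM_REGS 16

void run_register_vm(const reg_insn_t *code, int *vars)
{
    int regs[NUM_REGS] = {0};

    for (const reg_insn_t *ip = code; ; ip++) {
        switch (ip->op) {
        case R_LIT:              /* lit rA, imm: rA <- b */
            regs[ip->a] = ip->b;
            break;
        case R_ADD:              /* add rA, rB, rC: rA <- rB + rC */
            regs[ip->a] = regs[ip->b] + regs[ip->c];
            break;
        case R_MOV:              /* mov var, rB: variable slot a <- rB */
            vars[ip->a] = regs[ip->b];
            break;
        case R_HALT:
            return;
        default:
            assert(!"unknown opcode");
            return;
        }
    }
}

/* someValue = 1 + 2, with someValue assigned to slot 0 (an assumption). */
int example_register_add(void)
{
    static const reg_insn_t program[] = {
        { R_LIT, 1, 1, 0 },      /* lit r1, 1 */
        { R_LIT, 2, 2, 0 },      /* lit r2, 2 */
        { R_ADD, 3, 1, 2 },      /* add r3, r1, r2 */
        { R_MOV, 0, 3, 0 },      /* mov someValue, r3 */
        { R_HALT, 0, 0, 0 }
    };
    int vars[4] = {0};
    run_register_vm(program, vars);
    return vars[0];
}
```

The loop itself is no harder than the stack version; the extra work lands in the compiler, which now has to decide which register holds each intermediate value.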


Given this is likely your first VM, I'd strongly recommend that you
use a stack machine; it will make your life decidedly less painful.


As far as data structures go, a simple struct will do, a la:
typedef enum { lit, lod, lodarg, sto, jpz, jmp, call, newstring /* , ... */ }
    optype_t;
typedef struct {
      optype_t opcode;
      union {
          int litvalue;      /* for lit */
          some_type target;  /* for jmp, jpz; some_type is a placeholder, and
                                the address will obviously be different in
                                memory than in file */
          int offset;        /* for lod and lodarg */
          struct function_info *function;
                             /* for call: in a file it should be a string, in
                                memory a pointer to the function info struct
                                (the struct name here is a placeholder) */
          /* ... */
      } args;
} opcode_t;


That is a fairly simple approach but it would work.
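
Here is a sketch of how a dispatch loop might consume that kind of struct, cut down to the opcodes needed for a conditional. The jump target is taken to be an index into the instruction array, sto is assumed to use the same offset field as lod, and halt is an extra opcode added so the loop can stop; none of that is dictated by the struct above.

```c
#include <assert.h>

/* Reduced version of the enum above; halt is an addition for this sketch. */
typedef enum { lit, lod, sto, jpz, jmp, halt } optype_t;

typedef struct {
    optype_t opcode;
    union {
        int litvalue;    /* lit */
        int target;      /* jmp, jpz: index into the instruction array */
        int offset;      /* lod, sto: variable slot */
    } args;
} opcode_t;

#define STACK_MAX 32

void run_struct_vm(const opcode_t *code, int *vars)
{
    int stack[STACK_MAX];
    int sp = 0;
    int pc = 0;

    for (;;) {
        const opcode_t *in = &code[pc++];
        switch (in->opcode) {
        case lit:                /* push a literal */
            assert(sp < STACK_MAX);
            stack[sp++] = in->args.litvalue;
            break;
        case lod:                /* push a variable's value */
            assert(sp < STACK_MAX);
            stack[sp++] = vars[in->args.offset];
            break;
        case sto:                /* pop TOS into a variable */
            assert(sp > 0);
            vars[in->args.offset] = stack[--sp];
            break;
        case jpz:                /* pop TOS, jump if it is zero */
            assert(sp > 0);
            if (stack[--sp] == 0)
                pc = in->args.target;
            break;
        case jmp:                /* unconditional jump */
            pc = in->args.target;
            break;
        case halt:
            return;
        default:
            assert(!"unknown opcode");
            return;
        }
    }
}

/* i = condition; if (i) other = 10; else other = 20; */
int example_branch(int condition)
{
    enum { VAR_I, VAR_OTHER };
    const opcode_t program[] = {
        /* 0 */ { lit,  { .litvalue = condition } },
        /* 1 */ { sto,  { .offset = VAR_I } },
        /* 2 */ { lod,  { .offset = VAR_I } },
        /* 3 */ { jpz,  { .target = 7 } },
        /* 4 */ { lit,  { .litvalue = 10 } },
        /* 5 */ { sto,  { .offset = VAR_OTHER } },
        /* 6 */ { jmp,  { .target = 9 } },
        /* 7 */ { lit,  { .litvalue = 20 } },
        /* 8 */ { sto,  { .offset = VAR_OTHER } },
        /* 9 */ { halt, { 0 } }
    };
    int vars[2] = {0};
    run_struct_vm(program, vars);
    return vars[VAR_OTHER];
}
```

The integer constants 10 and 20 stand in for the function call and the string, which would need a calling convention and a heap that this sketch leaves out.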


As far as jitting goes, it is tremendously complicated. I recommend
you first see if you can go from whatever your opcode format is to
assembly that you can pass into an assembler. If you can get that
going, then you know the native code you are generating is correct, and
you can set about generating binary opcodes, but that is also
decidedly non-fun.
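
As a rough idea of that first step, the sketch below turns three stack opcodes into assembly text by emitting a fixed template per opcode and using the machine stack as the VM stack. The NASM-style x86-64 syntax, the choice of rax and rcx, and the emit_insn_t type are all assumptions for illustration; the output is deliberately naive and is only meant to be fed to an assembler for checking.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical opcode set for the emitter. */
typedef enum { E_LIT, E_ADD, E_STO } emit_op_t;

typedef struct {
    emit_op_t op;
    int value;           /* E_LIT: the literal */
    const char *name;    /* E_STO: symbol to store into */
} emit_insn_t;

/* Write assembly text for code[0..n) into out. Returns the number of
   characters written, or -1 if out is too small or an op is unknown. */
int emit_asm(const emit_insn_t *code, int n, char *out, size_t cap)
{
    size_t used = 0;

    for (int i = 0; i < n; i++) {
        int w;
        switch (code[i].op) {
        case E_LIT:      /* push the literal onto the machine stack */
            w = snprintf(out + used, cap - used,
                         "    push %d\n", code[i].value);
            break;
        case E_ADD:      /* pop both operands, add, push the sum */
            w = snprintf(out + used, cap - used,
                         "    pop rcx\n    pop rax\n"
                         "    add rax, rcx\n    push rax\n");
            break;
        case E_STO:      /* pop TOS into a named memory location */
            w = snprintf(out + used, cap - used,
                         "    pop rax\n    mov [rel %s], rax\n",
                         code[i].name);
            break;
        default:
            return -1;
        }
        if (w < 0 || (size_t)w >= cap - used)
            return -1;   /* output was truncated */
        used += (size_t)w;
    }
    return (int)used;
}
```

Feeding it lit 1, lit 2, add, sto someValue yields eight lines of assembly ending in a store to someValue. The pushes and pops it emits are exactly the redundancy a real JIT would try to eliminate, but correctness comes first.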


Hope this helps,
      Oliver

