Re: Language design/VM design

"Floris 'Tamama' van Gog" <floris@vangog.net>
28 Mar 2000 01:03:25 -0500

From comp.compilers

From:	"Floris 'Tamama' van Gog" <floris@vangog.net>
Newsgroups:	comp.compilers
Date:	28 Mar 2000 01:03:25 -0500
Organization:	XS4ALL Internet BV
References:	00-02-138 00-03-008 00-03-055 00-03-073 00-03-102
Keywords:	design

Yes, I suppose the question was a bit vague, but it was written at a
time when I didn't really know what I wanted myself :-) I'll try to
clarify things a bit, as I have had a lot of time between the first post
and this one to think about it.

What I am trying to make is a virtual machine that executes bytecode.
This VM can be linked into another program, which can then run
the scripts. (The compiler I am making will be part of the VM, and it
generates bytecode 'on the fly' if a 'binary' has not yet been
created.) (This has not changed from my first post ;-)

It's probably similar to Visual Basic Script, except that the language
is very much like C/C++, and the interface from script to host is
direct. This means that the VM can access host variables (if made
'public' to the VM) directly. This will become clearer in the part
about pointers below.

Now it would be rather bad if the VM executed a script that then
crashed the host program. For this I need pointer safety (static
arrays are bounds-checked at compile time if possible). But while
thinking about it I came up with the idea of checking not only whether
an access is within VM space, but also whether it is within the array
it is supposed to be in.

for example:

char foo[96];
handle bar;
int i=96;

foo[i]=0; /* changes bar */

This is something I want to stop from happening, since I intend to store
real pointers (rather than VM offsets) in VM space, for faster/easier
memory management. With these pointers in VM space, all sorts of neat
stuff happens (if you access all memory via a pointer):

0) With debug information optionally added to the binary, one could
check and see which array was overrun.
1) Stack and global memory addressing are the same, just using a
different base register.
2) VM memory addressing and host addressing are the same.
3) Exchanging data between host and VM will be very simple (you just
pass a pointer).
4) The VM's set of memory-addressing instructions gets reduced a lot.

Right now I have made all memory accesses take the form:

[base_register + offset_register + constant_offset]

(NOTE: registers are virtual, not real-machine registers)

Here base_register contains either the global VM address space pointer,
the stack pointer, or another pointer, which can point to either
VM data or host data.
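
As a rough illustration of the addressing form above, here is a minimal
sketch in C. The names (vm_regs, vm_effective_address), the register
count, and the split into pointer and integer register files are all
assumptions made for the example, not part of the actual VM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register file; the count and layout are assumptions. */
enum { VM_NUM_REGS = 16 };

typedef struct {
    unsigned char *ptr[VM_NUM_REGS]; /* virtual registers holding base pointers */
    int32_t        val[VM_NUM_REGS]; /* virtual registers holding integer offsets */
} vm_regs;

/* Compute the address [base_register + offset_register + constant_offset].
 * The base may hold the global pointer, the stack pointer, or a pointer
 * into host data; the computation is the same in every case. */
static unsigned char *vm_effective_address(const vm_regs *r, int base_reg,
                                           int offset_reg, int32_t constant)
{
    assert(base_reg >= 0 && base_reg < VM_NUM_REGS);
    assert(offset_reg >= 0 && offset_reg < VM_NUM_REGS);
    /* Add the two offsets first so only one pointer addition is done. */
    ptrdiff_t total = (ptrdiff_t)r->val[offset_reg] + constant;
    return r->ptr[base_reg] + total;
}
```

Note that nothing here checks the result against any bounds; this only
shows that one addressing form covers globals, stack and host data.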

With this 'open design' from the VM point of view (it can access
anything) memory-access safety becomes a must, or otherwise a script
could just do something like:

char *foo=function_returning_pointer_to_host_program();
foo+=9000000;
foo[0]=0; /* probably crash */

Since the compiler knows what memory belongs to what type, and the size
of that type, I was thinking of adding extra instructions for bounds
checking. These instructions would only be inserted when it cannot be
determined at compile time whether the access is allowed.

check [reg1+reg3+90],600   ;; it is an array of 600 that it is
                           ;; trying to access, see if 0<=(reg3+90)<600
movb  [reg1+reg3+90],0

Since the language does NOT allow type casts (other than (int) etc.,
which do not really cast, but convert), and the compiler will be
expected to generate correct code, this would probably be the easiest
way to implement bounds checking.
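
What the 'check' instruction followed by the store might do inside the
interpreter can be sketched in C as below. The function names and the
choice to return false on failure are assumptions for illustration; a
real VM would more likely abort the script with an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Semantics of: check [base+index_reg+constant],length
 * True when 0 <= index_reg + constant < length. */
static bool vm_check(int32_t index_reg, int32_t constant, int32_t length)
{
    assert(length >= 0);
    /* Widen to 64 bits so the addition itself cannot overflow. */
    int64_t index = (int64_t)index_reg + constant;
    return index >= 0 && index < length;
}

/* Semantics of the pair: check ... / movb [base+index_reg+constant],value
 * The store only happens when the check passes. */
static bool vm_store_byte_checked(unsigned char *base, int32_t index_reg,
                                  int32_t constant, int32_t length,
                                  unsigned char value)
{
    if (!vm_check(index_reg, constant, length))
        return false; /* out of bounds: memory is left untouched */
    base[(int64_t)index_reg + constant] = value;
    return true;
}
```

With an array of 600 bytes and a constant offset of 90, index register
values from -90 up to 509 pass the check and everything else is refused.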

The only way to do these 'check's on memory accesses through a pointer
would be for the pointer itself to carry the length of the data it
points to. Here we come back to the 12(+) byte pointer, which contains
a pointer, a length, and an offset for pointer arithmetic.
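
One possible layout for such a pointer, sketched in C. The struct and
function names are hypothetical, and the sketch assumes objects smaller
than 2 GiB, so that a 'negative' offset (which wraps to a large unsigned
value) always fails the check. The 12 bytes assume 32-bit host pointers;
on a 64-bit host the struct would be larger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 'fat' pointer: base address, object length, current offset. */
typedef struct {
    unsigned char *base;   /* start of the object, in VM or host memory */
    uint32_t       length; /* size of the object in bytes */
    uint32_t       offset; /* current position; changed by pointer arithmetic */
} vm_fatptr;

static vm_fatptr vm_fatptr_make(unsigned char *base, uint32_t length)
{
    assert(base != NULL);
    vm_fatptr p = { base, length, 0 };
    return p;
}

/* Pointer arithmetic only touches the offset and never fails by itself.
 * Unsigned arithmetic wraps modulo 2^32, so p + n - n gives p back. */
static vm_fatptr vm_fatptr_add(vm_fatptr p, int32_t delta)
{
    p.offset += (uint32_t)delta;
    return p;
}

/* The check happens at the access, against the length carried along. */
static bool vm_fatptr_store(vm_fatptr p, unsigned char value)
{
    if (p.offset >= p.length)
        return false; /* outside the object: refuse the write */
    p.base[p.offset] = value;
    return true;
}
```

With this, the 'foo+=9000000; foo[0]=0;' example from earlier would be
refused at the store rather than writing into arbitrary host memory.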

Reading back through my first post, I can see I no longer need the
'extern' keyword, other than to let the compiler know that a GLOBAL
host variable exists somewhere, and the 'externs' are no longer
handles, but 'real' (12 byte or so) pointers.

> You could recheck pointer accesses against the allocation data. I.e. ask
> the memory management library whether 'target' is a valid pointer to a
> memory block.
> If you wish to be really secure (depending on your time/security
> tradeoff), you can let the memory manager issue time stamps and store
> these in the 'pointer' structure. Or you drop the 'target' and ask the
> memory manager what address belongs to each time stamp (this essentially
> makes the time stamps into memory handles).

I don't really follow your time stamps. What would they do?

Thanks to all who replied so far; each post has given me many ideas
(directly or indirectly).

Floris.
