Re: Bytecode Compiler

"Paul Robinson" <Postmaster@paul.washington.dc.us>
24 May 2004 00:29:10 -0400

From comp.compilers

Related articles
Bytecode Compiler chris.cranford@tkdsoftware.com (2004-04-21)
Re: Bytecode Compiler dido@imperium.ph (2004-04-28)
Re: Bytecode Compiler nmh@t3x.org (Nils M Holm) (2004-04-28)
Re: Bytecode Compiler casse@netcourrier.fr (=?ISO-8859-1?Q?Cass=E9_Hugues?=) (2004-04-28)
Re: Bytecode Compiler RLake@oxfam.org.pe (2004-04-28)
Re: Bytecode Compiler lars@bearnip.com (2004-04-29)
Re: Bytecode Compiler pohjalai@cc.helsinki.fi (A Pietu Pohjalainen) (2004-05-02)
*Re: Bytecode Compiler Postmaster@paul.washington.dc.us (Paul Robinson)* (2004-05-24)**

| List of all articles for this month |

From:	"Paul Robinson" <Postmaster@paul.washington.dc.us>
Newsgroups:	comp.compilers
Date:	24 May 2004 00:29:10 -0400
Organization:	Compilers Central
References:	04-04-064
Keywords:	interpreter
Posted-Date:	24 May 2004 00:29:10 EDT

Chris Cranford wrote in message 04-04-064...
>If someone were to ask me to develop bytecode for "2+3", this would easily
>make sense to me that I would:
>
> push 2
> push 3
> add // push(pop(tos)+pop(tos))
>
>But when we begin tossing in the concept of variables and strings, things
>begin to get complicated and hard to follow.

Compiling or translating any program is trivial. But getting
*correct* output, now that's where the devil is in the details!

>Another option would be to assume a maximum number of slots for local
>vars like the java virtual machine does of 255. Is there any PROs to
>this? I would see this as being wasteful, especially for the case
>where only a handful of variables are being used like the above
>example.

If you're coding for a microprocessor in an embedded application, then
you should worry about memory usage. But if it's for a general purpose
computer, worrying about wasting 255 32-bit words on a machine that
conceivably has 32 meg of real memory at a minimum (an older Pentium
class machine running on Linux) and with swap space of 100 meg or
more, this is probably a false economy. (I'm giving an example of a
really low class machine that you can buy used for less than the cost
of a tank of gas these days. I just assisted a friend of mine replace
her current machine, an HP Vectra Pentium II 266mhz with 64 meg of
memory and 2GB of disk space that I bought for her as a Christmas
present for US$39.95 a year or so ago with a brand-new 3.2ghz HP with
160GB of hard disk space and 512MB of memory that cost $730.00 before
rebates. And that's not even the least expensive major brand around,
let alone some of the off-names.)

Worry more about increasing run-time speed.

While no one says you should waste resources with abandon, trying to
save memory may be less efficient than doing something that is maybe a
little bit wasteful of memory to increase performance.

> More memory would have to be allocated for the unused stack slots
>that necessary. It could be the job of the compiler to look at the
>symbol table for each code block and determine the number of slots
>needed for that frame. Then, just use either:

Or just count them as you allocate them in the table, then have the
compiler push the amount of space needed as the first instruction of
the program. Or, if you want to make the code re-entrant, you issue a
request for dynamic memory of that much and use a register to access
it.

I think allocating the amount of memory needed as a constant when the
program starts is better than grabbing it a piece at a time, you do
one allocation request and it's much faster than a lot of small grabs.
Further, decompiling a program or being able to match up source code
to generated code (say for implementing a source-level debugger) is
much simpler if a particular construct always does the same thing.
Fewer uses of exceptions and special rules will make simpler
compilation.

>Lets now assume we have the following source code:
>
> Dim s as String = "Test"

If S is an invariant constant, you create a labeled constant in the
code and simply have the piece of code that uses it be handed it or
copied it. If S is subject to change, you create a labeled constant,
you create S as a local variable, and you copy the constant into S at
run time. If a code segment can be written to at run time you could
conceivably create it as a local variable and initialise it at compile
time.

>I have several options on how I could implement this. I could use a
>special opcode that says, the following byte stream represents a
>string of ASCII characters which is terminated by the first NULL byte
>found. Take this stream, allocate a memory pointer for it and put the
>stream in the memory location. Then store the pointer on the stack.

Why execute data? All that checking and jumping around wastes
processor time. Put the item in the constants, then move the constant
to the target variable as needed. Advantage is if it's a "dead"
variable (allocated but never used) your run-time overhead is zero.

>So the associated opcode would be: [OPCODE 00 04 00 00]
>
>Now, once the memory pointer has been placed on the TOS, there would be
then
>an opcode that tells the VM to store the TOS in the local variable space of
>the stack frame at offset 0.

I'm presuming you can access the local variable items directly such
that it's no harder to access the 53rd entry on the stack as it is to
access top of stack. In which case you allocate enough room for all
the local variables, if you have a string you create it as if it were
any other variable (just larger) and then move any constant data into
it if it has to be predefined.

Or even better: create a complete map of all the local variables in
your routine, those that are initialized get the appropriate constant,
the rest get 0, an invalid pointer or null string. Then do one block
move of the initialization block to the top of the stack. Fills every
uninitialized variable to zero, fills every initialized variable to
the desired value. This cleans up a lot of problems. Ordinary
variables start at zero. Misused pointers will get an exception (or
you can pre-check for this special invalid pointer value and trap
them) and strings contain nothing. Every time your
subroutine/method/function runs, all variables are initialized the
same so even recursive calls work consistently.

As for strings, I'd like to make two suggestions. First, do not use
byte-size strings unless memory is critical. Support double-byte or
multi-byte strings so that you can support Unicode or other
multi-language systems. Making programs internationalization friendly
increases the potential lifespan of that program.

Second, support a length prefix rather than using a trailer byte. In
fact, support both a maximum length and a current length pointer.
This means that if you shrink a string and then expand it, as long as
you don't make it longer than the longest it's ever been, you simply
reuse the same space. This reduces requirements for reallocations at
run-time. Also, it makes you totally agnostic as to content. You
don't have to care if a string contains zeros or look for them; you
simply use the length pointer. It also allows the programmer to
change a string themselves internally if they want. It also allows
you to do run-time (or in some cases, compile time) buffer overflow
checks, which can catch errors before they become security holes in
user programs.

--
Paul Robinson
"Above all else... We shall go on..."
_"...And continue!"_

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Bytecode Compiler

"Paul Robinson" <Postmaster@paul.washington.dc.us>24 May 2004 00:29:10 -0400

"Paul Robinson" <Postmaster@paul.washington.dc.us>
24 May 2004 00:29:10 -0400