Re: Jit Implementation

"BGB / cr88192" <>
Fri, 26 Mar 2010 09:47:11 -0700

          From comp.compilers

Related articles
[10 earlier articles]
Re: Jit Implementation (Barry Kelly) (2010-03-22)
Re: Jit Implementation (bartc) (2010-03-23)
Re: Jit Implementation (bartc) (2010-03-23)
Re: Jit Implementation (cr88192) (2010-03-23)
Re: Jit Implementation (BGB / cr88192) (2010-03-23)
Re: Jit Implementation (bartc) (2010-03-24)
Re: Jit Implementation (BGB / cr88192) (2010-03-26)
Re: Jit Implementation (bartc) (2010-03-28)
Re: Jit Implementation (BGB / cr88192) (2010-03-28)
| List of all articles for this month |

From: "BGB / cr88192" <>
Newsgroups: comp.compilers
Date: Fri, 26 Mar 2010 09:47:11 -0700
References: 10-03-070 10-03-078 10-03-082 10-03-083
Keywords: code
Posted-Date: 26 Mar 2010 14:21:12 EDT

"bartc" <> wrote
> "BGB / cr88192" <> wrote in message
>> "bartc" <> wrote in message
>>> "BGB / cr88192" <> wrote in message
>>>> "bartc" <> wrote in message
>>> I wanted to avoid assemblers, and all that goes with them,
>> I originally wrote my assembler specifically for the purpose of doing
>> JIT.
>> for example:
>> I added syntax as using an API to emit opcodes was lame;
> You mean using: genmc(hlt_opc) is lame but genmc("hlt") isn't?

well, it is beyond this:
it is unlikely the API for a binary assembler could be made this simple,
since one also has to deal with many other argument configurations, ... and
C lacks function overloading (meaning many different names and suffixes).

additionally, one needs a function call for every instruction.

passing a string though, one can batch a whole bunch of instructions
together if needed, and as well use printf-style formatting to include
literal values, ...

so, it was my conclusion (from trying both strategies) that textual ASM is
much more convinient.

as well, it is also much more generic, given I can have many different
subsystems use the same assembler.
also, given the fairly small number of API calls, it is much easier to route
the API from place to place (such as when using COM/JNI-style environment
structs, ...).

> I did also consider textual asm at one time, but now have a record-based
> form (which yet needs to be translated to runnable binary). This form is
> also easier to analyse, if that is needed.


a record based style would be easier to perform analysis on, but it doesn't
come free:
it brings with it many of the issues of using shared structures.

in many cases, I prefer to keep the use of shared structures low (and
instead perform operations via abstract handles and API calls).

>> I added basic linking as simply emitting code and data to sequential
>> memory addresses was limiting;
> I dealt with some of the problems years ago to do with dynamic linking of
> bytecode files, and hopefully some of that experience will work here.

yep, my linker also deals with this, and other issues.
the frontend using the assembler, however, does not.

> I have some unusual requirements (for example fixing up type-ids across
> modules: converting local type-ids to global ones, and doing this at
> runtime
> if necessary) that a normal linker won't help with (afaik).

I forced most things to using a C-style linking model, but did make a few
tweaks, most notably that special symbols may be used to implement many
less-trivial operations (special commands may be embedded into symbols).

examples of this are things like providing an ASM-level interface to my OO
facilities, ...

>> this allowed some of my first JIT experiments: in late 2006 / early
>> 2007, I had success JIT'ing a script language of mine, and at the
>> time was pulling off performance loosely comprable to that of
>> GCC-compiled C code.
> My interpreted project performed well within a magnitude of optimised gcc.
> Not bad, but still not good enough to write itself in, hence I'm now using
> proper compilation.

it was my early success with JIT which prompted me to look into writing a
full on C compiler.
FWIW, I had severely underestimated the effort and complexity of this jump.

I realize now that the language I was developing at the time (BGBScript/1),
was actually fairly close to ActionScript 2.0 as far as syntax and semantics
went (although I had been more influenced by JavaScript).

the languages are not exactly the same in many ways though (there were many
minor differences in the type model and scope behavior, ...).

I am on/off tempted to more solidly revive the thing, maybe or maybe not
moving it in a direction to be more properly like ActionScript (both Java
and the CLR have frustrated me for the time being).

this would differ from my Java and CLR efforts in that I will probably not
try for binary compatibility with AVM or Flash (instead, simply make a
language which is syntactically and behaviorally similar).

it would also be a goal to allow relatively direct interfacing between
BGBScript and C (as-in, ideally, FFI free).

as for my Java frustrations:
FFI-free C<->Java interfacing doesn't much look like a possibility even with
ones' own implementation (Java's design is not good for directly interfacing
with anything not Java...).

to break the FFI wall (in in this case, the JNI / JNA / ... wall) would
actually produce something which may externally resemble Java, but would
essentially be a different technology.

>>> But in the dynamic language I was using, this [assembler] added
>>> less than 300 lines to the compiler, plus various tables.
>> in my case, the assembler + linker is not a particularly large component,
>> granted, it is still a bit larger than 300 loc though.
> (I think I left a lot of the error checking to the loader (so it accepts,
> for example, mov [a],[b] syntax, but won't be able to find an x86 opcode
> for
> it later).)


my assembler goes all the way to machine code.
actually, the parser parses the syntax and generally uses it to directly
drive the original function-call driven API.

>> yeah.
>> many of mine are variants of RPN and abstract stack machines...
> Stack machines are really, really simple to program for. And would solve
> some problems I've having with registers: for example, when you pop a
> complex value, you know it's time to free any associated resources.
> With registers, you don't really know when you've finished with a
> register,
> and can put something else into it. (I know there are lots of complex
> analyses that can be done, but I'm trying to keep things simple and within
> my capabilities.)

yes, I agree.

I still have much better success working with RPN than with the (IME)
terrible confusion of trying to figure out how to work with SSA form.

(possibly, maybe, I am too stupid to understand SSA, I don't really

RPN is easy to figure out how to produce, and for the most part is fairly
easy to figure out how to process.
admitted, it does lead to some fairly hairy ordering issues (my main codegen
being made essentially incompatible with JBC and CIL due to internal
ordering differences).

sadly, getting from RPN to ASM (and dealing with multiple archs, the C
typesystem, ...) is itself a fairly hairy process (much time has gone into
trying to debug and clean up this thing).

it is also fairly slow, meaning that relatively little deals with my IL (it
is far more common for my code to just produce ASM instead).

there are a few things which I would do differently now if I were
implementing it clean, but I am not sure I would want to commit to something
like this.

part of the issue relates some to its original implementation, where it had
fairly thinly wrapped 32-bit x86, and so many "x86'isms" had leaked into the
IL design (many of which are no longer required under the current
implementation, but would be a pain to remove, ...).

>>> However, it has fewer registers: just one main register, plus one or two
>>> auxilliary ones (I've never figured out how to compile for multiple
>>> registers).
>> more than a few registers means using a register allocator.
>> in the trivial case, one is not needed (my early codegens didn't use on,
>> but
>> instead assigned registers to a complicated set of rules and roles).
>> later on, I added a register allocator.
> I really have only about 1.5 registers (in the IL). My data types are 32-,
> 64- and 96-bits, with the latter used for variant and dynamic data. R1 is
> max 96-bits, but R2 is only max 64-bits (and R3 is just the top half of
> R2).
> Of course most of the physical registers do end up being used in one form
> or
> another (in implementing the IL and the runtime for example), it's not as
> though they're just sitting there empty.

yeah, that is what I meant by "assigned rules and roles".

this is in contrast to a register allocator which will try to figure out on
its own which registers will be used where, ...

I had used the assigned-roles strategy originally, mostly with a whole bunch
of flags used to determine what was in the registers and when (such as to
flush registers, reload from saved locations, ...).

later on, I switched to a more generic strategy, where I call a function to
allocate a register, and it figures out the "magic" on its own (the only
major issue being that the number of currently held registers needs to be <=
the number of available arch registers).

typically, the "registers" are managed via abstract handles, rather than
actually knowing the physical register in use. there are also "temporary
variables" (or tvars) which share the same handle-space as registers (and
can also be allocated/freed), ... (however, ordinary variables are generally
referenced by name, and there is also a feature for tvars to be bound to
names, ...).

actually, in many contexts tvars can be used the same as registers, and may
internally map to registers, but differ in a few ways:
a tvar is not necessarily in a register, and an operation on it may require
a register to be allocated (owning a register handle is more "safe" in this
there is no set limit on the number of tvars (as tvars are backed to the
stack, however tvars may end up sharing the same physical memory spots in
many cases).

like registers, they are allocated/freed, and so have a very volatile scope
(and can't be safely held through a jump unless the same tvar is held at
both the source and destination), ... (it is possible though to pre-allocate
a tvar for the target, and then move the present tvar value to the target
before the jump, essentially using it as a makeshift "phi"...).


sadly, my codegen requires 2 passes to get this right (and both passes need
to work in lock-step, since any notable variation would likely cause the
codegen to go berserk...).

another potential weak point is that the optimizer has very little
"forward-looking" capabilities, and so is essentially primarily responsive.

but it does optimize some by delaying things to the "last moment", typically
by "pushing" intermediate results to the stack, which the optimizer may
modify/... as a result of subsequent operations.

many operations may involve a "sync" operation, which essentially "flattens"
the present state (all items on the stack are made to be physically on the
stack, and all variables are in a proper location and state).

there are several different types of sync (synchronizing different things,
...), and they are used for such things as labels/jumps, function calls, ...
(this is needed as some operations could terribly mess things up if the
state were not locked-down to a sane "physical" state).

combined with a few other things, I had found that my codegen can actually
be used as a makeshift interpreter (and is actually turing complete), mostly
as a result of oddities within the optimizer.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.