Re: How to implement dynamic typing?

"BGB" <>
Fri, 23 Apr 2010 08:28:07 -0700

          From comp.compilers

Related articles
[21 earlier articles]
Re: How to implement dynamic typing? (bartc) (2010-04-18)
Re: How to implement dynamic typing? (George Neuner) (2010-04-20)
Re: How to implement dynamic typing? (Mike Pall) (2010-04-21)
Re: How to implement dynamic typing? (bartc) (2010-04-21)
Re: How to implement dynamic typing? (George Neuner) (2010-04-22)
Re: How to implement dynamic typing? (George Neuner) (2010-04-23)
Re: How to implement dynamic typing? (BGB) (2010-04-23)
Re: How to implement dynamic typing? (bartc) (2010-04-24)
Re: How to implement dynamic typing? (BGB / cr88192) (2010-04-26)
Re: How to implement dynamic typing? (George Neuner) (2010-05-11)
| List of all articles for this month |

From: "BGB" <>
Newsgroups: comp.compilers
Date: Fri, 23 Apr 2010 08:28:07 -0700
References: 10-04-009 10-04-028 10-04-031 10-04-036 10-04-038 10-04-051 10-04-053 10-04-054
Keywords: types
Posted-Date: 23 Apr 2010 23:21:16 EDT

"bartc" <> wrote
> Mike Pall wrote:
>> The following post shows the assembler code for a dynamic type check
>> (in the context of the interpreter part of LuaJIT):


oddly enough, I tend to write my interpreters in C. most of my ASM
code then is dynamically generated, and far more often it ends up
being used for utility reasons than for performance reasons (there are
many things one can do in ASM that can't be done easily or at all in

usually most of my optimization then is overall algorithmic or structural
optimization, and generally I prefer to avoid micro-optimizing or edge-case

so, I put a lot of weight typically in "fast enough", and for dealing with
performance issues, tend to follow the profiler.

by simple intuition, my codebase would seem like it should be dead-slow, but
in practice it is plenty fast enough...

much like how an x86 interpreter using strings-processing to match and
decode opcodes should also be dead-slow, or at least slower than a 70x
slowdown vs native (all the segmentation and address-translation stuff
considered), but oh well...

> Overall it wasn't particularly fast however. Actual x=x+1 instructions
> are slow. And it's hampered by:
> 1) Having to do type checks for two operands (the compiler isn't
> clever enough to see x=x+1 needs just one check. But then, this
> tends to get written as x+:=1 or just ++x)

oddly enough, my compilers tend to just split apart the += operation:
"x+=1;" is internally compiled as "x=x+1;"

although, IIRC, complex expressions are handled specially in the LHS, where
most of the complex expression is handled, but then 'dup' is used to split
it at the final stage, then it is evaluated into its lvalue and rvalue sides
are evaluated and the value is assigned.

ex: "[3].baz+=3;".
"[3]" would be evaluated and dup'ed ('.' has a different operation
for lvalues and rvalues).

actually, in violation of the C standard, my C compiler doesn't bother to
differentiate '.' and '->', since the same IL opcodes are used for both, and
the upper-end doesn't really care.

> 2) Continual checks for freeing and duplicating memory, as there is no
> GC to offload this stuff to.


some of my script compilers (notably: my earlier BGBScript implementations)
had done this statically at compile-time, but mostly this was to deal with a
horridly slow GC. sadly, this slow (and not particularly reliable) GC is the
main GC for my framework (and, at least, my technology has improved some
since then, for example, that scripting engine had existed before I added
the ability to shove integers and floats directly into pointers, and these
were the main source of garbage...).

other major sources of garbage were cons cells, environment frames, ... so
logic was also in place to manage these. (enviroment-frame loss was mostly
an issue due to the possibility of continuations, and cons-cells due to both
continuations and lexical closures).

nevermind, my float28 format (used on x86, x86-64 uses float48) has poor
accuracy (still better than float24, which was my original "flonum" format).

float24: s.e7.m16
float28: s.e7.m20
float32: s.e8.m23 (IEEE)
float48: s.e11.m36
float64: s.e11.m52 (IEEE)

but, it works. float28 is mostly comparable to float32 (the accuracy loss is
minor, and the reduced exponent range thus far hasn't been much of an
issue). both float24 and float28 used an asymmetric bias (unlike IEEE
formats), but like IEEE formats do use a hidden bit. as-is, the handler code
for the formats doesn't bother with converting Inf or NaN values, but these
can be assumed to exist (with IEEE-like rules).

float28 is typically converted to 32 prior to calculations, but is usually
done in C via math functions (little would stop an ASM version, but thus far
this has not been a performance issue).

float48 is actually just a double with the low-order bits shaved off.

recently I also added the double-based references feature from LuaJIT, and
this is mostly orthogonal with the pointer-based system. I expect it could
have use in interpreters, but I don't expect as much use elsewhere.

the main non-orthogonal point between the systems is that this new system
("dytf") has full double precision, and when converted to the pointer-based
format (which is the main format used in much of the infrastructure), would
be downgraded to flonums. another issue is that I added 48-bit fixints,
which may be boxed if converting to pointers (if they fall outside the
normal current fixint range of 24 bits for x86, where x86-64 uses 48 bits).

another issue is that, for most operations, they either don't have
convinient wrappers, or would be slower (since they need to wrap/unwrap the

in my case, I suspect type-checking is likely to be fairly slow, and I could
consider optimizing this (I noted after stepping though a typecheck in the
debugger, that it is a long and winding process jumping all over the place).

this is partly because my system doesn't exactly use tagged references, nor
necessarily require that pointers either point to the start of objects, or
necessarily be on the GC heap (one can try to use typechecks on pointers
onto the stack or in malloc'ed memory generally without issue). (internally,
the GC looks-up the object in question...).

the partial solution then is the use of special predicate functions in many
cases (rather than the generic type-checker), since these predicates often
involve special-case logic to more quickly do the type-check (such as by
checking if the pointer is within a certain address range, which is the case
for fixints and flonums, and something "similar" is also done for cons
cells, ...).

note, inheritence and interface checking is faster, if one knows they are
dealing with class/instance objects (although, a generic checker would first
verify that it is a class/instance object, and then check for class-based

however, C/I objects are much rarer in my codebase.
a lot of this I guess is why there is a lot of static and signature-based
typing going on as well, since it can be faster than full-on generic

> I've just looked at something called LuaJIT 2.0.0, and it is pretty
> impressive speedwise, also the fact that it seemed to be just two
> smallish files (you get used to huge sprawling downloads).

yeah, my projects tend to be more in the latter camp.

but, then I am left to wonder sometimes: "why the hell are these random
drivers 50MB?".

but, then I remember, my main project is around 5GB, and doesn't manage to
do a whole lot of anything particularly notable...

> Using "luajit -j off" runs it as a normal interpreter, if I understood
> correctly, and slows it down enough to be able to actually measure
> the execution times..
> Anyway Luajit seems a new target to try and beat..


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.