ultra fast but too complex ... appendum

hoesel@igc.ethz.ch (Frans van Hoesel)
Wed, 24 Mar 1993 13:39:59 GMT

          From comp.compilers

Related articles
ultra fast but too complex interpreter. hoesel@igc.ethz.ch (1993-03-17)
ultra fast but too complex ... summary hoesel@igc.ethz.ch (1993-03-22)
ultra fast but too complex ... appendum hoesel@igc.ethz.ch (1993-03-24)
| List of all articles for this month |

Newsgroups: comp.compilers
From: hoesel@igc.ethz.ch (Frans van Hoesel)
Keywords: interpreter
Organization: University of Groningen, the Netherlands
References: 93-03-060 93-03-080
Date: Wed, 24 Mar 1993 13:39:59 GMT

This I just received from Wayne Throop throopw%sheol@concert.net He didn't
post it directlty to comp.compilers, but suggested that I would do so.
Because I got many me-too requests I'm sending the complete article. It
seems a clean description of "tree threaded" interpretation. My first
impression is... clean too program (allthough the author thinks different
about this) but slow.... too slow perhaps (but this disadvantage may be no
problem for you if you want you interpreter quickly.

- frans

from Wayne Throop throopw%sheol@concert.net:

Perhaps it was just that it was too obvious, but I didn't see anybody
mention the "tree threaded" approach, which while not the fastest, both
simplifies the compilation phase (because you don't have to translate from
a tree to a linear istream), and the interpreter phase (because each
little bit of the interpreter interacts with the others in
well-modularized ways).

As to speed, it isn't *too* shabby, and depending on how fast it needs to
be, it might well do. I have found the scheme to be fast enough for my

The basic notion is that each node in the parse tree contains a pointer to
a function that knows how to produce the results for that node, along with
pointers to the descendant nodes. If we assume a simple type scheme, or
that type checking is done in the compile phase, this leads to quite clean
code, and it's all ANSI compatible, and hence quite portable. (Well, the
way I usually code it does depend on the overlay of a generic node on all
the node types, but that is (arguably) widely portable... but could
probably be improved upon somewhat).

For example (leaving out the parse phase) consider this interpreter for
(very) simple floating point expressions:

#include <stdio.h>

/* 2 node types (plus a generic node) */
typedef struct generic_node_s {
        double (*f)(struct generic_node_s*);
} generic_node_t, *N;
typedef struct value_node_s {
        double (*f)(generic_node_t*);
        double value;
} value_node_t;
typedef struct operation_node_s {
        double (*f)(generic_node_t*);
        generic_node_t *left, *right;
} operation_node_t;

/* five basic operations */
double value(generic_node_t*node){
        value_node_t* p = (value_node_t*)node;
        return p->value;
double add(generic_node_t*node){
        operation_node_t* p = (operation_node_t*)node;
        return p->left->f(p->left) + p->right->f(p->right);
double sub(generic_node_t*node){
        operation_node_t* p = (operation_node_t*)node;
        return p->left->f(p->left) - p->right->f(p->right);
double mul(generic_node_t*node){
        operation_node_t* p = (operation_node_t*)node;
        return p->left->f(p->left) * p->right->f(p->right);
double div(generic_node_t*node){
        operation_node_t* p = (operation_node_t*)node;
        return p->left->f(p->left) / p->right->f(p->right);

/* construct a tree to compute area of a circle */
value_node_t r = {value,23.45};
value_node_t pi = {value,3.14159};
operation_node_t rsquared = {mul,(N)&r,(N)&r};
operation_node_t area = {mul,(N)&pi,(N)&rsquared};

/* try it out */
int main(void){
        printf( "interpreted: %.10g compiled: %.10g\n",
                        area.f((N)&area), r.value * r.value * pi.value );
        return 0;

Doing a quickie timing of the above code, I get these results:

        100000 empty loop iterations: 121
        100000 evaluations interpreted: 922 801 (unoptimized)
        100000 evaluations compiled: 164 43

        100000 empty loop iterations: 78
        100000 evaluations interpreted: 508 430 (optimized)
        100000 evaluations compiled: 137 59

That is, the interpreter is about 7 times slower than code compiled by the
C compiler for this simple expression. The generated code is much uglier
than that anton@mips.complang.tuwien.ac.at (Anton Ertl) presented for a
direct threaded interpreter. Perhaps 3 times as ugly.

These timings shouldn't be taken *too* seriously, (after all, they say the
compiled expression actually slowed down under optimization) but they may
be at least suggestive of the general performance levels involved. They
were done on a DECstation 3100.

Adding multiple interconvertable types and such can be done, either at
run-time (at the cost of slowing down the evaluation functions with
typechecking code) or during the parse (by having the parser insert
explicit conversion tree nodes where needed), or some combination. This
means some more problems defining the generic node type... there are
several ways to deal with it. But exactly what's right for your
application will depend heavily on the details, so I'll leave it at that.
frans van hoesel hoesel@igc.ethz.ch

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.