Re: Recursive descent parsing and optimization, was Good practical language and OS agnostic text?

BGB <>
Sun, 22 Apr 2012 13:45:44 -0700

          From comp.compilers

Related articles
Good practical language and OS agnostic text? (2012-04-17)
Re: Good practical language and OS agnostic text? (Uli Kusterer) (2012-04-21)
Re: Good practical language and OS agnostic text? (BGB) (2012-04-21)
Re: Recursive descent parsing and optimization, was Good practical lan (BartC) (2012-04-22)
Re: Recursive descent parsing and optimization, was Good practical lan (Dmitry A. Kazakov) (2012-04-22)
Re: Recursive descent parsing and optimization, was Good practical lan (BGB) (2012-04-22)
Re: Recursive descent parsing and optimization, was Good practical lan (Bartc) (2012-04-23)
| List of all articles for this month |

From: BGB <>
Newsgroups: comp.compilers
Date: Sun, 22 Apr 2012 13:45:44 -0700
References: 12-04-019 12-04-056 12-04-060 12-04-066
Keywords: parse, design
Posted-Date: 22 Apr 2012 21:40:28 EDT

On 4/22/2012 4:51 AM, BartC wrote:
> "BGB"<> wrote in message
>> On 4/21/2012 2:22 AM, Uli Kusterer wrote:
>>> - Recursive descent parsers: It's the obvious way to write a parser.
>> although I use recursive descent, the above sounds different from what I
>> usually do.
>> ReadStatement:
>> checks for and handles vaious statement types
>> calls ReadExpression
>> ReadExpression:
>> (actually, this is a tower of functions, one for each precedence
>> level, working from lowest to highest precedence)
> Sounds like it's influenced by the C grammar, which defines expressions
> using something like 13 or 17 layers of precedence.
> Beyond about 3-4 levels, I found that unmanageable. For expression syntax, I
> don't use any precedence in the grammar at all; I have precedence as an
> attribute of an operator, and an expression can be parsed with a single
> function.
> Or rather two: readfactor(priority), and readterm(). Readfactor() deals with
> the binary operators linking successive terms, while readterm() does all
> the real work (since my syntax doesn't distinguish between expressions and
> statements, that's quite a big workload).

I have functions for each precedence level, and there is a lot of them.

luckily, I tend to have the stack of expression precedence level
functions in their own source-file, so it is not all that bad.

>>> - Tokenizing: Essentially grab all the words in your source text and
>>> build an array with an entry for each so you can more quickly walk
>>> forward and backward without having to scan individual characters.
>> partly agreed, except that it isn't really strictly necessary to build
>> an array up-front.
>> most of my parsers don't bother using an array, but instead just call
>> the tokenizer function directly to peek and parse tokens based on the
>> current "stream position" (generally a raw char pointer in my language
>> parsers).
> I've tried reading all the tokens in a separate pass, but didn't really like
> it. And it takes a lot more storage as well, especially with macro
> expansions.
> Instead I read them as I go along, but with provision for a one-symbol
> look-ahead.

my parsers typically look at 1 or 2 tokens at a time, as usually this
seems sufficient for nearly any "ordinary" construction in a C style syntax.

a partial exception here is declarations, which often have to be tried
to be parsed, and will generally fail (revert to a prior point) if the
syntax turns out to not be an expression.

an annoyance (part of how this makes parsing C slower) is that pretty
much every normal identifier as the first token on a statement ends up
being checked if it refers to a typedef, although I guess an
optimization could be to check if the next token is in a "known bad"
list (ex: "=", "->", ...) and then skip out on looking up a typedef in
this case (a known-bad token meaning "this is obviously not a
declaration"). probably not that this would help when parsing header
contents though (as pretty much everything there is a declaration).


yes, ok.

>> some past experiments (special purpose tests), have implied that
>> potentially threaded code can pull off "reasonably solid" interpreter
>> performance (potentially within 3-5x of native).
> Assuming you're talking about dynamic typing, I found it difficult to get
> within 3-5x, unless some sort of type hinting is used, or perhaps you're
> comparing with not too highly optimised native code. Or it's code that is
> memory-intensive, then memory access will dominate.

no, this was assuming that the output was resolved to static types
(although the input HLL is largely dynamically-typed, the goal is to
have largely statically-typed logic going on in the lower levels of the
VM, due either to the use of explicit annotation, and/or, type-inference).

so, these tests assumed that the VM would be able to effectively resolve
the types to their primitive forms (basically, they were intended to try
some things out in advance to help guide the design of a then-planned
new interpreter core, which never got fully implemented).

all this was approx 6 months ago.

the tests involved various numeric calculations (generally using integer
and floating point types), and compared them against their analogues as
plain C code.

there would have been an approx 3-5x slowdown between the interpreter
and native code compiled with the same optimization settings (trying
different settings, although changing the total running times, seemed to
have little impact on the relative execution times).

these weren't really general-purpose tests though, but rather were
involving hand-crafted "instruction sequences" (as lists of function

the "stack" in the case of the VM was represented as a sliding pointer,
and was using fixed 64-bit elements (as was the argument list).

so, a double addition would have looked something like:
*((double *)(ctx->stack+1))=
*((double *)(ctx->stack+0))+
*((double *)(ctx->stack+1));

I ended up not actually using an interpreter structured this way
(instead continuing to use and develop my older interpreter), but
performance was sufficiently good to show that threaded-code was a
viable option and "not drastically slower" than native code.

the result was, me eventually, migrating the older interpreter to the
use of threaded code (for a modest speed-up vs the "big switch()"
strategy I had been using previously).

as-is, the current interpreter performance is closer to around 100x
slower than native, for tests like Fib and bubble-sort (and generally
much smaller if a lot of C-side logic is involved).

for the newer version of the interpreter, with faster/improved "Boxed
Value Type" operations, here is a version of the opcode-handler logic:

BSVM_ThreadOp *BSVM_ThOp_ADD_XD(BSVM_SVMState *ctx, BSVM_ThreadOp *cur)
{ BVT_DefPopUV_Ty(f64, u, v); BVT_AddUV(u, v);
BVT_FreeF64(ctx, v); return(cur->next); }

and if all macros are manually expanded out...

BSVM_ThreadOp *BSVM_ThOp_ADD_XD(BSVM_SVMState *ctx, BSVM_ThreadOp *cur)
f64 *u, *v;
v=(f64 *)(ctx->stack[--ctx->stackpos]);
u=(f64 *)(ctx->stack[ctx->stackpos-1])
*(f64 **)v=ctx->fbvt_double;

where "f64" is a typedef for "double".

as can be seen, this isn't quite as efficient, but this sort of logic
should be a little faster than the current (dynamically-typed) logic.

note that "SVMState *ctx" is the interpreter context, and "BSVM_ThreadOp
*cur" is the current opcode (it contains a function-pointer to the
handler, as well as any operands used by the opcode).

note that, in this interpreter, the stack is an array of pointers.

I don't yet have an estimate for "overall" performance of the new logic,
considering that a good deal more logic is still be noted before I will
be able to actually test and benchmark a lot of this stuff... (but, at
least, with the changes made thus far the VM hasn't started violently
blowing up, which is at least itself a good sign...).

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.