Related articles |
---|
Good practical language and OS agnostic text? compilers@is-not-my.name (2012-04-17) |
Re: Good practical language and OS agnostic text? ulimakesacompiler@googlemail.com (Uli Kusterer) (2012-04-21) |
Re: Good practical language and OS agnostic text? cr88192@hotmail.com (BGB) (2012-04-21) |
Re: Recursive descent parsing and optimization, was Good practical lan bc@freeuk.com (BartC) (2012-04-22) |
Re: Recursive descent parsing and optimization, was Good practical lan mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2012-04-22) |
Re: Recursive descent parsing and optimization, was Good practical lan cr88192@hotmail.com (BGB) (2012-04-22) |
Re: Recursive descent parsing and optimization, was Good practical lan bartc@freeuk.com (Bartc) (2012-04-23) |
From: | "BartC" <bc@freeuk.com> |
Newsgroups: | comp.compilers |
Date: | Sun, 22 Apr 2012 12:51:44 +0100 |
Organization: | A noiseless patient Spider |
References: | 12-04-019 12-04-056 12-04-060 |
Keywords: | parse, design |
Posted-Date: | 22 Apr 2012 10:29:33 EDT |
"BGB" <cr88192@hotmail.com> wrote in message
> On 4/21/2012 2:22 AM, Uli Kusterer wrote:
>> - Recursive descent parsers: It's the obvious way to write a parser.
> although I use recursive descent, the above sounds different from what I
> usually do.
> ReadStatement:
> checks for and handles vaious statement types
> calls ReadExpression
>
> ReadExpression:
> (actually, this is a tower of functions, one for each precedence
> level, working from lowest to highest precedence)
Sounds like it's influenced by the C grammar, which defines expressions
using something like 13 or 17 layers of precedence.
Beyond about 3-4 levels, I found that unmanageable. For expression syntax, I
don't use any precedence in the grammar at all; I have precedence as an
attribute of an operator, and an expression can be parsed with a single
function.
Or rather two: readfactor(priority), and readterm(). Readfactor() deals with
the binary operators linking successive terms, while readterm() does all
the real work (since my syntax doesn't distinguish between expressions and
statements, that's quite a big workload).
>> - Tokenizing: Essentially grab all the words in your source text and
>> build an array with an entry for each so you can more quickly walk
>> forward and backward without having to scan individual characters.
> partly agreed, except that it isn't really strictly necessary to build
> an array up-front.
> most of my parsers don't bother using an array, but instead just call
> the tokenizer function directly to peek and parse tokens based on the
> current "stream position" (generally a raw char pointer in my language
> parsers).
I've tried reading all the tokens in a separate pass, but didn't really like
it. And it takes a lot more storage as well, especially with macro
expansions.
Instead I read them as I go along, but with provision for a one-symbol
look-ahead.
>> - Syntax tree: Build a tree structure that represents the parsed program.
>> You
> typically, things like simplifying expressions, propagating constants,
> ... is done in a stage I have typically called "reduction", which
> happens between parsing and prior to producing (bytecode) output.
I use the following passes (which seem to be fairly typical):
Syntax analysis (lexing and parsing)
Name resolution (a recent introduction for me)
Type analysis (static type checks and coercions, constant folding)
Code generation (to intermediate code or to byte-code)
Final pass (from intermediate code to the target code)
Usually invoked one after the other for the entire module, where a
compile-time expressions is needed, then the first three passes have to be
done immediately (and the result had better be a constant value..)
I use the same structure now when generating byte-code (originally such a
compiler was just single-pass). Because such code is usually dynamically
typed, the type analysis pass only needs a nominal amount of work, but still
takes care of a few things (l-values for example).
> some past experiments (special purpose tests), have implied that
> potentially threaded code can pull off "reasonably solid" interpreter
> performance (potentially within 3-5x of native).
Assuming you're talking about dynamic typing, I found it difficult to get
within 3-5x, unless some sort of type hinting is used, or perhaps you're
comparing with not too highly optimised native code. Or it's code that is
memory-intensive, then memory access will dominate.
--
Bartc
Return to the
comp.compilers page.
Search the
comp.compilers archives again.