Re: Top-down deterministic parser?

Alexander Morou <alexander.morou@gmail.com>
Mon, 13 Jul 2009 06:13:36 -0700 (PDT)

          From comp.compilers

Related articles
Top-down deterministic parser? alexander.morou@gmail.com (Alexander Morou) (2009-05-11)
Re: Top-down deterministic parser? alexander.morou@gmail.com (Alexander Morou) (2009-07-11)
RE: Top-down deterministic parser? quinn_jackson2004@yahoo.ca (Quinn Tyler Jackson) (2009-07-12)
Re: Top-down deterministic parser? alexander.morou@gmail.com (Alexander Morou) (2009-07-13)
| List of all articles for this month |

From: Alexander Morou <alexander.morou@gmail.com>
Newsgroups: comp.compilers
Date: Mon, 13 Jul 2009 06:13:36 -0700 (PDT)
Organization: Compilers Central
References: 09-07-016 09-07-019
Keywords: LL(1)
Posted-Date: 14 Jul 2009 09:34:11 EDT

On Jul 12, 4:12 pm, "Quinn Tyler Jackson" <quinn_jackson2...@yahoo.ca>
wrote:
> > [. . .]
>
> Because Meta-S parsing is adaptive, grammars are entirely compiled
> into the parsing logic at parse-time. (Or rather, at both grammar
> link-time and parse-time, where link-time occurs before a parse, but
> is at run-time).
>
> Those productions which are not adaptive can be entirely linked before
> a parse, and their look-aheads are determined at this time, and this
> percolates through the entire grammar. Explicit use of the #relink
> directive in a grammar can also cause look-ahead calculation to be
> reprocessed in the middle of a parse (for instance, if the grammar
> adapts itself based on the input it has encountered, it may be
> desirable for all of the look-aheads to be recalculated).
>
> This is entirely implemented without "spaghetti". Meta-S is not a
> table-driven parser, nor is it a code-generator. The A-BNF compiler
> compiles A-BNF at run-time into an intermediate parsing language
> called LPM*, and after a linking and optimization pass, LPM is
> interpreted at run-time by the parsing engine. Each LPM rule consists
> of one or more "clauses" (what a virtual machine might call a "byte
> code") and each clause can have a "hint" (look-ahead thunk). Because
> of this generalization, there is no pasta involved -- each clause can
> call its look-ahead thunk and change its behavior accordingly when
> invoked by the parsing engine.
>
> By "percolation" I mean that look-aheads can be coalesced during the
> linking process. If rule A can be a B or C, rule A is not entered if
> neither B or C could possibly match.
>
> Because the system is Turing Powerful, certain constructs cannot
> generate look-ahead thunks, and in these cases the engine behaves in
> the usual unlimited-look-ahead fashion. But because this behavior is
> constrained to known-cases (and because of other optimizations too
> numerous to mention here), the entire system often behaves in a
> uni-directional linear fashion. In fact, due to some particular
> optimizations, certain non-trivial cases can perform in a sublinear
> fashion. (Doubling the input less than doubles the processing time.)
>
> Although much of this is described in one way or another in _Adapting
> to Babel_, some of the optimizations are new, undocumented, and/or
> trade secrets, but that's a general summary of one approach to the
> problem you have mentioned.
>
> --
> Quinn Tyler Jackson
> Chief Scientist,
> Thothic Technology Partners, LLC
>
> *LPM = Language for Pattern Matching


Sounds like a more powerful construct than I'm attempting to build;
however, I originally started the project knowing little about the
details of generating a parser. My main concern at the time was
understanding the basics and hoping to create a program which could be
used to build a second version of the program with similar goals but a
touch more power and sophistication (the version I'm building now has
completely clean syntax, because it can't specify code on what to do
in certain matches, which leads to the end-parser being that much more
irksome to build, since there's less data to make any determination
with).


I'm self-taught in this area, so forgive me if my understanding is
below what it should be, but from what I can tell you mean adaptive
parsing by cases where the actual contextual-awareness is altered
based upon the original source grammar and special adaptive features
built into your specific parser generator implementation. An example
where the context would mutate would be a conditional logic applied to
the following aspects of a given production, eg. (A | B | C)* with the
restriction (not shown) stated in the grammar that while they're
repeated, each can occur only once per parse of that part of the
production, thus the logic of the parser would adapt to do so. The
said rule could be expanded at link-time; however in larger sets the
permutations possible would be exponentially expensive to generate so
lazily determining such a thing would be appropriate, the look-ahead
after each production being accepted in the alternation would change
since an item would be eliminated after it's encountered.


Granted I'm only able to make so much of a guess without understanding
the full scope of a type-0 parser, I couldn't find much information
about them other than they're complex.


--


The system I use for the first state is fairly straight forward, a
first set state is seeded with a series of original declaration site
states (the original rule definitions which are deterministic to
start) when the transitions of a first set state are accessed, it
takes a look at the states that seeded the first set state and
generates a table of all of the possible transitions of the current
set.


Then it goes a step further and branches into the rules referenced as
transitions in order to build a proper, full, look-ahead, while it's
building this data it also tracks what rules it explores, and in what
order, the process is recursive so it can obtain a full map. Left
recursion is observed and noted, but it's not necessary from the start
point to re-issue a rule's FirstSet look-ahead information, since it
won't change; however, the path of the redundant reference is noted.


The next step involves using this 'what path was followed' information
to track each individual first set state, when one of the source
states enters a terminal edge, it'll need to figure out which rule it
was from, peek at the path, and inject the look-ahead information,
after where the rule was referenced. It'll have to repeat this
process until all final states are considered, since, in injecting the
post-rule information, the rule that referenced the terminal rule
might also be at its end.


I haven't thought of a good caching mechanism yet, so I'm not quite
there yet. I won't even know what to do in cases of ambiguities until
I get there. I already know though that it will be a royal pain.


Quinn Jackson, any suggestions for this hobbyist related to the above?
Bearing in mind it's not an adaptive parser, since it was beyond my
awareness at the project's conception, and as such the resultant
description language is devoid of any special constraint logic beyond
Kleene operators.


One thing I noticed from just the FirstSet state was there's a lot
more sources for terms than what I initially expected. Finding 13
different sources for some simple keyword, seven of which were from
different paths to the same rule, makes me realize that, when I
started, I was a bit naive.



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.