Re: Top-down deterministic parser? [09-05-060]

Alexander Morou <alexander.morou@gmail.com>
Mon, 6 Jul 2009 09:15:11 -0500

          From comp.compilers

Related articles
Re: Top-down deterministic parser? [09-05-060] alexander.morou@gmail.com (Alexander Morou) (2009-07-06)
Re: Top-down deterministic parser? [09-05-060] rsilvetz@gmail.com (Curator1) (2009-07-08)
Re: Top-down deterministic parser? alexander.morou@gmail.com (Alexander Morou) (2009-07-11)
| List of all articles for this month |

From: Alexander Morou <alexander.morou@gmail.com>
Newsgroups: comp.compilers
Date: Mon, 6 Jul 2009 09:15:11 -0500
Organization: Compilers Central
Keywords: parse, LL(1)
Posted-Date: 06 Jul 2009 17:22:19 EDT

Hello,


I figured since the last poorly defined question garnered a null
response, I'd take my time to further understand the problem so I
could ask it in a more appropriate manner. I'm curious if anyone
here can point me in the right direction to information about a
top-down, push down automata, parser (PDA LL(k)). From what I can
tell so far, the framework I'm constructing would probably akin to
this, but I lack the experience to finish it as it stands, so I
figured I'd ask people with a bit more experience.


So far the program, as mentioned in earlier posts, understands
regular expressions sufficiently enough, for a bootstrapper, and
my current task is building a framework for defining the rules
and the accompanying parser. To that end, so far the program emits
two fairly sizable enumeration sets related to the identifiers of
rules, and the transitions possible in the language (it's alphabet,
perhaps?). Included in this alphabet is references to the rules
themselves: so when it declares a rule, its basic DFA-state setup
includes the rules.


Here is a small example from some of the code it emits:
//Start code, please forgive odd carriage returns.
//the code generator I wrote adds a return after every
//argument in a method call, except for the last.


internal static DeclarationSiteRule RuleLINQBodyClause
{
        get
        {
                if ((ruleLINQBodyClause == null))
                {
                        DeclarationSiteRule rule = new DeclarationSiteRule(false,
                                RuleIdentifiers.Rules2.LINQBodyClause);
                        DeclarationSiteState State1 = new DeclarationSiteState(true,
                                rule);
                        DeclarationSiteState State2 = new DeclarationSiteState(true,
                                rule);
                        DeclarationSiteState State3 = new DeclarationSiteState(true,
                                rule);
                        DeclarationSiteState State4 = new DeclarationSiteState(true,
                                rule);
                        DeclarationSiteState State5 = new DeclarationSiteState(true,
                                rule);
                        // Transitions for state 0
                        // Transition: LINQFromClause
                        rule.Transitions.Add(RuleTransitions.Rules2.LINQFromClause,
                                State1);
                        // Transition: LINQLetClause
                        rule.Transitions.Add(RuleTransitions.Rules3.LINQLetClause,
                                State2);
                        // Transition: LINQWhereClause
                        rule.Transitions.Add(RuleTransitions.Rules3.LINQWhereClause,
                                State3);
                        // Transition: LINQJoinClause
                        rule.Transitions.Add(RuleTransitions.Rules2.LINQJoinClause,
                                State4);
                        // Transition: LINQOrderByClause
                        rule.Transitions.Add(RuleTransitions.Rules3.LINQOrderByClause,
                                State5);
                        // Rule edge setup.
                        rule.SetEdges(new DeclarationSiteState[]
                                        {
                                                State1,
                                                State2,
                                                State3,
                                                State4,
                                                State5
                                        });
                        ruleLINQBodyClause = rule;
                }
                return ruleLINQBodyClause;
        }
}
//End code


The way I'm intending to break down the data model is in three phases:
1. Declaration Site, which denotes the default rule declaration.
2. Bridge Site, which unifies a series of declaration site
      states, this will be used by the third phase.
3. Use site, which contains a single state for every permutation of
      any given state in the language. The idea here is the use-site
      would delay creating information about its transition targets
      until a token from the 'bridge site' is encountered (which
      effectively means the bridge defines the token context data
      used by the scanner).
      Once a state->state transition is encountered on a use-site
      state, the idea would be to obtain the bridge for that state
      by gathering the declaration site information available.
      The information includes a union of the declaration site
      elements, which use the transition, causing the state
      transition.
      The reason for the use of this third level is to close edges from
      each rule, since the Bridge elements are created using a union
      of all the states that made up the state, keyed with the token
      context information of what causes the transition, the edges will
      not be unique. So the use-site intends to close the edge gap
      by introducing the context information for the point following the
      use of a rule (effectively defining the follow set).


I'm also aware that within the bridge I'll need to keep in mind the
'paths' necessary for token, so that if I have the following:
A ::= B 'u';
B ::= C 'd';
C ::= 'a';


The parser itself would know that when the token 'a' is encountered,
from rule 'A', it would know that the path to get there is
A->B->C->'a', since this context information is necessary for clearly
defining the rule boundaries.


I'm contacting the folks here today to see if I've missed anything,
while most of what I've described is mostly focused on verifying a
parse, I need to be able to do at least that to take it a step further
and extract the necessary data from it.


Is this even remotely similar to PDA? I have ideas for how to handle
terminal points of a rule, but from what I can tell, the actual
finalization of a given rule will have to be delayed until the end of
the parse. This is due to resolving which elements are responsible for
accepting what tokens. If the optional parts of a tail end of one rule
is locked for the optional follow part of the calling rule, determining
how to distribute the tokens will need to be done at the end considering
the fixed vs. variable locks involved.


Any response to this is appreciated,


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.