Related articles |
---|
Re: Top-down deterministic parser? [09-05-060] alexander.morou@gmail.com (Alexander Morou) (2009-07-06) |
Re: Top-down deterministic parser? [09-05-060] rsilvetz@gmail.com (Curator1) (2009-07-08) |
Re: Top-down deterministic parser? alexander.morou@gmail.com (Alexander Morou) (2009-07-11) |
From: | Alexander Morou <alexander.morou@gmail.com> |
Newsgroups: | comp.compilers |
Date: | Mon, 6 Jul 2009 09:15:11 -0500 |
Organization: | Compilers Central |
Keywords: | parse, LL(1) |
Posted-Date: | 06 Jul 2009 17:22:19 EDT |
Hello,
I figured since the last poorly defined question garnered a null
response, I'd take my time to further understand the problem so I
could ask it in a more appropriate manner. I'm curious if anyone
here can point me in the right direction to information about a
top-down, push down automata, parser (PDA LL(k)). From what I can
tell so far, the framework I'm constructing would probably akin to
this, but I lack the experience to finish it as it stands, so I
figured I'd ask people with a bit more experience.
So far the program, as mentioned in earlier posts, understands
regular expressions sufficiently enough, for a bootstrapper, and
my current task is building a framework for defining the rules
and the accompanying parser. To that end, so far the program emits
two fairly sizable enumeration sets related to the identifiers of
rules, and the transitions possible in the language (it's alphabet,
perhaps?). Included in this alphabet is references to the rules
themselves: so when it declares a rule, its basic DFA-state setup
includes the rules.
Here is a small example from some of the code it emits:
//Start code, please forgive odd carriage returns.
//the code generator I wrote adds a return after every
//argument in a method call, except for the last.
internal static DeclarationSiteRule RuleLINQBodyClause
{
get
{
if ((ruleLINQBodyClause == null))
{
DeclarationSiteRule rule = new DeclarationSiteRule(false,
RuleIdentifiers.Rules2.LINQBodyClause);
DeclarationSiteState State1 = new DeclarationSiteState(true,
rule);
DeclarationSiteState State2 = new DeclarationSiteState(true,
rule);
DeclarationSiteState State3 = new DeclarationSiteState(true,
rule);
DeclarationSiteState State4 = new DeclarationSiteState(true,
rule);
DeclarationSiteState State5 = new DeclarationSiteState(true,
rule);
// Transitions for state 0
// Transition: LINQFromClause
rule.Transitions.Add(RuleTransitions.Rules2.LINQFromClause,
State1);
// Transition: LINQLetClause
rule.Transitions.Add(RuleTransitions.Rules3.LINQLetClause,
State2);
// Transition: LINQWhereClause
rule.Transitions.Add(RuleTransitions.Rules3.LINQWhereClause,
State3);
// Transition: LINQJoinClause
rule.Transitions.Add(RuleTransitions.Rules2.LINQJoinClause,
State4);
// Transition: LINQOrderByClause
rule.Transitions.Add(RuleTransitions.Rules3.LINQOrderByClause,
State5);
// Rule edge setup.
rule.SetEdges(new DeclarationSiteState[]
{
State1,
State2,
State3,
State4,
State5
});
ruleLINQBodyClause = rule;
}
return ruleLINQBodyClause;
}
}
//End code
The way I'm intending to break down the data model is in three phases:
1. Declaration Site, which denotes the default rule declaration.
2. Bridge Site, which unifies a series of declaration site
states, this will be used by the third phase.
3. Use site, which contains a single state for every permutation of
any given state in the language. The idea here is the use-site
would delay creating information about its transition targets
until a token from the 'bridge site' is encountered (which
effectively means the bridge defines the token context data
used by the scanner).
Once a state->state transition is encountered on a use-site
state, the idea would be to obtain the bridge for that state
by gathering the declaration site information available.
The information includes a union of the declaration site
elements, which use the transition, causing the state
transition.
The reason for the use of this third level is to close edges from
each rule, since the Bridge elements are created using a union
of all the states that made up the state, keyed with the token
context information of what causes the transition, the edges will
not be unique. So the use-site intends to close the edge gap
by introducing the context information for the point following the
use of a rule (effectively defining the follow set).
I'm also aware that within the bridge I'll need to keep in mind the
'paths' necessary for token, so that if I have the following:
A ::= B 'u';
B ::= C 'd';
C ::= 'a';
The parser itself would know that when the token 'a' is encountered,
from rule 'A', it would know that the path to get there is
A->B->C->'a', since this context information is necessary for clearly
defining the rule boundaries.
I'm contacting the folks here today to see if I've missed anything,
while most of what I've described is mostly focused on verifying a
parse, I need to be able to do at least that to take it a step further
and extract the necessary data from it.
Is this even remotely similar to PDA? I have ideas for how to handle
terminal points of a rule, but from what I can tell, the actual
finalization of a given rule will have to be delayed until the end of
the parse. This is due to resolving which elements are responsible for
accepting what tokens. If the optional parts of a tail end of one rule
is locked for the optional follow part of the calling rule, determining
how to distribute the tokens will need to be done at the end considering
the fixed vs. variable locks involved.
Any response to this is appreciated,
Return to the
comp.compilers page.
Search the
comp.compilers archives again.