Related articles |
---|
Writing a parser/lexical analyzer builder. alex@alexandermorou.com (Alexander Morou) (2007-10-26) |
Re: Writing a parser/lexical analyzer builder. DrDiettrich1@aol.com (Hans-Peter Diettrich) (2007-10-27) |
From: | Hans-Peter Diettrich <DrDiettrich1@aol.com> |
Newsgroups: | comp.compilers |
Date: | Sat, 27 Oct 2007 21:18:29 +0200 |
Organization: | Compilers Central |
References: | 07-10-087 |
Keywords: | parse, tools |
Posted-Date: | 28 Oct 2007 16:27:07 EDT |
Alexander Morou wrote:
> The project intends to use recursive descent to handle parses. I
> realize that rules defined as: AddExp ::= AddExp AddOperators
> MulDivExp | MulDivExp will by default, in recursive descent, recurse
> into infinity without precautionary measures. To solve such a crisis
> I decided I would add a '_Continuous' parse case to self-referencing
> First-targets and thus as a pseudo mockup I decided upon:
> http://lhq.rpgsource.net/text/aeTest.txt
In EBNF this rule could look like
AddExp ::= (AddExp AddOperators)? MulDivExp
Now it should be clear that the innermost invocation can return nothing
but a MulDivExp or, in general, one of the non-self-recursive
alternatives. Unrolling the loop will result in something like:
while (MulDivExp(&md))
{
result += md;
if (AddOperators(&ao))
result += ao;
else
break;
}
what here effectivly means:
AddExp ::= MulDivExp (AddOperators MulDivExp)*
This should be equivalent to your attempt, only without two separate
versions of the same rule.
> Where AddExp references another Production Rule, and MulDiv references
> a token. You'll note the differences in their proposed
> implementation, one strictly uses the lookahead from the local
> tokenizer stock, the other uses a parse method to determine the same.
> Now remember the above code is just something I threw together to
> presumably solve the issue, I have not verified it because I want to
> ensure I'm taking the project in the right direction before I code en
> masse.
Special lookahead is not required, at least not for C style expressions.
The only ambiguous case can result from old-style (typename) casts, vs.
(expr), which deserves special handling of typenames anyway. An
appropriate transformation of the grammar will make this case LL(1) as well:
ExprOrCast ::= '(' [ Typename ')' ExprOrCast | Expression ')' ]
> The reason I'm posting is: an online associate of mine said that the
> extensions to an already established norm (EBNF) are superfluous. Is
> there any use in adding templates, allowing tokens to be categorized
> to make writing rules easier, and other changes? I don't want to
> waste my time in creating a way to obfuscate grammar description in a
> way that's unusable. If it is useful can it be cleaned up, or is what
> I have even viable at all?
In a second version of my recursive descent C parser I merged all the
expression handling into something like a state machine, eliminating the
methods related (only) to operator precedence. IMO the expression
handling is worth an extension to (E)BNF, which can help to make a
grammar much more compact.
Templates or macros, when expanded before processing a grammar, will
disallow the (human) parser generator to handle special cases in the
most appropriate and compact way. Once a grammar is bloated, it's hard
to reduce it back into a compact form.
A transformation, into a grammar without direct left recursion, instead
can be implemented in a separate step, before the creation of the
recursive descent parser.
DoDi
Return to the
comp.compilers page.
Search the
comp.compilers archives again.