|Writing a parser/lexical analyzer builder. email@example.com (Alexander Morou) (2007-10-26)|
|Re: Writing a parser/lexical analyzer builder. DrDiettrich1@aol.com (Hans-Peter Diettrich) (2007-10-27)|
|From:||Hans-Peter Diettrich <DrDiettrich1@aol.com>|
|Date:||Sat, 27 Oct 2007 21:18:29 +0200|
Alexander Morou wrote:
> The project intends to use recursive descent to handle parses. I
> realize that rules defined as: AddExp ::= AddExp AddOperators
> MulDivExp | MulDivExp will by default, in recursive descent, recurse
> into infinity without precautionary measures. To solve such a crisis
> I decided I would add a '_Continuous' parse case to self-referencing
> First-targets and thus as a pseudo mockup I decided upon:
In EBNF this rule could look like
AddExp ::= (AddExp AddOperators)? MulDivExp
Now it should be clear that the innermost invocation can return nothing
but a MulDivExp or, in general, one of the non-self-recursive
alternatives. Unrolling the loop will result in something like:
result += md;
result += ao;
what here effectivly means:
AddExp ::= MulDivExp (AddOperators MulDivExp)*
This should be equivalent to your attempt, only without two separate
versions of the same rule.
> Where AddExp references another Production Rule, and MulDiv references
> a token. You'll note the differences in their proposed
> implementation, one strictly uses the lookahead from the local
> tokenizer stock, the other uses a parse method to determine the same.
> Now remember the above code is just something I threw together to
> presumably solve the issue, I have not verified it because I want to
> ensure I'm taking the project in the right direction before I code en
Special lookahead is not required, at least not for C style expressions.
The only ambiguous case can result from old-style (typename) casts, vs.
(expr), which deserves special handling of typenames anyway. An
appropriate transformation of the grammar will make this case LL(1) as well:
ExprOrCast ::= '(' [ Typename ')' ExprOrCast | Expression ')' ]
> The reason I'm posting is: an online associate of mine said that the
> extensions to an already established norm (EBNF) are superfluous. Is
> there any use in adding templates, allowing tokens to be categorized
> to make writing rules easier, and other changes? I don't want to
> waste my time in creating a way to obfuscate grammar description in a
> way that's unusable. If it is useful can it be cleaned up, or is what
> I have even viable at all?
In a second version of my recursive descent C parser I merged all the
expression handling into something like a state machine, eliminating the
methods related (only) to operator precedence. IMO the expression
handling is worth an extension to (E)BNF, which can help to make a
grammar much more compact.
Templates or macros, when expanded before processing a grammar, will
disallow the (human) parser generator to handle special cases in the
most appropriate and compact way. Once a grammar is bloated, it's hard
to reduce it back into a compact form.
A transformation, into a grammar without direct left recursion, instead
can be implemented in a separate step, before the creation of the
recursive descent parser.
Return to the
Search the comp.compilers archives again.