Re: Writing a parser/lexical analyzer builder.

Hans-Peter Diettrich <DrDiettrich1@aol.com>
Sat, 27 Oct 2007 21:18:29 +0200

From comp.compilers

Related articles
Writing a parser/lexical analyzer builder. alex@alexandermorou.com (Alexander Morou) (2007-10-26)
*Re: Writing a parser/lexical analyzer builder. DrDiettrich1@aol.com (Hans-Peter Diettrich)* (2007-10-27)**

| List of all articles for this month |

From:	Hans-Peter Diettrich <DrDiettrich1@aol.com>
Newsgroups:	comp.compilers
Date:	Sat, 27 Oct 2007 21:18:29 +0200
Organization:	Compilers Central
References:	07-10-087
Keywords:	parse, tools
Posted-Date:	28 Oct 2007 16:27:07 EDT

Alexander Morou wrote:

> The project intends to use recursive descent to handle parses. I
> realize that rules defined as: AddExp ::= AddExp AddOperators
> MulDivExp | MulDivExp will by default, in recursive descent, recurse
> into infinity without precautionary measures. To solve such a crisis
> I decided I would add a '_Continuous' parse case to self-referencing
> First-targets and thus as a pseudo mockup I decided upon:
> http://lhq.rpgsource.net/text/aeTest.txt

In EBNF this rule could look like
      AddExp ::= (AddExp AddOperators)? MulDivExp

Now it should be clear that the innermost invocation can return nothing
but a MulDivExp or, in general, one of the non-self-recursive
alternatives. Unrolling the loop will result in something like:

      while (MulDivExp(&md))
      {
          result += md;
          if (AddOperators(&ao))
              result += ao;
          else
              break;
      }

what here effectivly means:
      AddExp ::= MulDivExp (AddOperators MulDivExp)*

This should be equivalent to your attempt, only without two separate
versions of the same rule.

> Where AddExp references another Production Rule, and MulDiv references
> a token. You'll note the differences in their proposed
> implementation, one strictly uses the lookahead from the local
> tokenizer stock, the other uses a parse method to determine the same.
> Now remember the above code is just something I threw together to
> presumably solve the issue, I have not verified it because I want to
> ensure I'm taking the project in the right direction before I code en
> masse.

Special lookahead is not required, at least not for C style expressions.
The only ambiguous case can result from old-style (typename) casts, vs.
(expr), which deserves special handling of typenames anyway. An
appropriate transformation of the grammar will make this case LL(1) as well:
      ExprOrCast ::= '(' [ Typename ')' ExprOrCast | Expression ')' ]

> The reason I'm posting is: an online associate of mine said that the
> extensions to an already established norm (EBNF) are superfluous. Is
> there any use in adding templates, allowing tokens to be categorized
> to make writing rules easier, and other changes? I don't want to
> waste my time in creating a way to obfuscate grammar description in a
> way that's unusable. If it is useful can it be cleaned up, or is what
> I have even viable at all?

In a second version of my recursive descent C parser I merged all the
expression handling into something like a state machine, eliminating the
methods related (only) to operator precedence. IMO the expression
handling is worth an extension to (E)BNF, which can help to make a
grammar much more compact.

Templates or macros, when expanded before processing a grammar, will
disallow the (human) parser generator to handle special cases in the
most appropriate and compact way. Once a grammar is bloated, it's hard
to reduce it back into a compact form.

A transformation, into a grammar without direct left recursion, instead
can be implemented in a separate step, before the creation of the
recursive descent parser.

DoDi

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Writing a parser/lexical analyzer builder.

Hans-Peter Diettrich <DrDiettrich1@aol.com>Sat, 27 Oct 2007 21:18:29 +0200

Hans-Peter Diettrich <DrDiettrich1@aol.com>
Sat, 27 Oct 2007 21:18:29 +0200