RE: Writing a parser for C++ - Ada hybrid

"Florent LAGAYE" <flagaye@little-worlds.com>
Mon, 7 Jul 2008 14:16:22 +0200

          From comp.compilers

Related articles
Writing a parser for C++ - Ada hybrid flagaye@little-worlds.com (Florent LAGAYE) (2008-07-03)
RE: Writing a parser for C++ - Ada hybrid flagaye@little-worlds.com (Florent LAGAYE) (2008-07-07)
| List of all articles for this month |
From: "Florent LAGAYE" <flagaye@little-worlds.com>
Newsgroups: comp.compilers
Date: Mon, 7 Jul 2008 14:16:22 +0200
Organization: Compilers Central
References: 08-07-008 <20080705205637.A95BFAF86CB@linserveur.little-worlds.com>
Keywords: parse
Posted-Date: 12 Jul 2008 23:53:09 EDT

Thanks, I chose the the first solution, replacing all "locator" occurences
by the "expression" non-terminal.


I had a bit of refactoring to do on my tree walker, but it wasn't too heavy,
and now my grammar compiles without errors or warnings.


Keep the good stuff going !


Floof.


-----Message d'origine-----
De : Chris Dodd [mailto:cdodd@acm.org]
Envoyi : samedi 5 juillet 2008 22:57
@ : Florent LAGAYE
Objet : Re: Writing a parser for C++ - Ada hybrid


Lets look closely at your conflict to see what is going on. The conflicts
are all in state 96:


> itat 96
:
> 62 | locator .
:
> 68 | locator BOUND_TOKEN locator .
:
> PLUS_TOKEN riduction par utilisation de la rhgle 62
(expression)
> PLUS_TOKEN [riduction par utilisation de la rhgle 68
(expression)]


I've edited this down severely so we're looking at just one conflicting
token, but all the rest are the same, just with different following
tokens.


What this means is that in a state where the parser has seen "locator
BOUND_TOKEN locator" and the following token needs to have an expression
before it, it can't decide whether to reduce the whole thing to an
expression or just the final locator (which would require later reducing
the expression back to a locator with rule 44 "locator: expression
DOT_TOKEN IDF_TOKEN".


So this is both an ambiguity and a lack of enough lookahead case --
if there isn't a trailing "DOT_TOKEN IDF_TOKEN", reducing rule 62 will
lead to a syntax error, so it would have to be rule 68. However, if
there is such a trailing context, either might be valid.


The 'easy' solution is if reducing rule 68 is correct even in the presence
of such trailing context (which I think it is). In that case, you can
just put rule 68 before rule 62, and the default conflict resolution
will do the right thing.


If you want to actually get rid of the conflict, there are a number
of approaches. One is simplifying the grammar to get rid of grammar
distinctions and instead resolve them with later semantic checks.
This is what is usually done with stongly typed languages, as doing
typechecking in the grammar is very hard. In this case you could move the
distinction between a 'locator' and an 'expression' out of the grammar;
it could easily be handled with normal typechecking. To do this, you just
replace 'locator' with 'expression' everywhere appears in the grammar,
and get rid of the resulting redundant "epression : expression" rule.


The second approach is unfactoring -- replacing nonterminals on the rhs of
rules with the rhs of the rules for those nonterminals (which involves
duplicating rules if there are multiple rules for the nonterminal.)
In this case, for example, you can replace the rule 'expression : locator'
with 'expression : expression DOT_TOKEN IDF_TOKEN | namespace IDF_TOKEN'
(two rules). To get rid of the conflicts, you'll need to keep doing this
until you get rid of all uses of 'locator' in the grammar. Since 'locator'
is non-self-recursive, you can actaullay do that, but it will lead to
a significantly more complex grammar. In cases with recursive rules
and conflicts that require unbounded lookahead to resolve, unfactoring
will never reach a solution. Unfactoring works best when there is no
ambiguity and you just need a little bit of extra lookahaead to resolve
the conflict.




Chris Dodd
cdodd@acm.org


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.