Problem with ambigous grammar

sr@247ms.com
1 Jan 2007 17:12:09 -0500

          From comp.compilers

Related articles
Problem with ambigous grammar sr@247ms.com (2007-01-01)
Re: Problem with ambigous grammar cfc@shell01.TheWorld.com (Chris F Clark) (2007-01-02)
| List of all articles for this month |

From: sr@247ms.com
Newsgroups: comp.compilers
Date: 1 Jan 2007 17:12:09 -0500
Organization: Compilers Central
Keywords: parse, question
Posted-Date: 01 Jan 2007 17:12:09 EST

hello everybody,


I am quite new to parsing and don't know if this newsgroup is
appropriate for this type of questions, anyway, the question regards a
certain type of ambiguity in a CF grammar and maybe some of you can
still help.


I have a production R -> A | B and for the non-terminals A and B I
have: FIRST(A) intersect FIRST(B) != EMPTY-SET.
A sentence with a terminal in this intersection will now eventually
have 2 parse trees.


example:


R -> keyword | identifier
keyword -> if | then | else | class | ...
identifier -> [a-z]+


here the sentence 'class' might parse correctly using R->keyword or
R->identifier. Obviously what I am are interested in, is to give the
'keyword' production
more precedence over 'identifier'. now, my first thoughts were these:


a) when the rule R->keyword produces a string s, then don't try the
rule R->identifier.


b) Get all parse trees. Afterwards remove the 'bad' ones.


In the case a) we have to make sure that the parser tries A before B
and have to keep this state, which is ugly just for this purpose.
The second idea b) is not even worth to try, my feeling is that it will
be tremendously slow.
So I had to come up with new ideas, but I don't know if there is a
solution for it.


c) rewrite the grammar to give A precedence over B, but how ????


d) replace B with a new non-terminal C such that FIRST(C) =
FIRST(B)-FIRST(A). Question: can such a C be constructed (at all?
sometimes maybe? )


I can immagine that c) and d) are similar. Of course both c and d will
change the grammar above, but that's actually correct.
I stumbled over such a problem when I was reading about a grammar.
There the production rule was (analogously) defined: identifier ->
[a-z]+ that is not a 'keyword'
Of course they solved ambiguity by simply specifying this extra 'that
is not a keyword' in the grammar, but this is definately not BNF
notation to me.


This brings me then to another question: How can we specify a 'XOR'
operation using BNF notation and how can we specify a 'NOT' operation
using BNF notation. If we had a way to specify a XOR operator in a CF
grammar then we could easily solve the problem above. maybe a CF
grammar is not strong enough to specify 'XOR', then maybe in a CS or
PS grammar ?


best regards,
joe
[I think you will find that there is no such thing as a useful
non-trival truly context free language, and that there is a reason
that we separate the lexer from the parser. (If you don't believe me,
try and handle comments in your BNF.) By far the most common way to
handle keywords is to have the lexer recognize them and return them as
tokens that are different from identifiers. This works nicely in
languages where keyword names are reserved, and can usually be kludged
to work via lexical context tricks in languages where they
aren't. -John]



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.