How to... (was: disambiguating transformations)

markh@csd4.csd.uwm.edu (Mark)
Sun, 26 Sep 1993 11:50:36 GMT

          From comp.compilers

Related articles
disambiguating transformations MATT@waikato.ac.nz (Matt Melchert) (1993-09-22)
Disambiguating an arbitrary CFL (was: disambiguating transformations) markh@csd4.csd.uwm.edu (1993-09-25)
How to... (was: disambiguating transformations) markh@csd4.csd.uwm.edu (1993-09-26)
| List of all articles for this month |

Newsgroups: comp.compilers
From: markh@csd4.csd.uwm.edu (Mark)
Keywords: parse, theory
Organization: Computing Services Division, University of Wisconsin - Milwaukee
References: 93-09-099 93-09-120
Date: Sun, 26 Sep 1993 11:50:36 GMT

Matt Melchert <MATT@waikato.ac.nz> writes:
>I am looking for an algorithm which will take a (possibly) ambiguous context-
>free grammar and perform transformations on it to yield an equivalent
>non-ambiguous context-free grammar. Does such a thing exist?


Yes, sorta. Referring to a conjecture I made a short while back in the
article "Disambiguating an arbitrary CFL":


            Every C_n language has an unambiguous C_m grammar for some m >= n.


(context free languages are C_1 languages, as indicated there), I think
that the way you do this is to generalize the Iterated Subset algorithm
for Finite Automata to C_n languages. As indicated before, every C_n
language will have an Index Grammar, which represents an infinite state
diagram of a particular topological structure, basically, the cross
product of n infinite trees. For example, the diagram presented in the
article, representing a C[1, 1] language, is basically the cross product
of 2 1-ary trees (i.e. a 2-dimensional grid).


The subset algorithm can be applied directly to either, but the looming
question is whether or not there is a guarantee that this process will be
finite and will eventually close off. Experience indicates that it will.
The problem is that you have to mess around with the indices when forming
subsets of states and this may lead to a proliferation of new index
dimensions, which is why the dimension, m, of the deterministic C_m
grammar may be larger than the dimension, n, of the C_n language.


In contrast, if you try to go further and find the equivalence sets of
states, you are guaranteed that there will be at least one language out
there which will break your equivalence-finding algorithm, no matter what
algorithm you use. You're working on infinite state diagrams, and
equivalence testing can lead to intractible infinite regresses.


So ... finding the MINIMAL deterministic C_m grammar for a C_n language is
not an algorithmic process. But then, as they say, so what? Any
deterministic C_m grammar will be good enough.


By the way, if this conjecture is true, then since parsing with a
deterministic C_m grammar is linear, that means Context Free Language
Recognition is O(n).
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.