Re: Syntax Disambiguation in Functional Language

Chris F Clark <cfc@shell01.TheWorld.com>
Wed, 24 Nov 2010 14:13:40 -0500

          From comp.compilers

Related articles
Re: Syntax Disambiguation in Functional Language cfc@shell01.TheWorld.com (Chris F Clark) (2010-11-24)
| List of all articles for this month |

From: Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups: comp.lang.functional,comp.compilers
Date: Wed, 24 Nov 2010 14:13:40 -0500
Organization: The World Public Access UNIX, Brookline, MA
References: <icg6ht$q3e$1@news.eternal-september.org> <87zkt0w7fq.fsf@kuiper.lan.informatimago.com> <icgg3f$q3e$2@news.eternal-september.org> <87oc9gw2dk.fsf@kuiper.lan.informatimago.com> <icgogh$q3e$5@news.eternal-september.org> <n7vne6dnevbi5ktl5u6q1guhdf0oranem8@4ax.com>
Keywords: parse
Posted-Date: 26 Nov 2010 23:29:14 EST

George Neuner <gneuner2@comcast.net> writes:


> On Tue, 23 Nov 2010 16:01:21 +0000 (UTC), "Paulo J. Matos"
> <pocmatos@gmail.com> wrote:
>
>>On Tue, 23 Nov 2010 15:17:27 +0100, Pascal J. Bourguignon wrote:
>>
>>> If there is an ambiguity, by definition, you cannot sort it out.
>>>
>>> If you don't want ambiguities, then don't put them in! Ie. design your
>>> grammar so that there's no ambiguity. Anything does, as long as there
>>> is no ambiguity.
>>
>>Excuse me but either you didn't understand my point or you just like to
>>post things that do not answer the question.
>>
>>afaik python and sml allow things that _look like_:
>>
>> f
>> (x)
>>
>>how do they know if this is a function call or two separate expressions?
>
> Sorry, but there isn't any *simple* way to disambiguate without using
> some kind of punctuation. Pascal is simply recommending Lisp's
> method.
>
> A complex way would be to use a GLR parser which forks when
> encountering ambiguity - the parser then follows all possible syntax
> paths in parallel until one emerges as correct (or until all paths end
> in failure).
>
> However, even GLR can get hopelessly confused by a grammar that
> permits composition of ambiguity and GLR parsers in general have
> *TERRIBLE* diagnostics - the fact that the grammar is expected to be
> ambiguous means that almost no parsing errors are possible ...
> virtually all meaningful checking has to be done on the parse trees.
>
> It might be better to take this question over to comp.compilers. I'm
> pretty well versed in parsing technology, but there are some real
> experts over there that might know some tricks you could use.
>
> George


Topic 1:


As has been pointed out in this thread, many of the languages that
disambiguate these lines do so by respecting newlines (and often
indentation). Neither feature is particularly well suited to
currently existing formal grammars. I'm not sure why no one has
bothered to formalize them, except that from a theoretical perspective
they don't add much expressive power (you can write everything you can
write with an indentation oriented language with a parenthesized
language) and from a practical perspective they require "counting"
things and counting is non-regular, so it doesn't axiomatize well.


Thus, while it is easy to say an increase in the indentation level is
equivalent to an open parenthesis (or possibly a set of them) and a
descrease in the indentation is equivalent to a close parenthesis (or
set of them). The practical issues in getting that formalized
precisely right is not as trivial as it sounds and their aren't tools
that automate writing grammars in that style (that I know of).


Topic 2:


George's solution of using GLR parsing and them his pointing out its
shortcomings is essentially spot-on. GLR parsing allows too much
ambiguity, especially in cases like these, because as long as there
are a set of legal parses, the parser will keep going hoping that at
some point they get disambiguated. However, without newline and
indentation style hints, there really isn't anything to disambiguate
them. Thus, the result is simply an undisambiguated parse forest.


An alternative that is a little more practical is the use of PEGs
(parsing expression grammars). In this case, the grammar effectively
defines a preferred method of disambiguating the cases where two
parses are possible. The result is that the parse forest generated by
a GLR type tool is pruned automatically down to one result. The rules
for doing such are generally simple and intuitive (at least if one can
see the grammar). PEGs do at time still produce counter-intuitive
results and the diagnostic tools are still quite limited. If one
tries to be too subtle or too sophisticated, one can quite easily
produce something unintelligible.


To my mind even better is the close sibling predicated grammars. They
share the characteristic that the ambiguity is resolved by parsing
order. However, they limit the ambiguity to where the grammar writer
has specifically allowed it. Thus, one can think of predicated
grammars as being primarily unambiguous with some cwertain limited
ambiguities allowed and disambiguated.


However, as I mentioned, I'm not aware of tools in any of these
grammar classes that have "operators" that comprehend indentation
levels. Without that hook, you will never quite get something that
looks like Python or Haskell from a grammar based tool (unless you
build it by yourself).


Hope this helps,
-Chris


******************************************************************************
Chris Clark email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.