From: | Hans-Peter Diettrich <DrDiettrich1@aol.com> |
Newsgroups: | comp.compilers |
Date: | 25 Jul 2006 00:40:36 -0400 |
Organization: | Compilers Central |
References: | 06-07-024 06-07-027 06-07-035 06-07-046 06-07-050 06-07-055 06-07-059 06-07-068 |
Keywords: | parse |
Posted-Date: | 25 Jul 2006 00:40:36 EDT |
Max Hailperin schrieb:
>> 1. LL parsers cannot handle left recursion, whereas LR parsers cannot
>> handle right recursion.
>>
>> 2. Most languages are ambiguous at the syntax level, so that implicit
>> disambiguation (longest match...) or explicit semantical constraints
>> must be introduced. (see: dangling else...).
>>
>> 3. Only LL(1) recursive descent parsers are readable, that's why no
>> LL(k) parser generators exist, in contrast to LR(k) parser generators.
>>
>> 4. When at least one terminal must be consumed, before a recursive
>> invocation of a rule, no infinite recursion can occur. (every loop will
>> terminate after all terminals in the input stream have been consumed)
>>
>> Any objections, so far?
>
> Yes.
>
> Point 1 is incorrect; LR parsers can handle right recursion in
> addition to left recursion.
Shame on me :-(
> Point 2 is incorrectly stated, at the least. You seem to have meant
> not that the languages are ambiguous, but rather than particular
> grammars that are commonly used for those languages are ambiguous.
As already stated in a parallel answer, a language can allow for
multiple different parse trees of the same (valid) input, what has been
considered as "ambigous" by several contributors. Can you suggest a
better wording?
> Taking the example of the dangling else problem, there are unambiguous
> grammars for this as well as ambiguous ones. The published reference
> grammar for Java, for example, is unambiguous,
Just looked it up, and the Java grammar does not only look horrible to
me, it also can serve as an example of IMO inappropriate techniques in
the design of a language. I just don't know about the order of this
procedure (exponential?), when it had to be applied to disambiguate more
than one production :-(
IMO the maintenance of such a bloated grammar is at least very
inconvenient and error prone.
At least a much simpler Java grammar could have been constructed, by
dropping certain compatibility with the C language syntax in the fist step.
> rather than relying
> upon an extragrammatical disambiguation rule. (Note that such a
> disambiguation rule is not semantic, contrary to what you say, just a
> portion of the syntax specification that may, in some cases, be
> presented extragrammatically.)
I'm just thinking about the implications, built into any kind of grammar
or grammar notation. Do they exist, beyond the syntax of the meta
language? Or are they only introduced by specific parser generators, for
the sake of shorter grammar notation?
> Therefore, it seems that your point 2
> boils down to a claim that "most" of the time the specifiers of
> programming language syntaxes find it more convenient to use an
> ambiguous grammar coupled with extragrammatical disambiguation rules,
> rather thon to use an unambiguous grammar. This is an empirical
> question about the frequency with which different notational
> approaches are used, which I would be hesitant to try to answer just
> based on what I personally have seen. It certainly is not an
> implausible claim, however.
Just as we use high level programming languages, and let an compiler
translate it into detailed assembly code, we prefer to write compact
grammars, and leave it to an parser generator to produce the equivalent
code or automaton. In both cases we should work on the highest (most
compact) level of notation, in order to keep the sources maintainable.
I'm not really happy with macros and similar extensions, when used to
make source code only shorter, at the cost of transparency. OTOH I
really dislike bloated grammars, with a lot of similar productions,
which have been introduced for more or less obvious purposes, as in the
beforementioned Java grammar.
> consider, for example, the following grammar,
> which is LL(2) but not LL(1):
>
> S -> a b S | a c
>
> The recursive descent parser for this can be written quite readably.
Provided that this grammar really is not LL(1), it can be transformed
easily into LL(1) form. IMO this example clearly demonstrates the
advantages of using higher level notations, like EBNF:
S -> a ( b S | c )
instead of:
S -> a S2
S2 -> b S | c
>> Ad (3): This is the only reason for the preference of LR parsers. Why
>> spend time with tweaking a grammar to LR(1)/LL(1), when a parser
>> generator for LR(k) is available?
>
> No, the preference for LR parsers over LL has little to do with point
> 3, not only because point 3 is incorrect, but also because you are
> here addressing the k>1 case, whereas most practical parsers use k=1
> in any case.
Just a question: is your above grammar LR(1)?
I'm somewhat clueless in answering this question myself :-(
> the grammars suitable for LR(1) parsing are a proper superset
> of those suitable for LL(1) parsing.
Thanks for this enlightenment :-)
DoDi
Return to the
comp.compilers page.
Search the
comp.compilers archives again.