Related articles |
---|
Re: LR (k) vs. LALR cfc@shell01.TheWorld.com (Chris F Clark) (2004-09-07) |
RE: LR (k) vs. LALR quinn-j@shaw.ca (Quinn Tyler Jackson) (2004-09-08) |
From: | Quinn Tyler Jackson <quinn-j@shaw.ca> |
Newsgroups: | comp.compilers |
Date: | 8 Sep 2004 12:04:01 -0400 |
Organization: | Compilers Central |
References: | 04-09-049 |
Keywords: | parse, LALR |
Posted-Date: | 08 Sep 2004 12:04:01 EDT |
Chris Clark said (in part):
> It is worth mentioning that there are other ways of handling ambiguous
> grammars. In particular, one can use predicates to resolve
> ambiguities. Predicates allow one to take the difference of two
> productions in a controlled manner. In particular, it is possible to
> write a syntactic rules that says, try to parse this as a declaration
> and if it isn't parse it as an expression. The difference between the
> predicated and the GLR solution is that predicated grammars are still
> deterministic. There are no hidden ambiguities in a predicated
> grammar. If your predicated parser generator gives you an error, you
> still have an unresolved ambiguity and if it doesn't the resulting
> parser will always construct a parse tree (and not a forest)
I agree with Chris about the use of predicates.
Interestingly, the use of predicates alone can some Type 1 power to a
grammar.
For instance:
L1 = {a^n b^n c+} // clearly a type 2 language
L2 = {a+ b^n c^n} // clearly a type 2 language
L1 intersect L2 = {a^n b^n c^n} // a type 1 language
Current research in the area of this class of grammars can be found here:
http://www.cs.queensu.ca/home/okhotin/
See the section on "Boolean grammars." Intersection can get quite a bit of
power out of a formalism.
My most recent paper deals with several difficult to parse languages of the
classical sort, including the particularly nasty to parse:
L = {a^m b^n c^mn}
The only grammar I've seen expressed for that one in classical form is in
Type 0 due to a length increasing production:
(1) <S> ::= <H><S> | <H><B>
(2) <B> ::= <B><B> | <C>
(3) <H><B> ::= <A><X><N><B>
(4) <N><B> ::= <B><N>
(5) <B><M> ::= <M><B>
(6) <N><C> ::= <M>c
(7) <N>c ::= <M>cc
(8) <X><M><B><B> ::= <B><X><N><B>
(9) <X><B><M>c ::= <B>c
(10) <H><A> ::= <A><H>
(11) <A> ::= a
(12) <B> ::= b
Because production (9) is length increasing, the grammar is in Type 0 form,
even though the language itself is Type 1. I'd like to see that grammar
normalized to a Type 1 -- but haven't been able to find one.
The longest derivation I've been able to do with that by hand is aabcc,
which is:
(11) aabcc --> <A>abcc
(11) <A>abcc --> <A><A>bcc
(12) <A><A>bcc --> <A><A><B>cc
(9) <A><A><B>cc --> <A><A><X><B><M>cc
(7) <A><A><X><B><M>cc --> <A><A><X><B><N>c
(4) <A><A><X><B><N>c --> <A><A><X><N><B>c
(3) <A><A><X><N><B>c --> <A><H><B>c
(9) <A><H><B>c --> <A><H><X><B><M>c
(6) <A><H><X><B><M>c --> <A><H><X><B><N><C>
(4) <A><H><X><B><N><C> --> <A><H><X><N><B><B>
(2) <A><H><X><N><B><B> --> <A><H><X><N><B>
(10) <A><H><X><N><B> --> <H><A><X><N><B>
(3) <H><A><X><N><B> --> <H><H><B>
(1) <H><H><B> --> <H><S>
(1) <H><S> --> <S>
I started on aaabbcccccc -- but got lost in the shuffle. :-(
If anyone wants to try a purely "predicate" approach to the above
language -- I'd love to see it. (Or if anyone would care to post the
derivation of aaabbcccccc, I'd really love to see that, too.)
The $-grammar I wrote for that one accepts strings in O(n^2.3), and has 6
productions and 2 of those are predicates, but also makes use of 2
phi-expressions, which implies a total of 4 predicates (since there is an
implied predicate with every phi-expression) and at least 2 name-indexed
tries.
Also of note is that I allow for a-expressions in predicates, which allows
for substrings that have been parsed to be concatenated to form entirely new
input that is then passed to the predicates. (The $-grammar for a^m b^n c^mn
uses this.) This is similar to what is known as "length-increasing".
Anyway, a few nights ago, I was asking myself about this formation in C++:
class Foo
{
int inline_function(int x)
{
return __y * x; // __y is used before being seen
}
int __y;
}; // this class is legal
class Bar
{
int inline_function(int x)
{
return __y * x; // __y is an undeclared variable
}
}; // this class is not legal because __y never gets declared
$-calculus was able to handle this ... without any code, accepting Foo and
rejecting Bar -- using all of the above mentioned techniques.
Although the resulting $-grammar uses more than 1 explicit predicate, it
does make use of phi-expressions, and these have implied predicates -- so I
was not able to do it without extensive overall use of predication.
Anyway -- I wrote it up and am looking for somewhere that is looking for a
3.5 page paper on such things. Any thoughts?
[BTW -- Chris -- direct email to you bounces from my account. Is that a spam
guard?]
--
Quinn Tyler Jackson
http://members.shaw.ca/qjackson/
"Never express yourself more clearly than you think."
-- Niels Bohr
Return to the
comp.compilers page.
Search the
comp.compilers archives again.