RE: LR (k) vs. LALR

Quinn Tyler Jackson <quinn-j@shaw.ca>
8 Sep 2004 12:04:01 -0400

From comp.compilers

Related articles
Re: LR (k) vs. LALR cfc@shell01.TheWorld.com (Chris F Clark) (2004-09-07)
*RE: LR (k) vs. LALR quinn-j@shaw.ca (Quinn Tyler Jackson)* (2004-09-08)**

| List of all articles for this month |

From:	Quinn Tyler Jackson <quinn-j@shaw.ca>
Newsgroups:	comp.compilers
Date:	8 Sep 2004 12:04:01 -0400
Organization:	Compilers Central
References:	04-09-049
Keywords:	parse, LALR
Posted-Date:	08 Sep 2004 12:04:01 EDT

Chris Clark said (in part):

> It is worth mentioning that there are other ways of handling ambiguous
> grammars. In particular, one can use predicates to resolve
> ambiguities. Predicates allow one to take the difference of two
> productions in a controlled manner. In particular, it is possible to
> write a syntactic rules that says, try to parse this as a declaration
> and if it isn't parse it as an expression. The difference between the
> predicated and the GLR solution is that predicated grammars are still
> deterministic. There are no hidden ambiguities in a predicated
> grammar. If your predicated parser generator gives you an error, you
> still have an unresolved ambiguity and if it doesn't the resulting
> parser will always construct a parse tree (and not a forest)

I agree with Chris about the use of predicates.

Interestingly, the use of predicates alone can some Type 1 power to a
grammar.

For instance:

L1 = {a^n b^n c+} // clearly a type 2 language
L2 = {a+ b^n c^n} // clearly a type 2 language

L1 intersect L2 = {a^n b^n c^n} // a type 1 language

Current research in the area of this class of grammars can be found here:

http://www.cs.queensu.ca/home/okhotin/

See the section on "Boolean grammars." Intersection can get quite a bit of
power out of a formalism.

My most recent paper deals with several difficult to parse languages of the
classical sort, including the particularly nasty to parse:

L = {a^m b^n c^mn}

The only grammar I've seen expressed for that one in classical form is in
Type 0 due to a length increasing production:

 (1) <S> ::= <H><S> | <H>
 (2) ::= | <C>
 (3) <H> ::= <A><X><N>
 (4) <N> ::= <N>
 (5) <M> ::= <M>
 (6) <N><C> ::= <M>c
 (7) <N>c ::= <M>cc
 (8) <X><M> ::= <X><N>
 (9) <X><M>c ::= c
(10) <H><A> ::= <A><H>
(11) <A> ::= a
(12) ::= b

Because production (9) is length increasing, the grammar is in Type 0 form,
even though the language itself is Type 1. I'd like to see that grammar
normalized to a Type 1 -- but haven't been able to find one.

The longest derivation I've been able to do with that by hand is aabcc,
which is:

(11) aabcc --> <A>abcc
(11) <A>abcc --> <A><A>bcc
(12) <A><A>bcc --> <A><A>cc
 (9) <A><A>cc --> <A><A><X><M>cc
 (7) <A><A><X><M>cc --> <A><A><X><N>c
 (4) <A><A><X><N>c --> <A><A><X><N>c
 (3) <A><A><X><N>c --> <A><H>c
 (9) <A><H>c --> <A><H><X><M>c
 (6) <A><H><X><M>c --> <A><H><X><N><C>
 (4) <A><H><X><N><C> --> <A><H><X><N>
 (2) <A><H><X><N> --> <A><H><X><N>
(10) <A><H><X><N> --> <H><A><X><N>
 (3) <H><A><X><N> --> <H><H>
 (1) <H><H> --> <H><S>
 (1) <H><S> --> <S>

I started on aaabbcccccc -- but got lost in the shuffle. :-(

If anyone wants to try a purely "predicate" approach to the above
language -- I'd love to see it. (Or if anyone would care to post the
derivation of aaabbcccccc, I'd really love to see that, too.)

The $-grammar I wrote for that one accepts strings in O(n^2.3), and has 6
productions and 2 of those are predicates, but also makes use of 2
phi-expressions, which implies a total of 4 predicates (since there is an
implied predicate with every phi-expression) and at least 2 name-indexed
tries.

Also of note is that I allow for a-expressions in predicates, which allows
for substrings that have been parsed to be concatenated to form entirely new
input that is then passed to the predicates. (The $-grammar for a^m b^n c^mn
uses this.) This is similar to what is known as "length-increasing".

Anyway, a few nights ago, I was asking myself about this formation in C++:

class Foo
{
int inline_function(int x)
{
return __y * x; // __y is used before being seen
}

int __y;
}; // this class is legal

class Bar
{
int inline_function(int x)
{
return __y * x; // __y is an undeclared variable
}
}; // this class is not legal because __y never gets declared

$-calculus was able to handle this ... without any code, accepting Foo and
rejecting Bar -- using all of the above mentioned techniques.

Although the resulting $-grammar uses more than 1 explicit predicate, it
does make use of phi-expressions, and these have implied predicates -- so I
was not able to do it without extensive overall use of predication.

Anyway -- I wrote it up and am looking for somewhere that is looking for a
3.5 page paper on such things. Any thoughts?

[BTW -- Chris -- direct email to you bounces from my account. Is that a spam
guard?]

--
Quinn Tyler Jackson

http://members.shaw.ca/qjackson/

"Never express yourself more clearly than you think."
-- Niels Bohr

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

RE: LR (k) vs. LALR

Quinn Tyler Jackson <quinn-j@shaw.ca>8 Sep 2004 12:04:01 -0400

Quinn Tyler Jackson <quinn-j@shaw.ca>
8 Sep 2004 12:04:01 -0400