RE: LR (k) vs. LALR

Quinn Tyler Jackson <quinn-j@shaw.ca>
8 Sep 2004 12:04:01 -0400

          From comp.compilers

Related articles
Re: LR (k) vs. LALR cfc@shell01.TheWorld.com (Chris F Clark) (2004-09-07)
RE: LR (k) vs. LALR quinn-j@shaw.ca (Quinn Tyler Jackson) (2004-09-08)
| List of all articles for this month |
From: Quinn Tyler Jackson <quinn-j@shaw.ca>
Newsgroups: comp.compilers
Date: 8 Sep 2004 12:04:01 -0400
Organization: Compilers Central
References: 04-09-049
Keywords: parse, LALR
Posted-Date: 08 Sep 2004 12:04:01 EDT

Chris Clark said (in part):


> It is worth mentioning that there are other ways of handling ambiguous
> grammars. In particular, one can use predicates to resolve
> ambiguities. Predicates allow one to take the difference of two
> productions in a controlled manner. In particular, it is possible to
> write a syntactic rules that says, try to parse this as a declaration
> and if it isn't parse it as an expression. The difference between the
> predicated and the GLR solution is that predicated grammars are still
> deterministic. There are no hidden ambiguities in a predicated
> grammar. If your predicated parser generator gives you an error, you
> still have an unresolved ambiguity and if it doesn't the resulting
> parser will always construct a parse tree (and not a forest)


I agree with Chris about the use of predicates.


Interestingly, the use of predicates alone can some Type 1 power to a
grammar.


For instance:


L1 = {a^n b^n c+} // clearly a type 2 language
L2 = {a+ b^n c^n} // clearly a type 2 language


L1 intersect L2 = {a^n b^n c^n} // a type 1 language


Current research in the area of this class of grammars can be found here:


http://www.cs.queensu.ca/home/okhotin/


See the section on "Boolean grammars." Intersection can get quite a bit of
power out of a formalism.


My most recent paper deals with several difficult to parse languages of the
classical sort, including the particularly nasty to parse:


L = {a^m b^n c^mn}


The only grammar I've seen expressed for that one in classical form is in
Type 0 due to a length increasing production:


  (1) <S> ::= <H><S> | <H><B>
  (2) <B> ::= <B><B> | <C>
  (3) <H><B> ::= <A><X><N><B>
  (4) <N><B> ::= <B><N>
  (5) <B><M> ::= <M><B>
  (6) <N><C> ::= <M>c
  (7) <N>c ::= <M>cc
  (8) <X><M><B><B> ::= <B><X><N><B>
  (9) <X><B><M>c ::= <B>c
(10) <H><A> ::= <A><H>
(11) <A> ::= a
(12) <B> ::= b


Because production (9) is length increasing, the grammar is in Type 0 form,
even though the language itself is Type 1. I'd like to see that grammar
normalized to a Type 1 -- but haven't been able to find one.


The longest derivation I've been able to do with that by hand is aabcc,
which is:


(11) aabcc --> <A>abcc
(11) <A>abcc --> <A><A>bcc
(12) <A><A>bcc --> <A><A><B>cc
  (9) <A><A><B>cc --> <A><A><X><B><M>cc
  (7) <A><A><X><B><M>cc --> <A><A><X><B><N>c
  (4) <A><A><X><B><N>c --> <A><A><X><N><B>c
  (3) <A><A><X><N><B>c --> <A><H><B>c
  (9) <A><H><B>c --> <A><H><X><B><M>c
  (6) <A><H><X><B><M>c --> <A><H><X><B><N><C>
  (4) <A><H><X><B><N><C> --> <A><H><X><N><B><B>
  (2) <A><H><X><N><B><B> --> <A><H><X><N><B>
(10) <A><H><X><N><B> --> <H><A><X><N><B>
  (3) <H><A><X><N><B> --> <H><H><B>
  (1) <H><H><B> --> <H><S>
  (1) <H><S> --> <S>


I started on aaabbcccccc -- but got lost in the shuffle. :-(


If anyone wants to try a purely "predicate" approach to the above
language -- I'd love to see it. (Or if anyone would care to post the
derivation of aaabbcccccc, I'd really love to see that, too.)


The $-grammar I wrote for that one accepts strings in O(n^2.3), and has 6
productions and 2 of those are predicates, but also makes use of 2
phi-expressions, which implies a total of 4 predicates (since there is an
implied predicate with every phi-expression) and at least 2 name-indexed
tries.


Also of note is that I allow for a-expressions in predicates, which allows
for substrings that have been parsed to be concatenated to form entirely new
input that is then passed to the predicates. (The $-grammar for a^m b^n c^mn
uses this.) This is similar to what is known as "length-increasing".


Anyway, a few nights ago, I was asking myself about this formation in C++:


class Foo
{
int inline_function(int x)
{
return __y * x; // __y is used before being seen
}


int __y;
}; // this class is legal


class Bar
{
int inline_function(int x)
{
return __y * x; // __y is an undeclared variable
}
}; // this class is not legal because __y never gets declared


$-calculus was able to handle this ... without any code, accepting Foo and
rejecting Bar -- using all of the above mentioned techniques.


Although the resulting $-grammar uses more than 1 explicit predicate, it
does make use of phi-expressions, and these have implied predicates -- so I
was not able to do it without extensive overall use of predication.


Anyway -- I wrote it up and am looking for somewhere that is looking for a
3.5 page paper on such things. Any thoughts?


[BTW -- Chris -- direct email to you bounces from my account. Is that a spam
guard?]


--
Quinn Tyler Jackson


http://members.shaw.ca/qjackson/


"Never express yourself more clearly than you think."
-- Niels Bohr


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.