8 Sep 2004 12:04:01 -0400

Related articles |
---|

Re: LR (k) vs. LALR cfc@shell01.TheWorld.com (Chris F Clark) (2004-09-07) |

RE: LR (k) vs. LALR quinn-j@shaw.ca (Quinn Tyler Jackson) (2004-09-08) |

From: | Quinn Tyler Jackson <quinn-j@shaw.ca> |

Newsgroups: | comp.compilers |

Date: | 8 Sep 2004 12:04:01 -0400 |

Organization: | Compilers Central |

References: | 04-09-049 |

Keywords: | parse, LALR |

Posted-Date: | 08 Sep 2004 12:04:01 EDT |

Chris Clark said (in part):

*> It is worth mentioning that there are other ways of handling ambiguous*

*> grammars. In particular, one can use predicates to resolve*

*> ambiguities. Predicates allow one to take the difference of two*

*> productions in a controlled manner. In particular, it is possible to*

*> write a syntactic rules that says, try to parse this as a declaration*

*> and if it isn't parse it as an expression. The difference between the*

*> predicated and the GLR solution is that predicated grammars are still*

*> deterministic. There are no hidden ambiguities in a predicated*

*> grammar. If your predicated parser generator gives you an error, you*

*> still have an unresolved ambiguity and if it doesn't the resulting*

*> parser will always construct a parse tree (and not a forest)*

I agree with Chris about the use of predicates.

Interestingly, the use of predicates alone can some Type 1 power to a

grammar.

For instance:

L1 = {a^n b^n c+} // clearly a type 2 language

L2 = {a+ b^n c^n} // clearly a type 2 language

L1 intersect L2 = {a^n b^n c^n} // a type 1 language

Current research in the area of this class of grammars can be found here:

http://www.cs.queensu.ca/home/okhotin/

See the section on "Boolean grammars." Intersection can get quite a bit of

power out of a formalism.

My most recent paper deals with several difficult to parse languages of the

classical sort, including the particularly nasty to parse:

L = {a^m b^n c^mn}

The only grammar I've seen expressed for that one in classical form is in

Type 0 due to a length increasing production:

(1) <S> ::= <H><S> | <H><B>

(2) <B> ::= <B><B> | <C>

(3) <H><B> ::= <A><X><N><B>

(4) <N><B> ::= <B><N>

(5) <B><M> ::= <M><B>

(6) <N><C> ::= <M>c

(7) <N>c ::= <M>cc

(8) <X><M><B><B> ::= <B><X><N><B>

(9) <X><B><M>c ::= <B>c

(10) <H><A> ::= <A><H>

(11) <A> ::= a

(12) <B> ::= b

Because production (9) is length increasing, the grammar is in Type 0 form,

even though the language itself is Type 1. I'd like to see that grammar

normalized to a Type 1 -- but haven't been able to find one.

The longest derivation I've been able to do with that by hand is aabcc,

which is:

(11) aabcc --> <A>abcc

(11) <A>abcc --> <A><A>bcc

(12) <A><A>bcc --> <A><A><B>cc

(9) <A><A><B>cc --> <A><A><X><B><M>cc

(7) <A><A><X><B><M>cc --> <A><A><X><B><N>c

(4) <A><A><X><B><N>c --> <A><A><X><N><B>c

(3) <A><A><X><N><B>c --> <A><H><B>c

(9) <A><H><B>c --> <A><H><X><B><M>c

(6) <A><H><X><B><M>c --> <A><H><X><B><N><C>

(4) <A><H><X><B><N><C> --> <A><H><X><N><B><B>

(2) <A><H><X><N><B><B> --> <A><H><X><N><B>

(10) <A><H><X><N><B> --> <H><A><X><N><B>

(3) <H><A><X><N><B> --> <H><H><B>

(1) <H><H><B> --> <H><S>

(1) <H><S> --> <S>

I started on aaabbcccccc -- but got lost in the shuffle. :-(

If anyone wants to try a purely "predicate" approach to the above

language -- I'd love to see it. (Or if anyone would care to post the

derivation of aaabbcccccc, I'd really love to see that, too.)

The $-grammar I wrote for that one accepts strings in O(n^2.3), and has 6

productions and 2 of those are predicates, but also makes use of 2

phi-expressions, which implies a total of 4 predicates (since there is an

implied predicate with every phi-expression) and at least 2 name-indexed

tries.

Also of note is that I allow for a-expressions in predicates, which allows

for substrings that have been parsed to be concatenated to form entirely new

input that is then passed to the predicates. (The $-grammar for a^m b^n c^mn

uses this.) This is similar to what is known as "length-increasing".

Anyway, a few nights ago, I was asking myself about this formation in C++:

class Foo

{

int inline_function(int x)

{

return __y * x; // __y is used before being seen

}

int __y;

}; // this class is legal

class Bar

{

int inline_function(int x)

{

return __y * x; // __y is an undeclared variable

}

}; // this class is not legal because __y never gets declared

$-calculus was able to handle this ... without any code, accepting Foo and

rejecting Bar -- using all of the above mentioned techniques.

Although the resulting $-grammar uses more than 1 explicit predicate, it

does make use of phi-expressions, and these have implied predicates -- so I

was not able to do it without extensive overall use of predication.

Anyway -- I wrote it up and am looking for somewhere that is looking for a

3.5 page paper on such things. Any thoughts?

[BTW -- Chris -- direct email to you bounces from my account. Is that a spam

guard?]

--

Quinn Tyler Jackson

http://members.shaw.ca/qjackson/

"Never express yourself more clearly than you think."

-- Niels Bohr

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.