RE: C++ intermediate representation.

Quinn Tyler Jackson <quinn-j@shaw.ca>
15 May 2005 17:28:35 -0400

          From comp.compilers

Newsgroups: comp.compilers
Organization: Compilers Central
References: 05-05-114
Keywords: C++, design

> Stroustrup means using YACC or Bison, I think -- programs that
> generate LALR(1) parsers and consequently require their input
> grammar to be LALR(1). It is known that the C++ grammar is not
> LALR(k) for any k, so Stroustrup spent much time trying to write an
> LALR grammar for C++. These days there are GLR parsers, the next
> generation beyond LALR parsers, which make writing a C++ front end
> fast and simple.


If all the world uses a hammer, what about when parsing encounters
screws?


If we settle for context-free engines, how are we to parse
context-sensitive constructions with proven tools?


Take pseudoknots for example:


http://www.cs.brandeis.edu/~cs178/SearlsNature2002.pdf
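As a concrete illustration (my own toy sketch, not anything from Searls's paper), consider the language a^m b^n c^m d^n: its a-c and b-d dependencies cross the way a pseudoknot's base pairings do, so no context-free grammar generates it, yet a procedural recognizer is trivial:

```python
import re

def is_toy_pseudoknot(s: str) -> bool:
    """Recognize a^m b^n c^m d^n (m, n >= 1).

    The a-c and b-d dependencies cross, like the paired strands of a
    pseudoknot, so this language is not context-free -- but counting
    the four runs directly makes the check linear and trivial.
    """
    m = re.fullmatch(r"(a+)(b+)(c+)(d+)", s)
    if m is None:
        return False
    a, b, c, d = (len(g) for g in m.groups())
    return a == c and b == d
```

The point of the sketch is the asymmetry: the procedural check is a few lines, while the grammatical description falls outside the context-free (and hence LALR/GLR) world entirely.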


We can hack context-sensitivity into context-free technologies, but we
can't prove with any degree of ease (in the formal sense) that those
hacks are correct.


One such augmentation is predicates. We know that the intersection of
two context-free languages can be a context-sensitive language, as in
the case of a+ b^n c^n INTERSECT a^n b^n c+ = a^n b^n c^n, but what do
we formally know about such intersected languages as a family?
Okhotin's work on conjunctive and Boolean grammars tells us quite a
bit about them, but as of his 2004 doctoral dissertation (q.v.
[Okhotin 2004c]), whether or not L = {ww | w ∈ {a,b}*} is a
conjunctive language was an open issue (he conjectures that it isn't
-- but doesn't prove it isn't).
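The intersection above can be made concrete with a small sketch (mine, purely illustrative): each conjunct alone is context-free, while their conjunction is the context-sensitive a^n b^n c^n:

```python
import re

def _run_lengths(s):
    """Split a string of the shape a+ b+ c+ into its three run lengths."""
    m = re.fullmatch(r"(a+)(b+)(c+)", s)
    return None if m is None else tuple(len(g) for g in m.groups())

def in_L1(s):
    """L1 = a^+ b^n c^n -- context-free on its own."""
    p = _run_lengths(s)
    return p is not None and p[1] == p[2]

def in_L2(s):
    """L2 = a^n b^n c^+ -- also context-free on its own."""
    p = _run_lengths(s)
    return p is not None and p[0] == p[1]

def in_conjunction(s):
    """L1 INTERSECT L2 = a^n b^n c^n -- context-sensitive."""
    return in_L1(s) and in_L2(s)
```

This is exactly the move a semantic predicate makes inside a context-free engine: run two context-free checks and accept only their conjunction -- which is why the formal properties of the resulting family matter.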


Okhotin's dissertation also left open the question: do there exist
context-sensitive languages for which there is no conjunctive or
Boolean grammar?


If a language as simple as L = {ww | w ∈ {a,b}*} is not known to be
produced by the intersection of two context-free languages, and given
that second (fairly big) open question, then the whole area becomes
very murky. The moment we extend a formalism with something like
intersection, we run up against those open questions. Parsing
languages like C++ and Perl pushes us toward extensions of our
formalisms (if we don't want to do it with ad hockery "in the code")
that have been neglected in terms of formal study.
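To see how tempting the "in the code" route is, here is a purely illustrative sketch of my own: procedural membership testing for L = {ww | w ∈ {a,b}*} is nearly a one-liner, even though the language has no context-free grammar and its conjunctive status was, per the above, open:

```python
def is_ww(s: str) -> bool:
    """Membership in L = {ww | w in {a,b}*}.

    Trivial to check procedurally -- split the string in half and
    compare -- yet beyond context-free grammars, and (per Okhotin's
    conjecture) perhaps beyond conjunctive grammars as well.
    """
    half, rem = divmod(len(s), 2)
    return rem == 0 and set(s) <= {"a", "b"} and s[:half] == s[half:]
```

The hack works, but nothing in it composes with a grammar formalism, which is precisely why such escapes resist formal proof of correctness.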


C++ is sometimes faulted for things like its constructor/declaration
ambiguity, yet until not long ago even "simple" pseudoknots (by far
simpler than a programming language!) were nearly intractable to parse
in linear time, and they are certainly more complex than LALR(k)
solutions allow for, or even such solutions with simple extensions
like intersection through predication.


Parsing is not a solved problem. For some newer languages, like Lua,
which fall to an unambiguous grammar of some 98 productions, perhaps
it is; but not all programming languages were invented with the
benefit of 20/20 hindsight, and there were real reasons for
maintaining compatibility with languages that didn't have proven
grammars. Nor do all languages that came after those tools adhere to
the basic principles many hold dear (Perl seems to be one of those
languages). And even when those language issues are tackled, sure
enough something will come along that proves intractable with the
tools used to conquer the earlier problems.


Tomorrow's better parsing technologies may not encourage tomorrow's
programming languages to be more parsable -- they may have the
opposite effect. We may end up with programming languages with *more*
ambiguity and more difficult constructions than today's most complex
languages. As the tools increase in power, and that power is proven
safe, the increasingly sophisticated uses they are put to may result
in programming languages not unlike the confusing English found in
this paragraph. Twenty or thirty years down the road, we may look back
on C++ and Perl and say, "Oh, for the days when programming languages
were as easy to write grammars for as C++ and Perl."


Pure speculation, of course. But I hope it's the case, because in my
opinion parsing theory and practice has not yet even scratched the
surface of what is (or will be) technically feasible in the field, let
alone reached the point where parsing can be declared a solved
problem.


--
Chev. Quinn Tyler Jackson
http://members.shaw.ca/qjackson/


[Okhotin 2004c] Alexander Okhotin, Boolean Grammars: Expressive Power and
Algorithms, Ph.D. dissertation, School of Computing, Queen's University,
Kingston, Ontario, Aug. 2004.

