RE: Writing a compiler compiler

Quinn Tyler Jackson <qtj-query@shaw.ca>
17 Apr 2006 23:42:57 -0400

          From comp.compilers

Related articles
Writing a compiler compiler vladimir.d.lushnikov@gmail.com (Vladimir Lushnikov) (2006-04-17)
RE: Writing a compiler compiler qtj-query@shaw.ca (Quinn Tyler Jackson) (2006-04-17)
| List of all articles for this month |
From: Quinn Tyler Jackson <qtj-query@shaw.ca>
Newsgroups: comp.compilers
Date: 17 Apr 2006 23:42:57 -0400
Organization: Compilers Central
References: 06-04-105
Keywords: parse, C++
Posted-Date: 17 Apr 2006 23:42:57 EDT

Vladamir Lushnikov said:


> My objective is thus - to create a parser generator that will
> eventually generate a parser for a dynamic language. However, my
> undestanding of different parser types and distinctions between is
> minimal; but as the web is a good enough resourse please assume my
> knowledge of those areas.


> The target for this parser generator would be a language similar
> to C#,...


and our moderator noted:


> [These topics are all covered in compiler textbooks, many of which are
> listed in the FAQ. Assuming by "dyamic language" you mean that the
> syntax can change on the fly, you should learn about Early and Tomita
> parsers, and pay particular attention to the reasons that even though
> they're technically perfectly sound, few people use them. As to why
> an LR parser can't parse Python or C++, LR parsers only handle a
> subset of BNF. In particular they can't handle the ambiguity of C++
> declarations.] -John


Chapter 9 (section 9.3.3) of Adapting to Babel (see sig line below for link)
covers the parsing of Perl and C++ using adaptive grammars, and deals with
C++ declarations, including the one case in C++ where declarations can
follow the first use of a call (that is, member functions called by member
functions in inline member functions in a class declaration). The C++
grammar used in the experiments has some 292+ productions, but it handles
quite a lot of C++'s constructions, and does so without any disambiguating
code. In a sense, a language like C++ might be said to be "mildly
dynamic" -- although this might be debated. (For instance, i++ is
syntactically legal, but semantically so only if the type of i has the ++
operator defined.)


Anyone interested in dynamic languages might, therefore, consider ATB as a
newly available look at such parsers.


Part of what makes writing grammars for such languages more manageable is
the fact that they can be developed in an IDE, much like programming
languages, and "debugged" in minute detail (and profiled) against test
cases. To date, the engine in its current incarnation has shown no
real-world language parses in O(n^3) or greater time, and in many cases, on
real-world languages, expensive productions are mitigated by their
locality -- that is, the most expensive productions tend to fire only
rarely, and thus not slow down the whole. (Of course, in theory-space, such
productions would predominate the big-O, but in practice-space, there is no
such thing as an infinitely long input.)


Perl in particular has some constructions that make adaptive grammars very
useful. The C# grammar put together in this formalism didn't require a
single adaptive production (and Vladamir mentions his goal is a language
similar to C#), so it is likely that any adaptive productions would be
minimal, and thus mitigated.


Cheers.


--
Quinn Tyler Jackson FRSA MBCS
http://members.shaw.ca/qtj/


Author of Adapting to Babel:
http://members.shaw.ca/qtj/writing/cs/AdaptingToBabel.html


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.