Re: parsing C and C++, Generating a simple hand-coded like

Chris F Clark <cfc@shell01.TheWorld.com>
22 Dec 2006 01:08:18 -0500

          From comp.compilers

Related articles
[4 earlier articles]
Re: Generating a simple hand-coded like recursive descent parser mr.waverlye@verizon.net (Mr.E) (2006-09-16)
Re: Generating a simple hand-coded like recursive descent parser tommy.thorn@gmail.com (Tommy Thorn) (2006-09-18)
Re: Generating a simple hand-coded like recursive descent parser DrDiettrich1@aol.com (Hans-Peter Diettrich) (2006-12-16)
Re: Generating a simple hand-coded like recursive descent parser bobduff@shell01.TheWorld.com (Robert A Duff) (2006-12-17)
Re: Generating a simple hand-coded like recursive descent parser cfc@shell01.TheWorld.com (Chris F Clark) (2006-12-19)
Re: parsing C and C++, Generating a simple hand-coded like gneuner2/@comcast.net (George Neuner) (2006-12-21)
Re: parsing C and C++, Generating a simple hand-coded like cfc@shell01.TheWorld.com (Chris F Clark) (2006-12-22)
Re: parsing C and C++, Generating a simple hand-coded like DrDiettrich1@aol.com (Hans-Peter Diettrich) (2006-12-22)
Re: parsing C and C++, Generating a simple hand-coded like derek@knosof.co.uk (Derek M. Jones) (2006-12-22)
Re: parsing C and C++, Generating a simple hand-coded like ik@unicals.com (Ivan A. Kosarev) (2006-12-22)
Re: parsing C and C++, Generating a simple hand-coded like derek@_NOSPAM_knosof.co.uk (Derek M. Jones) (2006-12-22)
Re: parsing C and C++, Generating a simple hand-coded like cfc@shell01.TheWorld.com (Chris F Clark) (2006-12-23)
Re: parsing C and C++, Generating a simple hand-coded like derek@knosof.co.uk (Derek M. Jones) (2006-12-24)
[1 later articles]
| List of all articles for this month |

From: Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups: comp.compilers
Date: 22 Dec 2006 01:08:18 -0500
Organization: The World Public Access UNIX, Brookline, MA
References: 06-09-029 06-09-042 06-09-048 06-09-060 06-09-078 06-09-093 06-12-064 06-12-066 06-12-076 06-12-078
Keywords: C, C++, parse

I wrote:
>>1) James Roskind did build a C grammar which attempts to eliminate the
>> need for feedback.
...
>> All that could be done in C++
>> is to push the ambiguities far-enough away that the amount of
>> lookahead required made them impractical to distinguish.


George Neuner <gneuner2/@comcast.net> wrote:


> Do you recall some of the problem cases?


I don't recall all the exact details.


I do recall the basic issue was with the C++ rule that if some text
could be parsed as an declaration, it was an declaration, and
otherwise it was a expression (or do I have that reversed?). This was
compounded by the "function prototype" syntax. So, if one saw
something like:


a( b (c), d (e), f (g) );


depending on whether b, d, and f were typedefs or not, this might be a
function prototype or it might be a function call. Notice, that, the
pattern is repeating and can be extended indefintely, and if f isn't a
typedef, then b(c) is a cast expression and not a parameter
declaration. Moreover, there are even more complicated examples,
e.g. parenthesis can be aribrarily deep, and I think some operators
like & and * can be used.


There are some interesting cases, like


a (b * c, d (e));


If "b" is a typedef, then "b * c" must be a declaration and not an
expression, because the left hand side of a binary * (i.e. "b") cannot
be a type, and if it is a unary * then "b * c" must be a cast and a
cast would require some parenthesis, either around "b" or around or
"* c". In that case, the whole a(...) clause must be a function
prototype (or a syntax error) and "d" must be a typedef.


> Similarly "x * y" can be parsed as simply
>
> OP:*
> IDENT:x
> IDENT:y
>
> and figured out afterward.


The problem in C derivatives, is you want the expressions to have a
different "shape" if it is a declaration and not an expression. In
the expression case, * is the root and the trees are divided below it.
In the declaration case, the * is a unary operator that binds tightly
to the 2nd tree, and there is no "root operator" that joins the
type-specification (left tree) and the declarator (right tree).
Getting a tree of the right shape, should be something that the parser
does, even if it leaves out a few details (i.e. are the expressions
parameters to a function or substripts of an array is a detail that
wouldn't change the shape of the list of expressions, just the details
of their meaning).


C (and most of its derivatives) are awful languages to parse because
of this ambiguous declaration syntax. The core ideas behind it are
simple, elegant, and marvelously brilliant. Unfortunately, as the
language grew and got extended, the combination didn't scale well.
However, until someone attempted the extension, I'm not sure the fact
that the syntax wouldn't scale would have been obvious. In fact, it
wasn't until we got experience writing C++ that the flaws became
noticeable at all, and even then it looked like a few special cases
could patch the problem.


Unfortunately, these problems are not just issues for compiler
writers, which if that were the only flaw, it would have been fine.
It can actually give people writing code in derivatives like C++
problems. The rules between constructors, function prototpyes,
function calls, and casts are not immediately obvious. Moreover, when
one forgets, one can get a very natural looking program, that doesn't
mean what one intended.


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.