Re: Parsing C++

"Jan Gray" <jsgray@acm.org>
19 Nov 1996 00:47:38 -0500

          From comp.compilers


From: "Jan Gray" <jsgray@acm.org>
Newsgroups: comp.lang.c++.moderated,comp.compilers
Date: 19 Nov 1996 00:47:38 -0500
Organization: Netcom
References: 96-11-102 96-11-113
Keywords: parse, C++

manowar@sauropod.engin.umich.edu (Krisztian Flautner) writes:
> Could someone give me some examples of language features that cause
> problems ? Has anyone made an analysis of what kind of grammar could
> be used to parse C++ without problems ?


Graham Hughes <graham.hughes@resnet.ucsb.edu> wrote:
> Imagine the following code:
>
> typedef int Foo;
> int Foo;
>
> int main() {
> Foo * bar;
> }
>
> Now, in the main function: is that a multiplication, or a type
> declaration? It's actually a type declaration, but there's no way an
> LALR grammar is going to know that. Traditionally, you solve this
> problem by passing symbol table information to the lexer.


It is exactly this kind of problem (is this identifier an object or a
type?) that makes parsing C++ so tricky, even with a grammar in hand.
(Actually it is the mind-numbing complexity of it all, but the
question was about grammars.)


In simpler languages, the parser itself can often resolve identifiers
to particular declarations, but for C++ the lexical analyzer may need
to do this first, because the parse depends on the answer.
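
As a rough illustration of that feedback, here is a minimal sketch of
a symbol table that a lexer could consult for each identifier. The
names Tok, Decl, SymbolTable and classify are made up for this
example, and it only models nested block scopes, not class scopes.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical token kinds; a real lexer would have many more.
enum class Tok { Identifier, TypeName };

// What a name was declared as in a given scope.
enum class Decl { Object, Type };

// A stack of scopes, innermost last. The parser pushes and pops scopes
// and records declarations; the lexer only reads.
class SymbolTable {
public:
    SymbolTable() { scopes_.emplace_back(); }          // global scope
    void push() { scopes_.emplace_back(); }
    void pop() { scopes_.pop_back(); }                 // caller must not pop the global scope
    void declare(const std::string& name, Decl d) { scopes_.back()[name] = d; }

    // Called by the lexer for every identifier it scans.
    Tok classify(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end())
                return found->second == Decl::Type ? Tok::TypeName
                                                   : Tok::Identifier;
        }
        return Tok::Identifier;  // undeclared names are not types
    }

private:
    std::vector<std::unordered_map<std::string, Decl>> scopes_;
};
```

With this, "Foo * bar;" lexes as TypeName '*' Identifier when the
innermost visible Foo is a typedef, and as Identifier '*' Identifier
when an inner scope has redeclared Foo as an object.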


Consider:
    struct A { int T; typedef int U; };
    struct B : virtual A { typedef int T; int U; };
    struct C : B, virtual A { T(U); };


To parse "T(U);", we must determine whether T and/or U are type names,
so we must look up T and U in C's scope. In this case, because of
C++'s dominance rule, B::T dominates A::T and B::U dominates A::U, so
T is a type and U is not. Thus "T(U);" declares a member U of type
B::T (an int).
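
One way to check this reading is to let a compiler confirm it. The
sketch below assumes a C++11 or later compiler (for static_assert and
decltype); the helper store_in_u is invented for the example.

```cpp
#include <cassert>
#include <type_traits>

struct A { int T; typedef int U; };
struct B : virtual A { typedef int T; int U; };
struct C : B, virtual A { T(U); };

// In C's scope T names the type B::T, so "T(U);" declares the member
// C::U, an int.
static_assert(std::is_same<C::T, int>::value, "C::T is the typedef B::T");
static_assert(std::is_same<decltype(C::U), int>::value,
              "C::U is an int data member");

// Hypothetical helper: stores v through the member declared by "T(U);".
inline int store_in_u(int v) {
    C c;
    c.U = v;      // C::U hides B::U and A::U
    return c.U;
}
```

If T were not a type name here, "T(U);" could not be a member
declaration and the struct would fail to compile.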


To make these determinations, the lexer must be told, upon parsing
'{', that it is now in the scope of C, and it must then search C's
nascent scope. In general, this may involve searching bases (often
with multiple or virtual inheritance, hence dominance), any enclosing
classes and their bases, and so on outward through namespaces and
static locals (local classes can see static local declarations); it
may then have to perform disambiguation, access checking, etc.
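
The search just described might be sketched as follows. This is a
simplified model, not how any particular compiler does it: Scope, Kind,
collect, derives_from and lookup are names invented for the example,
every base is treated as if it were virtual, and access checking,
overloading and templates are ignored.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

enum class Kind { NotFound, Object, Type, Ambiguous };

struct Scope {
    std::map<std::string, Kind> members;  // names declared directly here
    std::vector<const Scope*> bases;      // base classes (all treated as virtual)
    const Scope* enclosing = nullptr;     // outer class, namespace, or block
};

// Collect every scope, among s and its bases, that declares name.
// A declaration in a class hides same-named declarations in its own bases.
static void collect(const Scope* s, const std::string& name,
                    std::set<const Scope*>& out) {
    if (s->members.count(name)) { out.insert(s); return; }
    for (const Scope* b : s->bases) collect(b, name, out);
}

// True if base is reachable from derived through one or more base links.
static bool derives_from(const Scope* derived, const Scope* base) {
    for (const Scope* b : derived->bases)
        if (b == base || derives_from(b, base)) return true;
    return false;
}

// Search s and its bases, then each enclosing scope in turn.
Kind lookup(const Scope* s, const std::string& name) {
    for (; s != nullptr; s = s->enclosing) {
        std::set<const Scope*> found;
        collect(s, name, found);
        // Dominance: drop any declaring class that is a base of another one.
        std::set<const Scope*> dominant;
        for (const Scope* c : found) {
            bool hidden = false;
            for (const Scope* d : found)
                if (d != c && derives_from(d, c)) hidden = true;
            if (!hidden) dominant.insert(c);
        }
        if (dominant.size() == 1) return (*dominant.begin())->members.at(name);
        if (dominant.size() > 1) return Kind::Ambiguous;
    }
    return Kind::NotFound;
}
```

Modelling A, B and C from the example above as three Scope objects,
lookup on C reports T as a Type and U as an Object, which matches the
dominance argument. Treating every base as virtual keeps the sketch
short; with non-virtual repeated bases, distinct subobjects would have
to be tracked.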


Not to mention our friends the fully qualified names (X::Y::T) and
template names (list<X::Y::T, less_than<X::Y>, 2*k+1>::T). :-)


It is not surprising that so many former C++ implementers find Java to
be a pleasant change.


Jan Gray
Redmond, WA
--

