Re: C++ Grammar - Update

Martin von Loewis <loewis@informatik.hu-berlin.de>
30 Apr 2001 22:21:02 -0400

          From comp.compilers

From: Martin von Loewis <loewis@informatik.hu-berlin.de>
Newsgroups: comp.compilers,comp.compilers.tools.pccts
Date: 30 Apr 2001 22:21:02 -0400
Organization: Humboldt University Berlin, Department of Computer Science
References: 01-04-141
Keywords: C++, parse
Posted-Date: 30 Apr 2001 22:21:02 EDT

"Mike Dimmick" <mike@dimmick.demon.co.uk> writes:


> The major reported problem with the C++ syntax is that it requires
> semantic information to parse correctly. This isn't strictly true,
> one can follow the technique of Ed Willink
> (http://www.computing.surrey.ac.uk/research/dsrg/fog/FogThesis.html)


Please note that the goal of that parser is restricted to parsing
declarations only (see 4.4, Ambiguity resolution).


It seems that the parser accepts a *very* large superset of C++,
e.g. the provided Solaris binary accepts


void foo(){
    +
}


without complaint. So I still doubt that you can do meaningful C++
parsing without semantic analysis in the lexer.


> One must know whether a construct names a type in order to correctly
> parse in some circumstances.


Indeed, this is the major reason why people claim that you need
semantic information in the lexer.
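
A minimal sketch of that dependence (the namespace and function names are
invented here purely for illustration): the tokens "T * p" begin a pointer
declaration when T names a type, and form a multiplication when T names a
variable.

```cpp
#include <cassert>

namespace as_type {
    struct T {};
    // T names a type here, so "T * p" starts a declaration of p.
    inline bool demo() {
        T * p = nullptr;
        return p == nullptr;
    }
}

namespace as_value {
    // T names a variable here, so "T * p" is a multiplication.
    inline int demo() {
        int T = 6, p = 7;
        int r = T * p;
        return r;
    }
}

// Illustrative self-check; the name is hypothetical.
inline void check_type_vs_value() {
    assert(as_type::demo());
    assert(as_value::demo() == 42);
}
```

A parser that does not track which identifiers are type names cannot
choose between the two readings from the token stream alone.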


> Qualified names are another circumstance which require unlimited
> semantic lookahead. This is due to template names with attached
> argument lists being permitted in a qualified name.


There are actually ambiguities in this area; consider


class X{
    friend A::B::C();
};


Is this ::C, returning A::B, or is it ::B::C, returning A? This is
currently an ambiguity in C++, which is not resolved in the '98
edition of the standard.
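
For illustration, here is a sketch of both readings spelled out
explicitly. The declarations of A, A::B, namespace B and the two
functions named C are assumptions made up for this example, and each
friend declaration is given an explicit return type. Parenthesizing the
declarator is one common way to force a particular split between the
return type and the qualified function name.

```cpp
#include <cassert>

struct A { struct B { int v; }; int v; };

A::B C();                  // ::C, returning A::B
namespace B { A C(); }     // ::B::C, returning A

class X {
    static int secret() { return 7; }
    friend A::B (::C)();   // reading 1: ::C, returning A::B
    friend A (::B::C)();   // reading 2: ::B::C, returning A
};

// Both functions may call the private member because both are friends.
A::B C() { return A::B{X::secret()}; }
namespace B { A C() { return A{X::secret() + 1}; } }

// Illustrative self-check; the name is hypothetical.
inline void check_friend_readings() {
    assert(::C().v == 7);
    assert(::B::C().v == 8);
}
```

Without the parentheses, the tokens A :: B :: C give the parser no
indication of where the type ends and the declarator begins.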


> It is necessary to resolve the exact instantiation of the template
> to determine whether the contents of the template themselves name a
> class (in which case a following "::" should continue the qualified
> name).


You mean, to see whether


    A<k>::B


is a typename or not? In C++, it is never a typename; to make it a
typename, you have to write


    typename A<k>::B


> I believe I have previously posted on at least one of these two
> newsgroups regarding the rule in the standard which requires this
> behaviour; it can be summarised as "the members of one instantiation
> of a template need bear no relation to any other instantiation of a
> template." This leaves us in the ridiculous situation of requiring
> full template instantiation and expression evaluation in order to
> produce an AST.


That is surely not the case. Whether something is a typename or not
can be determined without instantiation.
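
A small sketch of this point, assuming a hypothetical template A whose
primary definition makes B a type while the specialization for 0 makes B
a value. Inside another template, the parser decides how to read A<k>::B
from the presence or absence of "typename" alone, before any
instantiation exists.

```cpp
#include <cassert>

template <int k> struct A { typedef long B; };            // B is a type
template <> struct A<0> { static constexpr int B = 5; };  // B is a value

template <int k>
long as_type() {
    typename A<k>::B x = 3;   // "typename" present: parsed as a declaration
    return x;
}

template <int k>
int as_value() {
    int y = 2;
    return A<k>::B * y;       // no "typename": parsed as a multiplication
}

// Illustrative self-check; the name is hypothetical.
inline void check_typename() {
    assert(as_type<1>() == 3);    // A<1>::B really is a type
    assert(as_value<0>() == 10);  // A<0>::B really is a value
}
```

If an instantiation disagrees with the parse (for example as_type<0>),
the program is ill-formed; the template body is not reparsed.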


> C++ name resolution is complicated by the fact that the global
> namespace has no name; it is referred to by prefixing a name
> (qualified or not) with the scope resolution operator "::". This
> causes more ambiguities resolvable by left-factoring the grammar.


So out of curiosity: What does your parser do with my friend example
above?


> The "declaration specifiers" rule (decl-specifiers) has been modified
> to accommodate only one user-defined type or a sequence of built in
> types. This is slightly complicated by the fact that modifiers may be
> interspersed between the built-in types (e.g. "unsigned const long
> static int") but this removes the problem of whether a name in a
> declaration is the type or the declarator. This decision was taken
> because the C++ standard has now disallowed implicit 'int' - and
> therefore all declarations must be "type-name declarator-list;".


I think this is also an error in the FOG thesis: The only cases where
the decl-specifier-seq can be omitted are constructors, destructors,
and conversion functions; so I can't see why "i=0;" is ambiguous.
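
As a sketch (the Counter class and the function f are invented for this
example), these are the member declarations that carry no
decl-specifier-seq; conversion functions are the other case the standard
allows besides constructors and destructors. Everywhere else a
declaration needs a type, so a statement like "i = 0;" can only be an
expression.

```cpp
#include <cassert>

struct Counter {
    static int live;
    int i;
    Counter() : i(0) { ++live; }        // constructor: no decl-specifier-seq
    ~Counter() { --live; }              // destructor: none either
    operator int() const { return i; }  // conversion function: none either
};
int Counter::live = 0;

inline int f() {
    int i;
    i = 0;      // with implicit int gone, this is only an assignment
    return i;
}

// Illustrative self-check; the name is hypothetical.
inline void check_decl_specifiers() {
    assert(Counter::live == 0);
    {
        Counter c;
        assert(Counter::live == 1);
        assert(static_cast<int>(c) == 0);
    }
    assert(Counter::live == 0);
    assert(f() == 0);
}
```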


> I conclude that C++ requires some very strong parsing methods if one
> is to be successful.


In any case, a very interesting posting. I hope you can post your
grammar, together with this elaboration, somewhere in the 'net.


Regards,
Martin

