parsing c++ without a symbol table!

27 Jul 1998 11:46:18 -0400

          From comp.compilers


Newsgroups: comp.compilers
Date: 27 Jul 1998 11:46:18 -0400
Organization: Bell Canada / Bell Sygma
Keywords: C++, parse


I am building a C++ parser that recognizes C++ using the syntax only.
I don't use any semantic information, i.e. there is no need for a symbol
table. Of course, this parser cannot be used as a compiler because the
language contains ambiguities. This parser is intended to gather metrics
from source code.

While doing this project, I found that most of C++ can be parsed
using the syntax alone, except for three cases:

1. ambiguity between function call and variable declaration

ex: T(a); or T(*a); etc.

This is a variable declaration if T is a type, or a function call
if T is a function.
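
To make case 1 concrete, here is a minimal sketch in which the same statement `T(a);` compiles under both readings. The namespaces `as_type` and `as_function` and the `demo` functions are names invented for this illustration; they are not part of the parser described in this post.

```cpp
#include <cassert>

namespace as_type {
    struct T { int v = 7; };

    int demo() {
        T(a);               // T names a type: declares a variable `a` of type T
        assert(a.v == 7);   // `a` exists and was default-initialized
        return a.v;
    }
}

namespace as_function {
    int calls = 0;
    void T(int) { ++calls; }

    int demo() {
        int a = 1;
        T(a);               // T names a function: this is a call with argument `a`
        assert(calls == 1);
        return calls;
    }
}
```

The token sequence is identical in both `demo` functions; only the earlier declaration of `T` decides which reading applies, which is exactly the information a syntax-only parser does not have.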

2. ambiguity between function declaration and variable declaration

ex: int X(A);

If A is a type, X is a function declaration.
If A is a variable, X is a variable initialized with A.
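
A small sketch of case 2, with `int X(A);` appearing verbatim under both meanings. The namespace names and the `demo` helpers are hypothetical, chosen only so that both readings can live in one file.

```cpp
#include <cassert>

namespace a_is_type {
    struct A {};
    int X(A);                    // A is a type: declares a function X taking an A
    int X(A) { return 42; }      // the matching definition

    int demo() {
        int r = X(A{});          // X is callable
        assert(r == 42);
        return r;
    }
}

namespace a_is_variable {
    int A = 5;
    int X(A);                    // A is a variable: defines an int X initialized with A

    int demo() {
        assert(X == 5);          // X is an object, not a function
        return X;
    }
}
```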

3. ambiguous parameter

ex: int F(T(C));

if C is a type, the declaration becomes int F(T(*fp)(C c));
if C is a new identifier, it becomes int F(T C);
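
The two readings of case 3 can be checked with the compiler itself. In this sketch, `T` is assumed to be an alias for `int`, and the namespaces, `g`, and `demo` are illustrative names only. The `static_assert` lines confirm which function type the declaration `int F(T(C));` actually produced.

```cpp
#include <cassert>
#include <type_traits>

namespace c_is_type {
    struct C {};
    using T = int;

    int F(T(C));   // C is a type: the parameter is a pointer to function taking C, returning T
    static_assert(std::is_same<decltype(F), int(T (*)(C))>::value,
                  "F takes a function pointer");

    T g(C) { return 3; }
    int F(T (*fp)(C)) { return fp(C{}); }

    int demo() {
        int r = F(g);
        assert(r == 3);
        return r;
    }
}

namespace c_is_name {
    using T = int;

    int F(T(C));   // C is a new identifier: a parameter named C of type T
    static_assert(std::is_same<decltype(F), int(T)>::value,
                  "F takes a plain T");

    int F(T C) { return C + 1; }

    int demo() {
        int r = F(4);
        assert(r == 5);
        return r;
    }
}
```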

I was wondering if there are other such cases where a sentence needs
semantic information to be made unambiguous. Any case that can be
recognized by the syntax alone does not qualify. I assume that I have
infinite lookahead (backtracking).

For example, a C-style type cast is usually recognized by checking whether
the identifier between parentheses is a type or not. It is possible to
find a type cast by the syntax alone.

var = (Type1)(Type2)...(TypeN)(expression);

An expression between () is a type cast if and only if it is followed by
another type cast or an expression. This requires a lot of
backtracking (inefficient), but it illustrates the point that the sentence
can be recognized without using semantic information.
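
The rule above can be sketched as a small recognizer over a pre-tokenized input. This is a simplified illustration, not the parser described in this post: it assumes every type name is a single identifier, it uses plain lookahead instead of real backtracking, and the names `Tokens`, `startsOperand`, `isCastAt` and `countLeadingCasts` are made up for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;

// An identifier token starts with a letter or underscore.
static bool isIdentifier(const std::string& t) {
    return !t.empty() &&
           (std::isalpha(static_cast<unsigned char>(t[0])) || t[0] == '_');
}

// A number token starts with a digit.
static bool isNumber(const std::string& t) {
    return !t.empty() && std::isdigit(static_cast<unsigned char>(t[0]));
}

// True if another cast or an expression could begin at position i.
static bool startsOperand(const Tokens& toks, std::size_t i) {
    if (i >= toks.size()) return false;
    return isIdentifier(toks[i]) || isNumber(toks[i]) || toks[i] == "(";
}

// `( identifier )` at position i counts as a cast only if it is
// followed by another cast or by an expression.
static bool isCastAt(const Tokens& toks, std::size_t i) {
    if (i + 2 >= toks.size()) return false;
    if (toks[i] != "(" || !isIdentifier(toks[i + 1]) || toks[i + 2] != ")")
        return false;
    return startsOperand(toks, i + 3);
}

// Counts the chain of casts at the front of the token list.
std::size_t countLeadingCasts(const Tokens& toks) {
    std::size_t n = 0;
    std::size_t i = 0;
    while (isCastAt(toks, i)) {
        ++n;
        i += 3;   // skip "(", identifier, ")"
    }
    return n;
}
```

For the tokens of `(A)(B)(x);` the sketch reports two casts, and `(x)` is left as the operand because it is followed by `;`. Since it applies the rule literally, it would also count `(f)(x)` as a cast.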

So I was wondering if there are other families of sentences, besides the
ones listed, that require semantic information to be made unambiguous?

Greg Knapen
