Re: Infinite look ahead required by C++?

"ng2010" <ng2010@att.net>
Tue, 23 Feb 2010 17:28:45 -0600

          From comp.compilers

From: "ng2010" <ng2010@att.net>
Newsgroups: comp.compilers
Date: Tue, 23 Feb 2010 17:28:45 -0600
Organization: A noiseless patient Spider
References: 10-02-024 10-02-039
Keywords: C++, parse
Posted-Date: 25 Feb 2010 21:12:19 EST

"Stephen Horne" <sh006d3592@blueyonder.co.uk> wrote in message
> On Fri, 5 Feb 2010 22:27:54 -0600, "ng2010" <ng2010@att.net>
> wrote:
>
>>What elements of C++ make it so hard to parse? Is it a weakness of
>>compiler designs rather than a weakness of the language design? I've
>>read somewhere that the language requires potentially infinite look
>>ahead. Why? And how do compilers handle it?
>>[It's ambiguous syntax. Others can doubtless fill in the
>>details. -John]
>
> Most C and C++ compilers seem to use a hand-crafted mix of recursive
> descent and precedence parsing. The main reason for that is because
> that's what the language designers used from the start, and therefore
> the language is most easily parsed using that approach.
>
> Consider a typical C++ variable declaration...
>
> int x;
>
> No problem there. But now, let's assume that we want a variable of
> some struct type.
>
> mystruct x;
>
> The fact that "mystruct" identifies a type is significant - it is how
> this is recognised as a variable declaration. But how does the parser
> *know* that "mystruct" is a type at all?


Well, because either (1) a header file declaring it was included before
that variable declaration, or (2) a module was imported with an import
statement before it. If neither 1 nor 2 holds, the lexer or parser has to
flag it as an error. I'm thinking about implementing the following syntax
to help the compiler:


// type declaration
type struct mystruct
{
        int32 x
}


// var declaration
var mystruct x
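With a leading keyword, the parser knows from the first token which kind of declaration follows, so it never has to guess whether an identifier names a type. Below is a minimal sketch of a hand-written recursive-descent parser for the proposed syntax. It assumes a grammar inferred from the two examples above (`type struct NAME { TYPE NAME ... }` and `var TYPE NAME`), treats `int32` as the only built-in type, and uses hypothetical names (`tokenize`, `DeclParser`, `requireType`):

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Split source into words and single-character '{' / '}' tokens.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> toks;
    std::string cur;
    for (char c : src) {
        if (std::isspace(static_cast<unsigned char>(c)) || c == '{' || c == '}') {
            if (!cur.empty()) { toks.push_back(cur); cur.clear(); }
            if (c == '{' || c == '}') toks.push_back(std::string(1, c));
        } else {
            cur += c;
        }
    }
    if (!cur.empty()) toks.push_back(cur);
    return toks;
}

// Hypothetical parser for the proposed "type struct" / "var" syntax.
class DeclParser {
public:
    explicit DeclParser(const std::string& src) : toks_(tokenize(src)) {}

    // Parses the whole input; throws std::runtime_error on the first error.
    void parse() {
        while (pos_ < toks_.size()) {
            const std::string& kw = next();
            if (kw == "type") parseTypeDecl();
            else if (kw == "var") parseVarDecl();
            else throw std::runtime_error("expected 'type' or 'var', got '" + kw + "'");
        }
    }

    std::set<std::string> types{"int32"};  // built-in plus declared type names
    std::vector<std::string> vars;         // declared variable names

private:
    std::vector<std::string> toks_;
    std::size_t pos_ = 0;

    const std::string& next() {
        if (pos_ >= toks_.size()) throw std::runtime_error("unexpected end of input");
        return toks_[pos_++];
    }
    void expect(const std::string& t) {
        if (next() != t) throw std::runtime_error("expected '" + t + "'");
    }
    // The keyword already told us a type name comes here; it must be known.
    void requireType(const std::string& name) {
        if (!types.count(name)) throw std::runtime_error("unknown type '" + name + "'");
    }
    void parseTypeDecl() {  // type struct NAME { (TYPE NAME)* }
        expect("struct");
        std::string name = next();
        expect("{");
        while (pos_ < toks_.size() && toks_[pos_] != "}") {
            requireType(next());  // field type
            next();               // field name
        }
        expect("}");
        types.insert(name);  // visible to every later declaration
    }
    void parseVarDecl() {  // var TYPE NAME
        requireType(next());
        vars.push_back(next());
    }
};
```

Because `type` and `var` announce the construct, an undeclared type name is reported as an error instead of being misparsed, which matches the error-on-unknown-type behaviour proposed here.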


> The answer is that semantic analysis of earlier parts of the source
> code was done concurrently with the parsing.


Maybe for a one-pass compiler? I'm thinking that multiple passes and
separate compilation phases are the way to go.
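For contrast, the one-pass approach Horne describes (often called the "lexer hack" when the lexer does the lookup) can be sketched as a parser that records typedef names as it goes and consults them when classifying later statements. The statement shape `a * b ;`, the `OnePassParser` name, and its string results are hypothetical and cover only this one case:

```cpp
#include <set>
#include <string>
#include <vector>

// One-pass sketch: the parser fills in a table of type names while parsing
// and immediately uses it to decide how later statements parse.
struct OnePassParser {
    std::set<std::string> typeNames{"int", "char"};

    // tokens, e.g. {"typedef", "int", "mystruct", ";"} or {"a", "*", "b", ";"}
    std::string parseStatement(const std::vector<std::string>& t) {
        if (t.size() == 4 && t[0] == "typedef" && t[3] == ";") {
            typeNames.insert(t[2]);  // feed the result back right away
            return "typedef " + t[2];
        }
        if (t.size() == 4 && t[1] == "*" && t[3] == ";") {
            // Same tokens, different parse, depending on earlier declarations.
            if (typeNames.count(t[0]))
                return "declare " + t[2] + " as pointer to " + t[0];
            return "multiply " + t[0] + " by " + t[2];
        }
        return "unsupported";
    }
};
```

The order of statements matters here: `a * b ;` is a multiplication until `typedef int a ;` has been seen, and a pointer declaration afterwards.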


> The grammar is strictly ambiguous. "mystruct" is, in normal parsing
> terms, just an identifier.


I don't think the compiler can say anything definite about it unless 1
or 2 above has told it what mystruct is.


> In short, to separate out parsing from semantic analysis, you would
> have to accept incredibly ambiguous parser output and filter down the
> options by semantic analysis.


I have a feeling I'm going to learn that the hard way.
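The alternative Horne mentions, accepting ambiguous parser output and filtering it later, can be sketched for the classic statement `a * b ;`: pass 1 does no name lookup and returns every grammatical reading, and pass 2 keeps only the readings consistent with the symbol table once all declarations are known. `Candidate`, `parseAmbiguous`, and `filterBySymbols` are hypothetical names for illustration:

```cpp
#include <set>
#include <string>
#include <vector>

enum class Reading { PointerDecl, Multiply };

struct Candidate {
    Reading kind;
    std::string lhs, rhs;  // A and B from "A * B ;"
};

// Pass 1: pure syntax. Both readings are grammatical, so return both.
std::vector<Candidate> parseAmbiguous(const std::vector<std::string>& t) {
    if (t.size() == 4 && t[1] == "*" && t[3] == ";")
        return {{Reading::PointerDecl, t[0], t[2]}, {Reading::Multiply, t[0], t[2]}};
    return {};
}

// Pass 2: semantic filtering. A pointer declaration needs A to be a type;
// a multiplication needs A not to be one.
std::vector<Candidate> filterBySymbols(const std::vector<Candidate>& in,
                                       const std::set<std::string>& typeNames) {
    std::vector<Candidate> out;
    for (const Candidate& c : in) {
        bool lhsIsType = typeNames.count(c.lhs) > 0;
        if ((c.kind == Reading::PointerDecl) == lhsIsType) out.push_back(c);
    }
    return out;
}
```

With one statement the cost looks trivial, but in real C++ the candidates multiply across nested constructs, which is why this route is painful.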


>
> C used to require that you write something like...
>
> struct mystruct x;
>
> This neatly resolves the ambiguity issue, but also implies an
> additional namespace, which is exactly what C used.


I don't think the struct keyword is necessary on variable declarations,
but I'm considering the following anyway:


var mystruct x  // the compiler had better know what a mystruct is at this point,
                // or it's an error of omission by the programmer
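On the quoted point that C's `struct` keyword implies an additional namespace: struct tags and ordinary identifiers are looked up separately, so one name can denote both a type and a variable. A small illustration, written as C++ (which keeps the rule for C compatibility); the names `mystruct` and `makeOne` are made up:

```cpp
// The struct tag "mystruct" lives apart from ordinary identifiers.
struct mystruct { int x; };

// An ordinary variable with the same name; it hides the bare type name.
int mystruct = 5;

// The elaborated form "struct mystruct" still finds the type.
struct mystruct makeOne(int v) {
    struct mystruct s;
    s.x = v;
    return s;
}
```

After `int mystruct`, a bare `mystruct` means the variable, and only `struct mystruct` names the type, so the keyword tells the parser which namespace to search.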


>
> There are parser generators now that can handle C and C++. One example
> is Kelbt.
>
> http://www.complang.org/kelbt/
>
> A partial C++ grammar is available as a proof of concept.
>
> I've been using Kelbt for a while now - it is perfectly usable, though
> with some build-time error-handling issues and lacking some obvious
> basic features (precedence, associativity, syntax error recovery). The
> resulting parsers are fine, but I've never needed any of the advanced
> features - the reason being that the DSLs I've used it for have syntax
> that is LR(1) by design.
>


I'm trying to avoid lexer/parser generators. I want to write it by hand
and bootstrap from C/C++ until my language can compile itself.
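Since the plan is a hand-written parser, and Horne's point quoted above is that C and C++ compilers typically mix recursive descent with precedence parsing, here is a minimal sketch of one common precedence technique, precedence climbing, for integer expressions. The grammar (`+ - * /` and parentheses), the `ExprParser` name, and evaluating while parsing instead of building a syntax tree are all assumptions made to keep it short:

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

class ExprParser {
public:
    explicit ExprParser(std::string src) : s_(std::move(src)) {}

    long parse() {
        long v = parseExpr(0);
        skipSpace();
        if (i_ != s_.size()) throw std::runtime_error("trailing input");
        return v;
    }

private:
    std::string s_;
    std::size_t i_ = 0;

    void skipSpace() {
        while (i_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[i_]))) ++i_;
    }
    // Binding power of a binary operator; -1 if the char is not one.
    static int precedence(char op) {
        switch (op) {
            case '+': case '-': return 1;
            case '*': case '/': return 2;
            default: return -1;
        }
    }
    // Recursive descent for primaries: numbers and parenthesised expressions.
    long parsePrimary() {
        skipSpace();
        if (i_ < s_.size() && s_[i_] == '(') {
            ++i_;
            long v = parseExpr(0);
            skipSpace();
            if (i_ >= s_.size() || s_[i_] != ')') throw std::runtime_error("expected ')'");
            ++i_;
            return v;
        }
        if (i_ >= s_.size() || !std::isdigit(static_cast<unsigned char>(s_[i_])))
            throw std::runtime_error("expected number");
        long v = 0;
        while (i_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[i_])))
            v = v * 10 + (s_[i_++] - '0');
        return v;
    }
    // Precedence climbing: consume operators binding at least as tightly
    // as minPrec; the right operand is parsed at prec + 1, which makes
    // operators of equal precedence associate to the left.
    long parseExpr(int minPrec) {
        long lhs = parsePrimary();
        for (;;) {
            skipSpace();
            if (i_ >= s_.size()) break;
            char op = s_[i_];
            int prec = precedence(op);
            if (prec < 0 || prec < minPrec) break;  // not an operator, or too loose
            ++i_;
            long rhs = parseExpr(prec + 1);
            switch (op) {
                case '+': lhs += rhs; break;
                case '-': lhs -= rhs; break;
                case '*': lhs *= rhs; break;
                case '/':
                    if (rhs == 0) throw std::runtime_error("division by zero");
                    lhs /= rhs;
                    break;
            }
        }
        return lhs;
    }
};
```

Declarations and statements would be handled by ordinary recursive-descent functions that call `parseExpr` wherever an expression is expected; a real compiler would return syntax-tree nodes rather than values.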


