From: "ng2010" <ng2010@att.net>
Newsgroups: comp.compilers
Date: Tue, 23 Feb 2010 17:28:45 -0600
Organization: A noiseless patient Spider
References: 10-02-024 10-02-039
Keywords: C++, parse
Posted-Date: 25 Feb 2010 21:12:19 EST
"Stephen Horne" <sh006d3592@blueyonder.co.uk> wrote in message
> On Fri, 5 Feb 2010 22:27:54 -0600, "ng2010" <ng2010@att.net>
> wrote:
>
>>What elements of C++ make it so hard to parse? Is it a weakness of
>>compiler designs rather than a weakness of the language design? I've
>>read somewhere that the language requires potentially infinite look
>>ahead. Why? And how do compilers handle it?
>>[It's ambiguous syntax. Others can doubtless fill in the
>>details. -John]
>
> Most C and C++ compilers seem to use a hand-crafted mix of recursive
> descent and precedence parsing. The main reason is that that's what
> the language designers used from the start, and therefore the
> language is most easily parsed using that approach.
>
> Consider a typical C++ variable declaration...
>
> int x;
>
> No problem there. But now, let's assume that we want a variable of
> some struct type.
>
> mystruct x;
>
> The fact that "mystruct" identifies a type is significant - it is how
> this is recognised as a variable declaration. But how does the parser
> *know* that "mystruct" is a type at all?
Well, because either (1) a header file declaring it was included before
that var declaration, or (2) a module was imported with an import
statement before the var declaration. If neither 1 nor 2 holds, the
lexer or parser would have to flag it as an error. I'm thinking about
implementing the following syntax to help the compiler:
// type declaration
type struct mystruct
{
int32 x
}
// var declaration
var mystruct x
> The answer is that semantic analysis of earlier parts of the source
> code was done concurrently with the parsing.
Maybe for a one-pass compiler? I'm thinking that multiple passes and
separate compilation phases are the way to go.
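The approach described in the quoted text, where declarations seen earlier are recorded while parsing and then consulted to classify identifiers (often called the "lexer hack" in C compilers), can be sketched as follows. The statement `a * b;` is the classic case: it declares a pointer `b` if `a` names a type, and multiplies two values otherwise. This is a minimal sketch under stated assumptions, not any real compiler's code: tokens must be space separated, only `typedef` introduces type names, and `tokenize`/`parseProgram` are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Split source text on whitespace (toy lexer: tokens must be space separated).
std::vector<std::string> tokenize(const std::string& src) {
    std::istringstream in(src);
    std::vector<std::string> toks;
    std::string t;
    while (in >> t) toks.push_back(t);
    return toks;
}

// Parse a sequence of ";"-terminated statements and label each one.
// A "typedef" statement performs a semantic action during parsing: it adds
// its last token to the set of type names, which changes how later
// statements starting with that identifier are parsed.
// Tokens after the final ";" are ignored.
std::vector<std::string> parseProgram(const std::string& src) {
    std::set<std::string> typeNames = {"int"};
    std::vector<std::string> labels;
    std::vector<std::string> stmt;
    for (const auto& tok : tokenize(src)) {
        if (tok != ";") {
            stmt.push_back(tok);
            continue;
        }
        if (stmt.empty()) {
            labels.push_back("empty");
        } else if (stmt[0] == "typedef" && stmt.size() >= 3) {
            typeNames.insert(stmt.back());  // record the new type name
            labels.push_back("typedef");
        } else if (typeNames.count(stmt[0]) != 0) {
            labels.push_back("declaration");  // e.g. "a * b ;" with a a type
        } else {
            labels.push_back("expression");   // e.g. "a * b ;" with a a value
        }
        stmt.clear();
    }
    return labels;
}
```

For example, `parseProgram("a * b ; typedef int a ; a * b ;")` labels the first `a * b ;` as an expression and the second, identical token sequence as a declaration, because the `typedef` in between changed the symbol table.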
> The grammar is strictly ambiguous. "mystruct" is, in normal parsing
> terms, just an identifier.
I don't think the compiler can say anything about it for sure if
neither 1 nor 2 above applied to tell it what mystruct is.
> In short, to separate out parsing from semantic analysis, you would
> have to accept incredibly ambiguous parser output and filter down the
> options by semantic analysis.
I have a feeling I'm going to learn that the hard way.
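The alternative described in the quote, separating parsing from semantic analysis, means the parser records every reading the grammar allows and a later pass discards the ones that do not fit. A minimal sketch of that idea for `X * Y ;` follows; `Reading`, `StarStmt`, `parseStarStmt` and `resolve` are hypothetical names, and real C++ front ends that take this route deal with far larger ambiguity sets than two readings of one statement.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical AST for a statement of the form "X * Y ;". The syntactic
// pass does not consult any symbol table, so it keeps every reading the
// grammar allows and defers the choice to a semantic pass.
enum class Reading { Declaration, Expression };

struct StarStmt {
    std::string lhs;                  // X
    std::string rhs;                  // Y
    std::vector<Reading> candidates;  // readings still considered possible
};

// Pass 1 (syntax only): both readings survive.
StarStmt parseStarStmt(const std::string& lhs, const std::string& rhs) {
    StarStmt s;
    s.lhs = lhs;
    s.rhs = rhs;
    s.candidates = {Reading::Declaration, Reading::Expression};
    return s;
}

// Pass 2 (semantics): once all declarations are known, keep only the
// readings consistent with them. Afterwards one candidate means the
// statement is resolved, none means an error, and two means it is still
// ambiguous.
void resolve(StarStmt& s, const std::set<std::string>& typeNames,
             const std::set<std::string>& varNames) {
    std::vector<Reading> kept;
    for (Reading r : s.candidates) {
        bool ok = (r == Reading::Declaration)
                      ? typeNames.count(s.lhs) != 0
                      : varNames.count(s.lhs) != 0 && varNames.count(s.rhs) != 0;
        if (ok) kept.push_back(r);
    }
    s.candidates = kept;
}
```

The design cost is visible even at this size: the parser's output must be able to represent unresolved alternatives, and every later pass has to cope with them until `resolve` runs.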
>
> C used to require that you write something like...
>
> struct mystruct x;
>
> This neatly resolves the ambiguity issue, but also implies an
> additional namespace, which is exactly what C used.
I don't think it is necessary to have the struct keyword on var
declarations, but I'm considering doing the following anyway:
var mystruct x // the compiler had better know what a mystruct is at this point,
               // or it's an error of omission by the programmer
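With a leading keyword such as `var` or `type`, as proposed above, the parser can tell a declaration from an expression by looking at the first token only, so parsing no longer depends on the symbol table; whether `mystruct` is actually a known type becomes a separate semantic check. A minimal sketch, assuming statements arrive already tokenized; `StmtKind`, `classifyStmt` and `checkVarDecl` are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical statement kinds for a language in which declarations begin
// with a keyword, e.g. "type struct mystruct { ... }" and "var mystruct x".
enum class StmtKind { TypeDecl, VarDecl, Expr };

// Syntactic decision from the first token alone (one token of lookahead).
// No symbol table is consulted, so "mystruct * x" is always an expression.
StmtKind classifyStmt(const std::vector<std::string>& toks) {
    if (!toks.empty() && toks[0] == "type") return StmtKind::TypeDecl;
    if (!toks.empty() && toks[0] == "var") return StmtKind::VarDecl;
    return StmtKind::Expr;
}

// Separate semantic check for "var T x": T must have been declared or
// imported earlier, otherwise it is the programmer's error of omission.
void checkVarDecl(const std::vector<std::string>& toks,
                  const std::set<std::string>& knownTypes) {
    if (toks.size() != 3 || toks[0] != "var")
        throw std::runtime_error("malformed var declaration");
    if (knownTypes.count(toks[1]) == 0)
        throw std::runtime_error("unknown type: " + toks[1]);
}
```

The trade-off is one extra keyword per declaration in exchange for a grammar whose statement kinds can be decided without any knowledge of which identifiers name types.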
>
> There are parser generators now that can handle C and C++. One example
> is Kelbt.
>
> http://www.complang.org/kelbt/
>
> A partial C++ grammar is available as a proof of concept.
>
> I've been using Kelbt for a while now - it is perfectly usable, though
> with some build-time error-handling issues and lacking some obvious
> basic features (precedence, associativity, syntax error recovery). The
> resulting parsers are fine, but I've never needed any of the advanced
> features - the reason being that the DSLs I've used it for have syntax
> that is LR(1) by design.
>
I'm trying to avoid lexer/parser generators. I want to do it by hand and
bootstrap from C/C++ until my language can compile itself.
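The hand-crafted mix of recursive descent and precedence parsing mentioned in the quoted text is also a common starting point for a hand-written front end. The following is a minimal precedence-climbing sketch, not taken from any compiler discussed in the thread: `ExprParser` is a hypothetical class that evaluates integer expressions with `+ - * /` and parentheses, all operators left-associative.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Recursive descent for primaries combined with precedence climbing for
// binary operators. Evaluates as it parses instead of building a tree.
class ExprParser {
public:
    explicit ExprParser(const std::string& s) : src(s), pos(0) {}

    long parse() {
        long v = parseExpr(1);
        skipSpace();
        if (pos != src.size()) throw std::runtime_error("trailing input");
        return v;
    }

private:
    std::string src;
    std::size_t pos;

    void skipSpace() {
        while (pos < src.size() && std::isspace(static_cast<unsigned char>(src[pos]))) ++pos;
    }

    // Binding power of a binary operator, or 0 if c is not one.
    static int precedence(char c) {
        switch (c) {
            case '+': case '-': return 1;
            case '*': case '/': return 2;
            default: return 0;
        }
    }

    // primary := number | '(' expr ')'
    long parsePrimary() {
        skipSpace();
        if (pos < src.size() && src[pos] == '(') {
            ++pos;
            long v = parseExpr(1);
            skipSpace();
            if (pos >= src.size() || src[pos] != ')') throw std::runtime_error("expected ')'");
            ++pos;
            return v;
        }
        if (pos >= src.size() || !std::isdigit(static_cast<unsigned char>(src[pos])))
            throw std::runtime_error("expected number");
        long v = 0;
        while (pos < src.size() && std::isdigit(static_cast<unsigned char>(src[pos])))
            v = v * 10 + (src[pos++] - '0');
        return v;
    }

    // Parse a primary followed by operators whose precedence is >= minPrec.
    long parseExpr(int minPrec) {
        long lhs = parsePrimary();
        for (;;) {
            skipSpace();
            if (pos >= src.size()) break;
            char op = src[pos];
            int prec = precedence(op);
            if (prec == 0 || prec < minPrec) break;
            ++pos;
            // Left associativity: the right operand may only contain
            // operators that bind tighter than op.
            long rhs = parseExpr(prec + 1);
            switch (op) {
                case '+': lhs += rhs; break;
                case '-': lhs -= rhs; break;
                case '*': lhs *= rhs; break;
                case '/':
                    if (rhs == 0) throw std::runtime_error("division by zero");
                    lhs /= rhs;
                    break;
            }
        }
        return lhs;
    }
};
```

Adding an operator only means adding a row to `precedence` and a case to the evaluation switch; a right-associative operator would recurse with `prec` instead of `prec + 1`.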