Re: Infinite look ahead required by C++?

torbenm@diku.dk (Torben Ęgidius Mogensen)
Tue, 02 Mar 2010 11:19:09 +0100

          From comp.compilers

Related articles
[10 earlier articles]
Re: Infinite look ahead required by C++? sh006d3592@blueyonder.co.uk (Stephen Horne) (2010-02-14)
Re: Infinite look ahead required by C++? wclodius@los-alamos.net (2010-02-13)
Re: Infinite look ahead required by C++? krzikalla@gmx.de (Olaf Krzikalla) (2010-02-19)
Re: Infinite look ahead required by C++? ng2010@att.net (ng2010) (2010-02-23)
Re: Infinite look ahead required by C++? cfc@shell01.TheWorld.com (Chris F Clark) (2010-02-27)
Re: Infinite look ahead required by C++? bartc@freeuk.com (bartc) (2010-02-28)
Re: Infinite look ahead required by C++? torbenm@diku.dk (2010-03-02)
| List of all articles for this month |

From: torbenm@diku.dk (Torben Ęgidius Mogensen)
Newsgroups: comp.compilers
Date: Tue, 02 Mar 2010 11:19:09 +0100
Organization: UNI-C
References: 10-02-024
Keywords: C++, parse
Posted-Date: 03 Mar 2010 01:16:45 EST

"ng2010" <ng2010@att.net> writes:


> What elements of C++ make it so hard to parse? Is it a weakness of
> compiler designs rather than a weakness of the language design? I've read
> somewhere that the language requires potentially infinite look ahead.
> Why? And how do compilers handle it?


Unbounded lookahead is not so bad. You can fairly easily make parsers
handle that, and humans are pretty good at handling unbounded
lookahead -- even though a syntactic construction might theoretically
require unbounded lookahead, in most programs the disambiguating
characters are not that far ahead. Ambiguity is not so abd either if
you can use simple local disambiguation rules (such as operator
precedence).


The problem with C++ is that disambiguation can require a very large
context. For example, the sequence f<t>(e1,e2) can either mean
instantiation and call of a template function or two comparisons and a
comma expression. You need to know the declaration of f to
disambiguate, and since if might be declared in a different module,
this is not easy for the human reader. A parser usually handles it by
having a symbol table around to distinguish names for templates,
functions, types and so on. But even this is not enough to
disambiguate -- there are special cases where you need different kinds
of context to disambiguate.


If you are a first-time language designer, you should use a parser
generator and redesign your syntax until the parser generator accepts
it without conflicts (but by all means use operator precedence
declarations). While there may be reasons to go beyond LL(k) or
LALR(1) (which is what most parser generators are restricted to), the
issues not only in parsing but also in human readability become
complex and you are more likely to make a bad design than a good one.
Once you have more experience in language design, you can try to be
more advanced.


Torben



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.