From: | Chris F Clark <cfc@shell01.TheWorld.com> |
Newsgroups: | comp.compilers |
Date: | 23 Dec 2006 13:37:00 -0500 |
Organization: | The World Public Access UNIX, Brookline, MA |
References: | 06-09-029 06-09-042 06-09-048 06-09-060 06-09-078 06-09-093 06-12-064 06-12-066 06-12-076 06-12-078 06-12-086 |
Keywords: | C, C++, parse |
Posted-Date: | 23 Dec 2006 13:37:00 EST |
"Derek M. Jones" <derek@knosof.co.uk> writes:
> George,
>
>> Note that I am in no way championing this style of compiler design,
>> and I have never written a C or C++ compiler ... most of my work has
>> been in scripting DSLs, Lisp and Pascal derivatives ... but, to date,
>> I haven't encountered any language syntax ambiguities that I believed
>> could not be resolved after parsing was complete.
>
> If you are willing to accept multiple parse trees for the same
> token sequence it is always possible to 'successfully' parse
> a language.
>
> I have a parser for C that does not use any symbol table. The most
> common ambiguous parse is:
>
> f(i); // function taking one argument, or perhaps
> // a declaration of i to have type f with redundant ()
>
> I am currently working on a C++ parser. The main problem with
> C++ is that there are a lot more constructs that lead to an
> ambiguous parse.
>
> I know of at least one other tool vendor who fully parses C/C++ source
> using syntax information only, before building a symbol table.
This sounds like the GLR "parse forest" approach. It may be the best
one can do.
However, if you look at the function prototype examples I supplied,
one can easily see that one has nested ambiguities. That is: not only
is the outer function prototype ambiguous, but nested in each
parameter/argument is another set of ambiguities and potentially
nested within them a third, fourth, and so on level of ambiguities.
Thus, you get effectively an n-squared problem.
This is partially why Jim Roskind gave up. It isn't just unrolling a
loop, and doing a linear exploration of the ambiguities. There is a
whole matrix of ambiguous cases which potentially interact. Or to
borrow from Adventure, "You are in a twisty little maze of ambiguous
productions; some of them may be slightly different."
Perhaps, I've just gotten old and crusty, but it doesn't seem like
that would inspire much confidence in how the rules interact. I think
Dr Roskind found some cases where he thought the rules did not
interact well and wouldn't do want one would expect.
Of course, if you are given an existing language and told to parse it,
it is a little late to revisit that decision. The good news is that
C++ and it's friends have already caused the parser generator writers
to attempt to extend their reach with techniques like backtracking,
PEGs, GLR, predicated parsing, etc. As a result, some of those
languages can be parsed reliably.
Still looking at the title of this thread, I wouldn't hold out much
hope for a simple hand-written recursive descent solution to such
languages. Moreover, those who think they have such solutions are
likely deluding themselves. I doubt they could prove their solution
is correct to my satisfaction.
Just my opinions,
-Chris
*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
Return to the
comp.compilers page.
Search the
comp.compilers archives again.