Re: parsing C and C++, Generating a simple hand-coded like

Chris F Clark <cfc@shell01.TheWorld.com>
24 Dec 2006 16:34:37 -0500

From comp.compilers

Related articles
[4 earlier articles]
Re: parsing C and C++, Generating a simple hand-coded like DrDiettrich1@aol.com (Hans-Peter Diettrich) (2006-12-22)
Re: parsing C and C++, Generating a simple hand-coded like derek@knosof.co.uk (Derek M. Jones) (2006-12-22)
Re: parsing C and C++, Generating a simple hand-coded like ik@unicals.com (Ivan A. Kosarev) (2006-12-22)
Re: parsing C and C++, Generating a simple hand-coded like derek@_NOSPAM_knosof.co.uk (Derek M. Jones) (2006-12-22)
Re: parsing C and C++, Generating a simple hand-coded like cfc@shell01.TheWorld.com (Chris F Clark) (2006-12-23)
Re: parsing C and C++, Generating a simple hand-coded like derek@knosof.co.uk (Derek M. Jones) (2006-12-24)
*Re: parsing C and C++, Generating a simple hand-coded like cfc@shell01.TheWorld.com (Chris F Clark)* (2006-12-24)**

| List of all articles for this month |

From:	Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups:	comp.compilers
Date:	24 Dec 2006 16:34:37 -0500
Organization:	The World Public Access UNIX, Brookline, MA
References:	06-09-029 06-09-042 06-12-078 06-12-086 06-12-091 06-12-097
Keywords:	C++, parse
Posted-Date:	24 Dec 2006 16:34:36 EST

Derek,

>> Thus, you get effectively an n-squared problem.
>
> True, but not very common in practice.

Unless one is writing obfuscated code, either intentionally (or
accidentally!)

> I thought the 'good' news about C++ parsing was that language designers
> have (re)learned the importance of having a grammar that is easy to
> parse using existing tools.

I hope you are right.

> I think the interest in GLR and its ilk is driven by researchers
> interest in doing something new and now having the computing resources
> to do it.

Actually, I think it is a mix. Researchers are always looking for
something new--that's their job. However, the problems of the day
show them which avenues to look down. GLR and backtracking got
popular because some breaktrhoughs were made in their implementation,
which allowed reasearchers to attack some previously ambiguous
problems and C languages rewarded them for doing so. You could get
some nice results that were interesting to others by prusing that
research. My take is that the problems guided the research as that's
the way I viewed it--I wanted solutions that would enable parsing of
C++ in a rigorous manner, so picked up hints that looked likie they
woulld lead that way. However, I can see perceiving it in reverse.

A worthy aside on this, it looks like PERL regexps are close to
falling that same way. There are several researchers that are
pursuing lines of reasearch all aimed at cracking that problem. For
example, Villi Laurikiri's work on tagged regular expressions and
Cezar Champeanu and Sheng Yu's work on pattern expressions. My guess
is that within 5 years, we will see PERL regexps on a fairly sound
basis and several tools that generate new DFA variants that can
process them.

>> Still looking at the title of this thread, I wouldn't hold out much
>> hope for a simple hand-written recursive descent solution to such
>> languages. Moreover, those who think they have such solutions are
>> likely deluding themselves. I doubt they could prove their solution
>> is correct to my satisfaction.
>
> The latest version GNU C & C++ compilers both use recursive descent. In
> my view this was a big mistake for C which will make it much harder for
> third parties to figure out what syntax gcc actually supports. I know
> at least one other major C++ vendor uses a recursive descent parser, so
> perhaps there are some overriding advantages of parsing this way in C++.

Well, hand-written recursive descent does expose one to the full Turing
complete power of one's programming language, but as you say (and I
definitely agree with), that can make it real difficult to be certain
what syntax is actually supported.

I don't think one needs a full Turing machine to parse C++, and if one
is smart, one picks a smaller more constrained and thus more
accessible, provable, and reliable notation for solving the problem
(e.g. an LL or LR grammar with some extensions to deal with specific
difficult spots). However, I think that course is too difficult for
most C++ compiler writers, as it takes dilligence and persistence to
find the solution in the constrained form.(*) Where as with a recusrive
descent parser, one can just hack in a solution and the [side-]effects
on the rest of the language, when something unexpected falls into that
hack, be damned--one can always add another layer or two or three or
four of hacks that patch the hack and attempt to constrain it to what
one actually meant.

*: the dilligence and persistence mentioned above are further
  strained, because many of the tools which extend LL or LR parsing are
  not entirely mature. Thus, one has to fight with not only getting
  something theoretically correct, but also acceptable to a tool which
  may implement the theory quite imperfectly. Perhaps, the RD compiler
  writers are smarter than I give them credit for. I'm just not
  convinced yet.

Just some more opinions,
-Chris

*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: parsing C and C++, Generating a simple hand-coded like

Chris F Clark <cfc@shell01.TheWorld.com>24 Dec 2006 16:34:37 -0500

Chris F Clark <cfc@shell01.TheWorld.com>
24 Dec 2006 16:34:37 -0500