Parsing C

jar@florida.HQ.Ileaf.COM (Jim Roskind x5570)
Wed, 7 Aug 91 20:13:24 EDT

          From comp.compilers

Related articles
Re: When is a typedef name not a typedef name? mengel@fnal.fnal.gov (1991-07-31)
Parsing C jar@florida.HQ.Ileaf.COM (1991-08-07)
| List of all articles for this month |

Newsgroups: comp.compilers
From: jar@florida.HQ.Ileaf.COM (Jim Roskind x5570)
Keywords: C, parse
Organization: Compilers Central
References: 91-08-014
Date: Wed, 7 Aug 91 20:13:24 EDT

mengel@fnal.fnal.gov (Marc W. Mengel) writes
>
> jwl@sag4.ssl.berkeley.edu (Jim Lewis) writes:
>
> > ...the following code is legal:
>
> >typedef int foo;
>
> >struct bar { foo foo; };
> ...
> >Rewriting the grammar seems hopeless -- every variation I've come up
> >with resulted in reduce/reduce conflicts.
>
> True, but if you look at the conflict, (i.e. in a simpler grammar
> like...
..
> [Lest it not be obvious, this is a part of C syntax which is painfully
> context dependent, which means that yacc cannot handle it without some sort
> of kludge, be it a reduce/reduce conflict or more typically some sort of
> ad-hoc code in the lexer. -John]


Actually, rewriting the grammar is not a hopeless ordeal. The grammar
that I have posted takes care of these problems. The editor is quite
correct that this is a very context sensitive area, but both posters
provided the standard context dependent hack in their description, as
they distinguish between "typename" and "identifier" at the lexical
level (which implies context sensitivity). With this hack, the
problem is very solvable (no parsing conflicts need arise in a C
grammar).


The example given above is not really much cause for a dilemma, and
could have been dealt with in a context free manner. Specifically,
the first "foo" must unambiguously be a typename (ANSI C precludes
omission of a declaration specifier in all cases except function
definitions), and the second "foo" must unambiguously be a declarator
(ANSI C does not permit two typedefnames to be combined to form a
declaration specifier).


The real problem is seen in such code as:


f(*b)[4];


while in a function body. Depending on context, this can have widely
varied meaning. For example:


int * f(char);
char * b;
main() {
f(*b)[4]; /* indirect on "b", call "f", and index the result
/* by "4" */
}


alternatively, with the context:


typedef int f;
char * b;
main() {
f(*b)[4]; /* redeclare "b" to be a pointer to 4 ints */
}


All of these issues are discussed, along with several other finer
points in the papers that are supplied with my grammars.


If you are interested in these issues, the papers included with the C
and C++ grammars discuss many of these points:


Doug Lea and Doug Schmidtt have graciously offered to provide anonymous
ftp sites for the 8 files, as well as the Berkeley YACC source (if you
need it).


ics.uci.edu (128.195.1.1) in the ftp/pub directory as:


                c++grammar2.0.tar.Z
                byacc1.8.tar.Z


mach1.npac.syr.edu (128.230.7.14) in the ftp/pub/C++ directory as:


                c++grammar2.0.tar.Z
                byacc1.8.tar.z


Note that the papers also discuss methods of removing conflicts from
LR grammars, as well as a discussion of LALR-only conflicts, and a
discussion of the difficulty of error recovery in LALR based parsers.


Jim Roskind
Independent Consultant
(407)729-4348 or (617)290-4990 x5570
jar@hq.ileaf.com or ...!uunet!leafusa!jar
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.