Re: Maintaining scope while parsing C with a YACC grammar

torbenm@diku.dk (Torben Ægidius Mogensen)
Tue, 03 May 2011 09:51:14 +0200

          From comp.compilers


From: torbenm@diku.dk (Torben Ægidius Mogensen)
Newsgroups: comp.compilers
Date: Tue, 03 May 2011 09:51:14 +0200
Organization: SunSITE.dk - Supporting Open source
References: 11-04-036 11-04-038 11-05-003
Keywords: C, parse
Posted-Date: 04 May 2011 13:53:09 EDT

eliben <eliben@gmail.com> writes:
> Since it's parsing of C I'm talking about, this approach will have to
> somehow handle ambiguity of this kind:
>
> T * x;
>
> This can be either a declaration or a multiplication, depending on
> earlier symbol table information (whether T is a type or not).


One technique for handling this is to let the lexer access the symbol
table and determine if T is a type name or not and generate different
tokens for these. The grammar would then have productions somewhat
like


Declaration -> Type non-type-id
             | ...


Type -> type-id
      | Type *
      | ...


Expression -> Expression * Expression
            | non-type-id
            | ...


It becomes much more complicated for real C, but the idea should be
clear enough.


This requires the parser to keep a symbol table for the current scope
available to the lexer. This table need not contain full information
for each identifier, just enough to distinguish type names from other
names.
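
As a minimal sketch of such a table in C: the names push_scope,
pop_scope, declare and classify, the token codes TYPE_ID and
NON_TYPE_ID, and the fixed array sizes are all assumptions made for
illustration, not part of any particular yacc grammar. The parser's
actions would maintain the table, and the lexer would call classify
on every identifier it reads.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token codes; a real yacc grammar would declare them with %token. */
enum token { NON_TYPE_ID, TYPE_ID };

/* Arbitrary limits, chosen only to keep the sketch short. */
enum { MAX_NAMES = 256, MAX_DEPTH = 32 };

struct entry {
    const char *name;   /* not copied: the caller must keep the string alive */
    int is_type;        /* nonzero if the name was declared as a type name */
};

static struct entry names[MAX_NAMES];  /* all visible names, innermost last */
static int n_names = 0;
static int scope_start[MAX_DEPTH];     /* value of n_names when each scope opened */
static int depth = 0;

/* Parser action when a block opens. */
void push_scope(void)
{
    assert(depth < MAX_DEPTH);
    scope_start[depth++] = n_names;
}

/* Parser action when a block closes: forget the names declared inside it. */
void pop_scope(void)
{
    assert(depth > 0);
    n_names = scope_start[--depth];
}

/* Parser action after a declaration has been recognised. */
void declare(const char *name, int is_type)
{
    assert(n_names < MAX_NAMES);
    names[n_names].name = name;
    names[n_names].is_type = is_type;
    n_names++;
}

/* Called by the lexer for each identifier; the innermost declaration wins. */
enum token classify(const char *name)
{
    for (int i = n_names - 1; i >= 0; i--)
        if (strcmp(names[i].name, name) == 0)
            return names[i].is_type ? TYPE_ID : NON_TYPE_ID;
    return NON_TYPE_ID;  /* undeclared names are treated as ordinary identifiers */
}
```

With T declared as a type at file scope, classify("T") yields TYPE_ID,
so "T * x;" reaches the parser as type-id * non-type-id and matches
the Declaration production. Inside a block that redeclares T as a
variable, the same text is lexed as non-type-id * non-type-id and can
only be an Expression. Note that a yacc parser may already have asked
the lexer for a lookahead token before a reduction's action runs, so
in a real grammar the calls that update the table have to be placed
with that in mind.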


That said, I consider this kind of ambiguity bad language design, as
it is not only hard for a parser to handle, but also hard for a human
reader. Possible fixes are to make declarations and expressions /
statements non-overlapping syntactically (as in Pascal) or to keep
type names syntactically distinct from variable names, e.g. by making
type names start with an upper-case letter and variable names start
with a lower-case letter (as in Haskell).


Torben
[As Dennis said, "the ice is thin here." -John]


