Re: problem with C grammar

Chris F Clark <cfc@shell01.TheWorld.com>
30 May 2006 18:39:29 -0400

          From comp.compilers

Related articles
problem with C grammar the.malfunction@googlemail.com (the.malfunction@googlemail.com) (2006-05-26)
Re: problem with C grammar ik@unicals.com (Ivan A. Kosarev) (2006-05-26)
Re: problem with C grammar DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2006-05-30)
Re: problem with C grammar cfc@shell01.TheWorld.com (Chris F Clark) (2006-05-30)
problem with C grammar a.t.hofkamp@tue.nl (A.T.Hofkamp) (2006-05-30)
Re: problem with C grammar rsc@swtch.com (Russ Cox) (2006-05-30)
Re: problem with C grammar idbaxter@semdesigns.com (Ira Baxter) (2006-06-03)
Re: problem with C grammar hebisch@math.uni.wroc.pl (Waldek Hebisch) (2006-06-03)
Re: problem with C grammar mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2006-06-05)
| List of all articles for this month |

From: Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups: comp.compilers
Date: 30 May 2006 18:39:29 -0400
Organization: The World Public Access UNIX, Brookline, MA
References: 06-05-078 06-05-082
Keywords: C, syntax
Posted-Date: 30 May 2006 18:39:29 EDT

<the.malfunction@googlemail.com> wrote in message


> Is there any way to change the grammar such that I can use
> IDENTIFIER instead of TYPE_NAME here without having all those
> conflicts?


As Ivan Kosarev said, the short answer is "no", particularly not with
Bison (or Yacc) or any other non-predicated parser. If you use a
predicated parser, e.g. PCCTS, you can move the symbol table query
into the parser. You can do the same thing with "adaptive" parsers
like Meta-S.


However, if you are using a plain LL or LR parser, you can't query the
symbol table as part of the grammar. There is no BNF construct that
does a symbol table query. And, this is the point of the TYPE_NAME
token, to represent an IDENTIFIER that hase been declared to represent
a typedef (or is a built-in type name). Discerning that requires one
to inspect the symbol table (or whereever you store which names are
declared what).


It is worth noting that this problem is not new. If I recall
correctly, Frank Deremer wrote describing the canonical solution back
in the 70's (or perhaps 60's). He said that parsing was a 3 step
process. He broke the lexer part into two steps: the scanner and the
screener. The scanner is what one typically writes in regular
expressions (e.g. LEX). The screener then takes those tokens and
matches them against a symbol table to distinguish keywords (and type
name) from normal identifiers. The screener does not have to be part
of the lexer per se, but unless your parser can execute code that does
symbol table queries, it can't be part of the parser (because it can't
be expressed in BNF). Technically, the C language is not LR (and thus
not LL either).


Thus, you cannot replace the TYPE_NAME references in a yacc grammar
with IDENTIFIER references without getting conflicts. The grammar is
depending on some other phase (the screener) to assist it and make the
language acceptable, by changing the language into something that is
LR.


Hope this helps,
-Chris


*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.