Re: problem with C grammar

Waldek Hebisch <hebisch@math.uni.wroc.pl>
3 Jun 2006 19:00:19 -0400

          From comp.compilers

Related articles
problem with C grammar the.malfunction@googlemail.com (the.malfunction@googlemail.com) (2006-05-26)
Re: problem with C grammar ik@unicals.com (Ivan A. Kosarev) (2006-05-26)
Re: problem with C grammar DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2006-05-30)
Re: problem with C grammar cfc@shell01.TheWorld.com (Chris F Clark) (2006-05-30)
problem with C grammar a.t.hofkamp@tue.nl (A.T.Hofkamp) (2006-05-30)
Re: problem with C grammar rsc@swtch.com (Russ Cox) (2006-05-30)
Re: problem with C grammar idbaxter@semdesigns.com (Ira Baxter) (2006-06-03)
Re: problem with C grammar hebisch@math.uni.wroc.pl (Waldek Hebisch) (2006-06-03)
Re: problem with C grammar mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2006-06-05)
| List of all articles for this month |
From: Waldek Hebisch <hebisch@math.uni.wroc.pl>
Newsgroups: comp.compilers
Date: 3 Jun 2006 19:00:19 -0400
Organization: Politechnika Wroclawska
References: 06-05-078
Keywords: C, parse
Posted-Date: 03 Jun 2006 19:00:19 EDT

the.malfunction@googlemail.com <the.malfunction@googlemail.com> wrote:
> As it seems, the lexical scanner is proposed to analyse whether an
> identifier is a type or not. I rather would like to let the parser do
> this job. Is there any way to change the grammar such that I can use
> IDENTIFIER instead of TYPE_NAME here without having all those conflicts?
> [This is the hardest part of parsing C. -John]


As other noted this is a hard problem. But if you really want to parse
C without using a symbol table, then there is a way. I learned the
idea from Willink thesis (but it is a natural one). Namely, parser
has two jobs:
-- check systactic validity of input
-- discover systactic structure of input


Textbooks typically mix both, but once you accept the difference you
may try to use a "superset parser" -- a parser which builds correct
parse tree, but which may accept bigger language.


Also, traditional grammars frequently mix syntax with semantics. However,
it is unreasonable to require that parser _produces_ semantic info,
but is forbidden to use it. So, if parser should work without using
semantic info the resulting parse tree may contain less semantic info
than traditional parse.


For example, if you see:


int (* y)


you could generate something like:


                  operator_()
            / \
        int operator_unary_*
                                    \
                                      y


and let the semantic analysis decide if the snippet above declares
a pointer variable or is a call to a function named "int".


There a problem here: declarator in C is similar to expression, but
not identical to it. If you do not know which one to expect you
need to accept both, so the parser will accept a lot of junk.


Finally, while it is possible to parse C without using symbol table
you should have good reason to do so. Willink wanted to implement
"syntax aware" macros -- which means that new types may be declared
_after_ parsing. One may wish to parse macros befor expansion (to
check their correctness). But without good reason to do otherwise
using symbol table to recognize types is the easiest method to
parse C.


--
                                                            Waldek Hebisch
hebisch@math.uni.wroc.pl


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.