Related articles |
---|
problem with C grammar the.malfunction@googlemail.com (the.malfunction@googlemail.com) (2006-05-26) |
Re: problem with C grammar ik@unicals.com (Ivan A. Kosarev) (2006-05-26) |
Re: problem with C grammar DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2006-05-30) |
Re: problem with C grammar cfc@shell01.TheWorld.com (Chris F Clark) (2006-05-30) |
problem with C grammar a.t.hofkamp@tue.nl (A.T.Hofkamp) (2006-05-30) |
Re: problem with C grammar rsc@swtch.com (Russ Cox) (2006-05-30) |
Re: problem with C grammar idbaxter@semdesigns.com (Ira Baxter) (2006-06-03) |
Re: problem with C grammar hebisch@math.uni.wroc.pl (Waldek Hebisch) (2006-06-03) |
Re: problem with C grammar mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2006-06-05) |
From: | Waldek Hebisch <hebisch@math.uni.wroc.pl> |
Newsgroups: | comp.compilers |
Date: | 3 Jun 2006 19:00:19 -0400 |
Organization: | Politechnika Wroclawska |
References: | 06-05-078 |
Keywords: | C, parse |
Posted-Date: | 03 Jun 2006 19:00:19 EDT |
the.malfunction@googlemail.com <the.malfunction@googlemail.com> wrote:
> As it seems, the lexical scanner is proposed to analyse whether an
> identifier is a type or not. I rather would like to let the parser do
> this job. Is there any way to change the grammar such that I can use
> IDENTIFIER instead of TYPE_NAME here without having all those conflicts?
> [This is the hardest part of parsing C. -John]
As other noted this is a hard problem. But if you really want to parse
C without using a symbol table, then there is a way. I learned the
idea from Willink thesis (but it is a natural one). Namely, parser
has two jobs:
-- check systactic validity of input
-- discover systactic structure of input
Textbooks typically mix both, but once you accept the difference you
may try to use a "superset parser" -- a parser which builds correct
parse tree, but which may accept bigger language.
Also, traditional grammars frequently mix syntax with semantics. However,
it is unreasonable to require that parser _produces_ semantic info,
but is forbidden to use it. So, if parser should work without using
semantic info the resulting parse tree may contain less semantic info
than traditional parse.
For example, if you see:
int (* y)
you could generate something like:
operator_()
/ \
int operator_unary_*
\
y
and let the semantic analysis decide if the snippet above declares
a pointer variable or is a call to a function named "int".
There a problem here: declarator in C is similar to expression, but
not identical to it. If you do not know which one to expect you
need to accept both, so the parser will accept a lot of junk.
Finally, while it is possible to parse C without using symbol table
you should have good reason to do so. Willink wanted to implement
"syntax aware" macros -- which means that new types may be declared
_after_ parsing. One may wish to parse macros befor expansion (to
check their correctness). But without good reason to do otherwise
using symbol table to recognize types is the easiest method to
parse C.
--
Waldek Hebisch
hebisch@math.uni.wroc.pl
Return to the
comp.compilers page.
Search the
comp.compilers archives again.