Re: ll(1) and variable assignment...

Chris F Clark <cfc@shell01.TheWorld.com>
14 Nov 2004 22:36:27 -0500

          From comp.compilers

Related articles
ll(1) and variable assignment... weltraum@astrocat.de (2004-11-06)
Re: ll(1) and variable assignment... wyrmwif@tsoft.org (SM Ryan) (2004-11-07)
Re: ll(1) and variable assignment... vbdis@aol.com (2004-11-07)
Re: ll(1) and variable assignment... cfc@shell01.TheWorld.com (Chris F Clark) (2004-11-14)
| List of all articles for this month |
From: Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups: comp.compilers
Date: 14 Nov 2004 22:36:27 -0500
Organization: The World Public Access UNIX, Brookline, MA
References: 04-11-010
Keywords: LL(1), parse
Posted-Date: 14 Nov 2004 22:36:27 EST

Chris asks:
> I want to write a recursive descent parser, and wonder if there is a
> LL(1) grammar for "c-style" variable assignement
> (i.e. "id=id=id+id" but not "id=id=id+id=2").
...
> But then I lose the context. I don't no if the "id" in rule one is a
> left value or not. Does an LL(1) grammar exist for this problem? Or
> am I absolutely wrong with something?


A little analysis will answer your question.


1) The RHS of an assignment can be a single ID, via A->E->T->ID
2) It can also be an assignment, A->ID = A


Therefore, to distinguish those two cases, you need at least two
tokens ID = (vs ID + or ID EOF). So, when you factor the ID out of
the common prefix for those possibilities, you lose the distinction of
how ID is used. You need the two tokens to see ID in context.


Thus, your second grammar captures the fact that ID cannot be used in
the non-desired way. That is, your second grammar does not allow
ID+ID=ID. However, it does not build the types of AST trees that
allow you to conveniently process the input.


The problem gets worse if you try to allow an even richer C subset,
e.g. *ID = 2 or ID[I+1] = ID[I]. In general, one needs more unbounded
look-ahead (i.e. to the end of the expression) to distinguish all the
cases. This means one needs an LR rather than LL grammar.


This is an example of a typical class of problems (typed expressions)
where grammars are not a good solution. I know of two other problems
that are similar in nature. The first is distinguishing boolean
vs. integer expressions when booleans can "convert" to integers.
The second is dealing with NULL rows as special in SQL expressions.


In both cases, there are grammatic solutions to the problem. That is,
there are grammars that allow only "legal" programs to be accepted and
enforce the desired "type" rules. However, in both cases, the grammar
solution is horrendous (in that large portions of the grammar are
duplicated with slight permutations, resulting in a maintenance
nightmare). It might be possible to do a clean grammatic solution to
"type" problems with 2-level aka VW grammars. However, your typical
CFG (LL/LR grammar) is not likely to cut it.


A better solution to these kind of problems is a 2nd pass (or phase).
Build the parse tree with a "natural" grammar that doesn't try to
enforce the "type" rules and allows syntactically correct but
semantically wrong inputs. Then, apply the semantics to the
tree. Look for left-subtrees of assignments that are disallowed
expressions. Propragate the boolean/integer null/non-null-ness up the
tree to places where it needs checking. Attribute grammars do this
relatively well, and one can hand-implement most attribute traversals
by hand quite easily if one doesn't have an attribute evaluator
generator.


Hope this helps,
-Chris


*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.