Re: CFGs vs. "declare variable before use"

Chris F Clark <cfc@shell01.TheWorld.com>
28 May 2005 14:08:07 -0400

          From comp.compilers

Related articles
CFGs vs. "declare variable before use" devriese@cs.tcd.ie (Edsko de Vries) (2005-05-26)
Re: CFGs vs. "declare variable before use" wyrmwif@tsoft.org (SM Ryan) (2005-05-28)
Re: CFGs vs. "declare variable before use" mefrill@yandex.ru (mefrill) (2005-05-28)
Re: CFGs vs. "declare variable before use" torbenm@diku.dk (2005-05-28)
Re: CFGs vs. "declare variable before use" cfc@shell01.TheWorld.com (Chris F Clark) (2005-05-28)
RE: CFGs vs. "declare variable before use" quinn-j@shaw.ca (Quinn Tyler Jackson) (2005-05-28)
Re: CFGs vs. "declare variable before use" torbenm@diku.dk (2005-05-31)
Re: CFGs vs. "declare variable before use" gah@ugcs.caltech.edu (glen herrmannsfeldt) (2005-05-31)
Re: CFGs vs. "declare variable before use" devriese@cs.tcd.ie (Edsko de Vries) (2005-06-02)
Re: CFGs vs. "declare variable before use" cfc@shell01.TheWorld.com (Chris F Clark) (2005-06-02)
Re: CFGs vs. "declare variable before use" gah@ugcs.caltech.edu (glen herrmannsfeldt) (2005-06-04)
[3 later articles]
| List of all articles for this month |

From: Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups: comp.compilers
Date: 28 May 2005 14:08:07 -0400
Organization: The World Public Access UNIX, Brookline, MA
References: 05-05-216
Keywords: parse, theory
Posted-Date: 28 May 2005 14:08:07 EDT

Esko asked:
> A common statement I read about the limitations of CFGs is that they
> cannot be used to express the requirement that "variables must be
> declared before they are used". However, I have been unable to find
> any formal justifications for this statement (e.g., in the style of
> the proof using the pumping lemma that a^n b^n c^n is not a context
> free language). Could anyone point me to some relevant literature?
> Also, are people aware of any other limitations of CFGs, esp. in the
> context of (the semantics of) programming languages, such as the one I
> mentioned, preferably with proofs?


I probably should let Quinn Tyler Jackson answer this as he's been
working fairly extensively in this area, that is defining grammars for
solving these kinds of problems. However, I'll give you a terse
answer, which should allow you to construct your own proof.


As a preamble, let's consider regular languages. The common statement
is that "regular languages cannot count". In particular, what regular
languages cannot do is count the number of matching parenthesis (or
similar constructs). That requires a stack. The Dyck languages solve
that problem. What regular languages can do is fixed sequences and
unbounded repretitions of a sequence.


One definition of CFG grammars is that they are the intersection of
regular and Dyck languages. That is you can do any fixed sequence,
any repetition, and they can balance parens. They can't do anything
else.


Now, why can't they do definition before use? Well, because
definition use sets don't form nicely matching balanced parens. You
can have two definitions and then match their uses in any order. So,
if we represent one variable with parens and the other with brackets,
with definitions being an open and uses being a close, we get strings
that look like:


  ()[] and ([]) which are nicely matched, but also
  ([)] and ([]]]]))]]] which aren't nicely matched at all


Note, that if we had a fixed number of variables (like we have a fixed
number of paren styles) one could write out the appropriate regular
expressions. Try it yourself, you can write a regular expression that
matches the two variable case (and it only gets slightly more complex
for the three varaible case). However, the problem only gets
interesting when we have an unbounded number of variables, so that we
need to "generate" new regular expressions on the fly. CFGs can only
add new regular expressions that represent balanced parens (that's all
a stack allows). Thus, they cannot solve the problem.


BTW, there are good languages for solving the def-before-use
problem. One of my favorites is called something like "register
vectors" and was written up in CACM probably 10 years ago. The idea
is that you have flags based upon left context and you can set a flag
when you see something and check a flag when you wish to know if
something is true. The flags are separate from the stack, so they
don't have to be balanced (like parens). They work a lot like
changing states in a regular expression, except that flags are
independent so you don't get combinatorial explosion. Thus, they are a
great way to capture left context sensitivity (as one needs for
def-before-use restrictions).


Now, one can prove that these flags can be modelled as attributes.
Thus, one can solve the def-before-use problem with an attribute
grammar. Attribute grammars can be modelled with two-level grammars,
so you can solve def-before-use with them too. You can also solve
def-before-use with indexed grammars if I recall correctly.


Ok, so maybe the answer wasn't so terse. It was still only a sketch.


Hope this helps,
-Chris


*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.