28 May 2005 14:08:07 -0400

Related articles |
---|

CFGs vs. "declare variable before use" devriese@cs.tcd.ie (Edsko de Vries) (2005-05-26) |

Re: CFGs vs. "declare variable before use" wyrmwif@tsoft.org (SM Ryan) (2005-05-28) |

Re: CFGs vs. "declare variable before use" mefrill@yandex.ru (mefrill) (2005-05-28) |

Re: CFGs vs. "declare variable before use" torbenm@diku.dk (2005-05-28) |

Re: CFGs vs. "declare variable before use" cfc@shell01.TheWorld.com (Chris F Clark) (2005-05-28) |

RE: CFGs vs. "declare variable before use" quinn-j@shaw.ca (Quinn Tyler Jackson) (2005-05-28) |

Re: CFGs vs. "declare variable before use" torbenm@diku.dk (2005-05-31) |

Re: CFGs vs. "declare variable before use" gah@ugcs.caltech.edu (glen herrmannsfeldt) (2005-05-31) |

Re: CFGs vs. "declare variable before use" devriese@cs.tcd.ie (Edsko de Vries) (2005-06-02) |

Re: CFGs vs. "declare variable before use" cfc@shell01.TheWorld.com (Chris F Clark) (2005-06-02) |

Re: CFGs vs. "declare variable before use" gah@ugcs.caltech.edu (glen herrmannsfeldt) (2005-06-04) |

[3 later articles] |

From: | Chris F Clark <cfc@shell01.TheWorld.com> |

Newsgroups: | comp.compilers |

Date: | 28 May 2005 14:08:07 -0400 |

Organization: | The World Public Access UNIX, Brookline, MA |

References: | 05-05-216 |

Keywords: | parse, theory |

Esko asked:

*> A common statement I read about the limitations of CFGs is that they*

*> cannot be used to express the requirement that "variables must be*

*> declared before they are used". However, I have been unable to find*

*> any formal justifications for this statement (e.g., in the style of*

*> the proof using the pumping lemma that a^n b^n c^n is not a context*

*> free language). Could anyone point me to some relevant literature?*

*> Also, are people aware of any other limitations of CFGs, esp. in the*

*> context of (the semantics of) programming languages, such as the one I*

*> mentioned, preferably with proofs?*

I probably should let Quinn Tyler Jackson answer this as he's been

working fairly extensively in this area, that is defining grammars for

solving these kinds of problems. However, I'll give you a terse

answer, which should allow you to construct your own proof.

As a preamble, let's consider regular languages. The common statement

is that "regular languages cannot count". In particular, what regular

languages cannot do is count the number of matching parenthesis (or

similar constructs). That requires a stack. The Dyck languages solve

that problem. What regular languages can do is fixed sequences and

unbounded repretitions of a sequence.

One definition of CFG grammars is that they are the intersection of

regular and Dyck languages. That is you can do any fixed sequence,

any repetition, and they can balance parens. They can't do anything

else.

Now, why can't they do definition before use? Well, because

definition use sets don't form nicely matching balanced parens. You

can have two definitions and then match their uses in any order. So,

if we represent one variable with parens and the other with brackets,

with definitions being an open and uses being a close, we get strings

that look like:

()[] and ([]) which are nicely matched, but also

([)] and ([]]]]))]]] which aren't nicely matched at all

Note, that if we had a fixed number of variables (like we have a fixed

number of paren styles) one could write out the appropriate regular

expressions. Try it yourself, you can write a regular expression that

matches the two variable case (and it only gets slightly more complex

for the three varaible case). However, the problem only gets

interesting when we have an unbounded number of variables, so that we

need to "generate" new regular expressions on the fly. CFGs can only

add new regular expressions that represent balanced parens (that's all

a stack allows). Thus, they cannot solve the problem.

BTW, there are good languages for solving the def-before-use

problem. One of my favorites is called something like "register

vectors" and was written up in CACM probably 10 years ago. The idea

is that you have flags based upon left context and you can set a flag

when you see something and check a flag when you wish to know if

something is true. The flags are separate from the stack, so they

don't have to be balanced (like parens). They work a lot like

changing states in a regular expression, except that flags are

independent so you don't get combinatorial explosion. Thus, they are a

great way to capture left context sensitivity (as one needs for

def-before-use restrictions).

Now, one can prove that these flags can be modelled as attributes.

Thus, one can solve the def-before-use problem with an attribute

grammar. Attribute grammars can be modelled with two-level grammars,

so you can solve def-before-use with them too. You can also solve

def-before-use with indexed grammars if I recall correctly.

Ok, so maybe the answer wasn't so terse. It was still only a sketch.

Hope this helps,

-Chris

*****************************************************************************

Chris Clark Internet : compres@world.std.com

Compiler Resources, Inc. Web Site : http://world.std.com/~compres

23 Bailey Rd voice : (508) 435-5016

Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)

------------------------------------------------------------------------------

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.