Mon, 10 Feb 92 16:06:18 +0100

Related articles |
---|

grammar rule rewriting question jos@and.nl (Jos Horsmeier) (1992-02-04) |

LL(1) translation problem jan@si.hhs.nl (1992-02-10) |

Newsgroups: | comp.compilers |

From: | jan@si.hhs.nl (Schramp) |

Keywords: | LL(1), parse, question |

Organisation: | Sector Informatica, Haagse Hogeschool, The Hague, The Netherlands |

Organization: | Compilers Central |

References: | 92-02-017 |

Date: | Mon, 10 Feb 92 16:06:18 +0100 |

jos@and.nl (Jos Horsmeier) writes:

*>Bluntly stated: I have a problem. I'm currently working on a grammar rule*

*>rewriting program. One feeds it with a description of a CF grammar and it*

*>tries to rewrite the rules into an LL(1) form. This means:*

*>1: Removal of all epsilon rules*

*>2: Removal of all (indirect) left recursion*

*>3: Left factoring of all the necessary rules.*

*>So far, life is all quite simple. But, and here's my problem: some of the*

*>terminal symbols are, what I call `static semantic' symbols. These symbols*

*>contain snippets of actual programming language statements. Still not much*

*>of a problem, but these statements contain references to other (non)*

*>terminal elements of the grammar rules.*

*> ...*

*>There's more to it: if an epsilon rule contains a semantic token, I cannot*

*>remove that rule (I would simply lose the semantic actions then.) But if I*

*>can't remove the rule, I cannot rewrite the left recursive rules! On the*

*>other hand, if I *do* remove such an epsilon rule and transfer the*

*>semantics to all the other rules which have the LHS of this rule somewhere*

*>in their RHS (read this again ...), I might end up with semantic tokens,*

*>completely ripped out their context. Ok, enough whining ...*

With respect to your question some remarks:

- Removal of all epsilon-rules is not neccesary. I suppose your wish to

remove epsilon rules derives from the restriction that the algorithm

metioned in the Dragon-book only works for epsilon-free grammars.

Consider that just a simplification of the problem.

To see the problem look at the next grammar fragment:

A -> A B

A -> C

The standard solution after removing left recursion is:

A -> C A'

A' -> B A' | epsilon

The problem is that B might derive epsilon, in which case you're stuck

with left recusion of the form A' -> A'.

So you have to make sure B doesn't derive epsilon!

Now what if B does indeed derive epsilon?

Then you substitute B by all of its right sides. Of course this raises the

number of A-rules. Then reconsider all left recursive right sides of

A-rules and repeat the process of substituting the nonterminal symbol to

the right of A in case this whole (!) string derives epsilon.

In the end you're stuck with some offending rules of the form:

A -> A

and/or

A -> A {any of your actions}.

Rules of the form A -> A can be left out without harm, they contribute

nothing to the language. Rules of the form A -> A {action} define an

infinite repetition of {action} and require a better grammar, throw them

away too.

Blindly removing all epsilon-productions however is of no use as the

standard solution introduces new ones.

- A simpler solution to your problem might be to use a "translation

grammar" to define translation to a kind of postfix code, to build an

abstract syntax tree from the postfix code and to hook on actions to the

nodes of the tree. I'am presently working on that kind of approach.

To be more specific:

A translation grammar is a CFG with two sets of terminal symbols: input

terminals and output terminals. As such it defines some intermediate

language with sentences built from both sets of terminals. Leaving the

output terminals out of a sentence gives a sentence of the input language,

leaving the input terminals out of a sentence defines the corresponding

output sentence. This way the translation grammar defines a language to

language translation.

A translation grammar can be modified by all transformations that leave

the intermediate language invariant. These include substitution, removal

of left recursion, and factorization among others.

A table-controled LL(1) parser needs only a minor modification to

translate according to a tranlation grammar: When a output terminal

appears on the top of the stack, then pop it and send it to the output.

As an output language I prefer an extended form of binary abstract syntax

tree. It should not just be used for expressions, but also for other

constructs like lists and structured statements.

An AST (abstract syntax tree) can be constructed with a variant of the

commonly known stack-evaluator for expressions. This variant uses a stack

of AST's instead of a stack of values. It's action is mainly:

- given any operator, (don't take this to strictly)

- pop two AST's from the stack, (binary AST's!)

- create a single tree from the operator with the two trees as children,

- push this AST on the stack.

Define actions as recursive evaluators of (parts of) the AST. This leaves

room for inherited and synthesized attributes within the AST.

What we win with this approach: Translation becomes completely independent

of the messed up LL(1) grammar you obtain by automated transformation.

The AST you get, can reflect associativity and precedence of operators in

a natural way.

Current state of affairs:

- I've completed a working system acceptiong regular right-part

translation grammar as input.

- I'm currently working on a version that models the right-parts of the

productions of each nonterminal in a finit-state-acceptor fashion. That

way factorizing becomes equivalent to making it deterministic and

minimizing the number of states. That way elimination of left and right

recursion can be obtained by leaving out the left or right recursive

transitions, replacing them by an epsilon-transition. For instance an

epsilon from the final state of the fsa to the end-point of the offending

left recursive transition. Likewise a rightrecursive transition can be

replaced by an epsilon from its startstate to the startstate of the fsa.

Next you have to make it deterministic and to minimize the number of

states. Other issues: elimination of equivalent fsa's in such a scheme.

I'm not sure whether this is to concise for you readers around the world.

You may mail me with your questions.

Jan Schramp.

--

Jan Schramp, Haagse Hogeschool - Sector Informatica. Louis Couperusplein 19,

2514 HP Den Haag, Netherlands. E-mail: jan@si.hhs.nl

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.