grammar rule rewriting question

Jos Horsmeier <jos@and.nl>
Tue, 4 Feb 92 14:38:20 +0100

          From comp.compilers

Related articles
grammar rule rewriting question jos@and.nl (Jos Horsmeier) (1992-02-04)
LL(1) translation problem jan@si.hhs.nl (1992-02-10)
| List of all articles for this month |
Newsgroups: comp.compilers
From: Jos Horsmeier <jos@and.nl>
Keywords: LL(1), parse, question
Organization: Compilers Central
Date: Tue, 4 Feb 92 14:38:20 +0100

Greetings all,


Bluntly stated: I have a problem. I'm currently working on a grammar rule
rewriting program. One feeds it with a description of a CF grammar and it
tries to rewrite the rules into an LL(1) form. This means:


1: Removal of all epsilon rules
2: Removal of all (indirect) left recursion
3: Left factoring of all the necessary rules.


So far, life is all quite simple. But, and here's my problem: some of the
terminal symbols are, what I call `static semantic' symbols. These symbols
contain snippets of actual programming language statements. Still not much
of a problem, but these statements contain references to other (non)
terminal elements of the grammar rules. Here's a (simple) example in some
sort of `yacc-like' format:


Number : Number digit { $$ = 10*$0 + valof($1); }
| digit { $$ = valof($0); }
;


where $0 in the first rule, refers to the non terminal `Number' in the RHS
of the first rule; $1 refers to the terminal `digit' in the RHS etc.


Now, after faithfully following the rules, given to us by the Dragon Book,
I remove left recursion like this:


Number : digit { $$= valof($0); } Number'


Number' : digit { $$ = 10*$0 + valof($1); } Number'
| /* epsilon */
;


The semantics in the second rule are wrong! They have to be changed into:


Number' : digit { $$ = 10*$-1 + valof($0); } Number'
| /* epsilon */
;


In this simple example, the change can easily be done automagically,
simply check the change of the position of the semantic token in the rule,
and add or subtract the difference to/from the indexes in that token. But
there's more to it: if an epsilon rule contains a semantic token, I cannot
remove that rule (I would simply lose the semantic actions then.) But if I
can't remove the rule, I cannot rewrite the left recursive rules! On the
other hand, if I *do* remove such an epsilon rule and transfer the
semantics to all the other rules which have the LHS of this rule somewhere
in their RHS (read this again ...), I might end up with semantic tokens,
completely ripped out their context. Ok, enough whining ...


Has anyone else ever worked on this problem: automatic rewriting grammar
rules, including static semantics? (or call it `two level' grammars if you
prefer)


Any pointer, reference, anything relevant would be highly appreciated.


thanks in advance and thanks for reading this.
and kind regards to you all,


Jos aka jos@and.nl


AND Software bv
Westersingel 108
3015 LD Rotterdam
The Netherlands
Phone: +31 (10) 436 7100 Fax: +31 (10) 436 7110
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.