Re: Is the relation beetween grammar.y and yy.tab.c bijective ?

Chris F Clark <cfc@world.std.com>
3 Apr 2000 11:31:40 -0400

          From comp.compilers

Related articles
Is the relation beetween grammar.y and yy.tab.c bijective ? nratier@jsbach.univ-fcomte.fr (Nicolas Ratier) (2000-03-23)
Re: Is the relation beetween grammar.y and yy.tab.c bijective ? joachim.durchholz@halstenbach.com.or.de (Joachim Durchholz) (2000-03-28)
Re: Is the relation beetween grammar.y and yy.tab.c bijective ? cfc@world.std.com (Chris F Clark) (2000-04-03)
| List of all articles for this month |
From: Chris F Clark <cfc@world.std.com>
Newsgroups: comp.compilers
Date: 3 Apr 2000 11:31:40 -0400
Organization: The World Public Access UNIX, Brookline, MA
References: 00-03-103 00-03-152
Keywords: yacc

Nicolas Ratier <nratier@jsbach.univ-fcomte.fr> asked:
> Is there a way to "reverse" the C program genereted by Yacc (bison).
> i.e. get the Yacc original grammar. Corrolary : Does the relation
> beetween Yacc and yy.tab.c is bijective ?


Yes, one should be able to retrieve the original grammar from the
[LA]LR tables generated by a tool in the yacc family. The tables
merely encode the transitions between a dfa that matches the rules in
parallel. Here is the basic scheme for yacc[-like] tables.


The first step would be to untangle the tables into a linear array
form, where you can tell which array entries go with which states.


Second, draw (or create in memory if you are running this as an
algorithm) the graph that connects the states. Each shift action is a
directed edge from the state that it is in to the state it mentions.


Third, find the states with reduce actions, these are the tail ends of
rules. The reduce action should tell you the name of the non-terminal
being reduced and the length of the rule that the reduce terminates.


Fourth, walk back from the reduce states by the length of the rule for
each reduce action. This is effectively undoing the shifts. Note the
token that is shifted each step that is walked back and prepend that
to the rule you are building up for that non-terminal. Note that
sometimes you will have more than one shift entering a state. When
that happens follow all of the shifts entering the state
simultaneously. Generally this happens because a non-terminal is used
in two contexts. Both shifts will then be shifts of the same tokens,
but the start states will be different uses of that non-terminal.


Oops, I forgot to undo the goto transitions in this process. A goto
action is essentially a shift on a non-terminal. In your walking back
of the shifts, if you encounter a state that is the target of a goto
action, the goto action represents the use of the non-terminal the
goto is associated with at that point in the rule. (excuse my
vagueness on this point, but Yacc++ doesn't use goto actions--it
simply shifts non-terminals just like they were tokens--it makes
certain things simpler and more regular). When you get done walking
back the rule that the goto action is associated you are at the state
where the non-terminal for the goto action starts and you can continue
walking back the shifts of tokens (or the goto action of another
non-terminal).


When you have walked back all the rules, you have recreated your
grammar. I sometimes use a variation on the above technique when
debugging a grammar, not to recreate the grammar from the tables but
to determine which (set of) start state(s) a particular conflict state
comes from and the context those start states imply. If your parser
generate outputs a human readable version of the states, you can use
that to see the connection as you apply the above technique. (We
actually specifically devised the output of Yacc++ to make it easy for
us to do that.)


Now, all that said, there are some caveats.


First, the tables being unravelled must truly be generated by a parser
generator for the technique to work. If you filled an array with a
bunch of random actions and started this unravelling process on it, it
would fail in most cases. There are certain regularities in all
generated tables that the algorithm exploits. (In other words, the
mapping is only bijective on those subset of tables that are parser
generated and the function is not "onto" the entire set of arrays
filled with parsing actions.)


Second, it is possible that some optimizations may map two distinct
grammars to the same tables. In that case, both grammars recognize
the same language. The only one(s) that comes to mind as a candidate
for doing such are chain-rule and unit-production removal--I can't
recall if those are separate optimizations or two names for the same
thing--they are certainly related. (In other words the mapping is
only one-to-one on "minimal" grammars that define a language.)


Hope this helps,
-Chris


*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.