Re: Table compression



From: Cleo Saulnier <cleos@nb.sympatico.ca>
Newsgroups: comp.compilers
Date: 30 Sep 2005 02:02:52 -0400
Organization: Aliant Internet
References: 05-09-130
Keywords: parse, practice
Posted-Date: 30 Sep 2005 02:02:52 EDT

Leonardo Teixeira Passos wrote:
> I'm looking forward to developing a study on table compression techniques.
>
> When browsing the Web, I haven't found any recent articles (at least from
> the last decade).
>
> Does anyone have a clue as to where to begin?


I'll only speak about LR tables right now.


First, split your table into two parts. One part for the reduce/shift
actions, and the other for the goto table.


Now for the reduce/shift table. Take all the shift actions and put them
in a list of {token, shift_action} pairs. Only store values where there
is an entry in the table. Now continue creating pairs with the reduce
actions until there is only 1 reduce action left (if any). Make sure
that the reduce action that's left is the one that has the most entries
for that state. This reduce action that remains will be the default
reduce action. If no tokens match in the list just created, then reduce
with the default reduce action. This means that error detection is
deferred until you get in a state that doesn't have a default reduction.


Also, you can now compare the list of {token, shift/reduce_actions} in
each state to see if they are identical. If they are, you can use the
same list for both states.
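
In C, the layout might look something like this (a rough sketch only;
the names, the -1 end marker, and the 0-means-error convention are just
for illustration, not from any particular generator):

typedef struct { int token; int action; } Pair;   /* token == -1 ends a list */

typedef struct {
    const Pair *actions;    /* {token, shift/reduce action} pairs (shareable) */
    const Pair *gotos;      /* {nonterminal, next state} pairs                */
    int         def_reduce; /* default reduction, or 0 when there is none     */
} State;

/* Find the action for 'token'; fall back on the default reduction,
   which defers error detection as described above (0 = error). */
int lookup_action(const State *s, int token)
{
    const Pair *p;
    for (p = s->actions; p->token != -1; p++)
        if (p->token == token)
            return p->action;
    return s->def_reduce;
}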


Here is an example from another post. Hope the original poster doesn't
mind.


tubylerczyk wrote:


  > I have a sample from "Compiler Construction" by Waite and Goos.
  > In chapter 7 there is a grammar that is LALR(1) but not SLR(1):
  > (1) Z -> A
  > (2) A -> aBb
  > (3) A -> adc
  > (4) A -> bBc
  > (5) A -> bdd
  > (6) B -> d
  >
  > The table in the book is:
  >          a   b   c   d   #   A   B
  >      ------------------------------
  >   0    2   3   .   .   .   1   .
  >   1    .   .   .   .   *   .   .
  >   2    .   .   .   5   .   .   4
  >   3    .   .   .   7   .   .   6
  >   4    .   8   .   .   .   .   .
  >   5    .  +6   9   .   .   .   .
  >   6    .   .  10   .   .   .   .
  >   7    .   .  +6  11   .   .   .
  >   8    .   .   .   .  +2   .   .
  >   9    .   .   .   .  +3   .   .
  >  10    .   .   .   .  +4   .   .
  >  11    .   .   .   .  +5   .   .
  >
  > number means Shift and Goto, +number means reduction
  >


I modified the table to be the actual parse table and not what's in the
book.


Notice that the table takes up (5+2)*(12) = 84 entries.


If we create our lists for each state as described above, we get:
state 0 action: {a,2}, {b,3} goto: {A,1}
state 1 action: {#,acc}
state 2 action: {d,5} goto: {B,4}
state 3 action: {d,7} goto: {B,6}
state 4 action: {b,8}
state 5 action: {c,9} default reduction: +6
state 6 action: {c,10}
state 7 action: {d,11} default reduction: +6
state 8 default reduction: +2
state 9 default reduction: +3
state 10 default reduction: +4
state 11 default reduction: +5


Each state would contain a pointer to the action/reduce list, a pointer
to the goto list and the default reduction. This makes 3*12 = 36 data
entries. Here, there are no duplicate action lists, but if there were
(as is often the case) we would only have one copy of that list and have
the pointers in the states point to this one list.


And from the above, there are 6+2+4+4+2+2+2+2=24 entries in the actual
lists. We did not count default reductions as they were counted above.


36+24 = 60 entries vs 84 entries, roughly a 29% table size reduction.
For the lists, you sometimes need an end marker. This can take up
another 8 to 11 entries.
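
Spelled out using the Pair/State sketch from above (the encoding is
still made up for illustration: a positive number is a shift or goto
state, a negative number means reduce by that rule, 0 means none, and
ACC is an arbitrary accept code), the example's lists would look
roughly like this:

/* Uses the Pair and State types from the sketch above. */
enum { T_a, T_b, T_c, T_d, T_EOF, N_A, N_B };
#define ACC 100                                  /* arbitrary accept code */

static const Pair none[] = {{-1,0}};             /* shared empty list */
static const Pair s0a[] = {{T_a,2},{T_b,3},{-1,0}}, s0g[] = {{N_A,1},{-1,0}};
static const Pair s1a[] = {{T_EOF,ACC},{-1,0}};
static const Pair s2a[] = {{T_d,5},{-1,0}},         s2g[] = {{N_B,4},{-1,0}};
static const Pair s3a[] = {{T_d,7},{-1,0}},         s3g[] = {{N_B,6},{-1,0}};
static const Pair s4a[] = {{T_b,8},{-1,0}};
static const Pair s5a[] = {{T_c,9},{-1,0}};
static const Pair s6a[] = {{T_c,10},{-1,0}};
static const Pair s7a[] = {{T_d,11},{-1,0}};

static const State states[12] = {
    {s0a, s0g, 0},    {s1a, none, 0},   {s2a, s2g, 0},    {s3a, s3g, 0},
    {s4a, none, 0},   {s5a, none, -6},  {s6a, none, 0},   {s7a, none, -6},
    {none, none, -2}, {none, none, -3}, {none, none, -4}, {none, none, -5},
};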


As you can see, for small tables the benefit isn't much. But for larger
tables, it really helps. If you notice, most states don't accept many
tokens. However, the more tokens your parser has overall, the more of
an effect table compression will have.


Here are stats from a simple four-operator calculator LR(1) grammar.


terminals: 10
production names: 5
productions: 10
states: 48
uncompacted table size: 720
final table size: 304


304 data entries, and this includes all the data structures and pointers.
Each state has 5 entries (1 for the tokens ptr, 1 for the actions ptr, 1
for the gotos ptr, 1 for the size of the token list, 1 for the default
reduction). 5*42 = 210 (6 states were merged). The remaining 94 entries
are actual data. This is why SLR or LALR is often used, so that the
number of states can be reduced. Hmmm... actually, I should put the size
of the token list at the front of the list itself instead of storing it
in every state. That would add one entry per distinct token list but
remove one per state: 10 - 42 = -32, a net reduction of 32 entries.


This table is 42% of the original size. After I add the above
modification, it'll be 38% of the original size. For larger tables, I
often get size reductions of 80% or more.


In the stats above, I went further and split the token list and the
action list. Oftentimes, multiple states will act on the same set of
tokens but have different actions. By having separate pointers for the
token list and the action list, you can compress the table even further
by reusing token lists that are identical. The 304 table size above
uses this technique. The disadvantage is that you need an extra pointer
for every state.


If the tokens are sorted, you can even use binary search if there are
many tokens (although this should not be very frequent).
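
A rough C sketch of this split layout, including the binary search (the
names are mine; the list length sits at the front of the token list,
following the modification mentioned earlier):

typedef struct { int nonterm; int state; } GotoPair;  /* nonterm == -1 ends it */

typedef struct {
    const int      *tokens;     /* {count, sorted token codes...} (shareable) */
    const int      *actions;    /* parallel action list, one per token        */
    const GotoPair *gotos;      /* simple {nonterminal, next state} pairs     */
    int             def_reduce; /* default reduction, or 0 when there is none */
} State2;

/* Binary search the sorted token list; the index found also picks the
   entry in the parallel action list. */
int lookup_action2(const State2 *s, int token)
{
    int lo = 1, hi = s->tokens[0];
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        if (s->tokens[mid] == token)
            return s->actions[mid - 1];
        else if (s->tokens[mid] < token)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return s->def_reduce;       /* no match: default reduction, or 0 = error */
}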


I may as well convert the table again to this new technique:


lists of tokens:
t0: {a,b}
t1: {#}
t2: {b}
t3: {c}
t4: {d}


lists of actions:
a0: {2,3}
a1: {acc}
a2: {5}
a3: {7}
a4: {8}
a5: {9}
a6: {10}
a7: {11}




states:
0: tokens: t0, action: a0, goto: {A,1}
1: tokens: t1, action: a1
2: tokens: t4, action: a2, goto: {B,4}
3: tokens: t4, action: a3, goto: {B,6}
4: tokens: t2, action: a4
5: tokens: t3, action: a5, def: +6
6: tokens: t3, action: a6
7: tokens: t4, action: a7, def: +6
8: def: +2
9: def: +3
10: def: +4
11: def: +5


12 states * 4 = 48
tokens: 6 entries
actions: 9 entries
gotos: 6 entries
48+6+9+6 = 69 entries
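
In terms of the State2/GotoPair sketch above (same made-up encoding and
token codes as in the earlier data sketch), this might be laid out as:

/* tokens[0] is the list length; t3 and t4 are shared by several states,
   which is where the savings come from in larger tables. */
static const int t0[] = {2, T_a, T_b};
static const int t1[] = {1, T_EOF};
static const int t2[] = {1, T_b};
static const int t3[] = {1, T_c};               /* shared by states 5 and 6    */
static const int t4[] = {1, T_d};               /* shared by states 2, 3 and 7 */
static const int tn[] = {0};                    /* shared empty token list     */

static const int a0[] = {2, 3}, a1[] = {ACC}, a2[] = {5},  a3[] = {7},
                 a4[] = {8},    a5[] = {9},   a6[] = {10}, a7[] = {11};

static const GotoPair g0[] = {{N_A,1},{-1,0}};
static const GotoPair g2[] = {{N_B,4},{-1,0}};
static const GotoPair g3[] = {{N_B,6},{-1,0}};
static const GotoPair gn[] = {{-1,0}};          /* shared empty goto list      */

/* States 8-11 have an empty token list, so their action pointer is never read. */
static const State2 states2[12] = {
    {t0, a0, g0, 0},  {t1, a1, gn, 0},  {t4, a2, g2, 0},  {t4, a3, g3, 0},
    {t2, a4, gn, 0},  {t3, a5, gn, -6}, {t3, a6, gn, 0},  {t4, a7, gn, -6},
    {tn, 0,  gn, -2}, {tn, 0,  gn, -3}, {tn, 0,  gn, -4}, {tn, 0,  gn, -5},
};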


Although this is more than before and not much of a reduction in size
here, it does work quite well for larger tables, as shown in the
calculator example above, which got a 58-62% reduction in table size
using this very technique.


In larger tables, the tokens list really does collapse and you will see
a lot of redundancy eliminated. In a small grammar such as this, you
don't see it as much. Only a few token lists were eliminated and then
there was only 1 token in the list to begin with. In larger tables, you
will see lists with many tokens merge into one list (as they are identical).


One last thing: do not try to compress the goto table as much as the
tokens/action table. Just keep it as a simple paired list. The goto
table is sparse, so you'd be wasting your time and might actually make
the table larger otherwise.
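
For completeness, the goto lookup over such a pair list is then just a
short linear scan (same GotoPair sketch as before):

/* Goto lookup over the simple {nonterminal, next state} pair list,
   terminated by nonterm == -1. */
int lookup_goto(const GotoPair *g, int nonterm)
{
    for (; g->nonterm != -1; g++)
        if (g->nonterm == nonterm)
            return g->state;
    return -1;   /* shouldn't happen with a correct table */
}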




Hope this helps and gives you some ideas,
Cleo

