Re: table compression

Heng Yuan <heng@Ag.arizona.edu>
8 Nov 2001 01:07:49 -0500

          From comp.compilers

Related articles
table compression rboland@unb.ca (Ralph Boland) (2001-11-04)
Re: table compression Olivier.Ridoux@irisa.fr (Olivier Ridoux) (2001-11-08)
Re: table compression hannah@schlund.de (2001-11-08)
Re: table compression heng@Ag.arizona.edu (Heng Yuan) (2001-11-08)
Re: table compression mickunas@cs.uiuc.edu (Dennis Mickunas) (2001-11-08)
Table compression leonardo@dcc.ufmg.br (Leonardo Teixeira Passos) (2005-09-27)
Re: Table compression anton@mips.complang.tuwien.ac.at (2005-09-30)
Re: Table compression hannah@schlund.de (2005-09-30)
Re: Table compression cleos@nb.sympatico.ca (Cleo Saulnier) (2005-09-30)
Re: Table compression Peter_Flass@Yahoo.com (Peter Flass) (2005-10-02)
[3 later articles]
| List of all articles for this month |

From: Heng Yuan <heng@Ag.arizona.edu>
Newsgroups: comp.compilers
Date: 8 Nov 2001 01:07:49 -0500
Organization: The University of Arizona
References: 01-11-006
Keywords: parse
Posted-Date: 08 Nov 2001 01:07:46 EST

I happened to look into the table compression when I was started writing
YooLex (a Flex-like tool, yoolex.sourceforge.net). Lex and parser tables
are about the same, so I will use Flex as the example.


Flex implements a compression algorithm that was described in
the following paper:
P.Dencker, et al,
"Optimization of parser tables for portable compilers,"
ACM Transaction on Programming Languages and Systems, 6(4):546-572,
October 1984.
http://citeseer.nj.nec.com/context/27540/0


There are three components to it.


1. Equivalent class. This is used to reduce the character set. This is
more useful compression Lex tables due to large character set size. EC is
also used to in error states transitions to find identical columns.
Dunno about the LR tables.


2. Block fitting. Basically, find the most common state in a row. If it
is repeated quite frequently, then the state is compressable. Then one
finds the block size and holes. For example, a row containing
5 5 5 4 4 5 3 4 5 5 5
would have the repeated state 5, block size of 5 (which is 4 4 5 3 4),
and a hole of size 1 at position 6. If you go through all the states,
then you could try to fit them together. Note, that is NP-complete.
So, one could use some simpler approach to it. Flex, if I interpreted
the code correctly, basically layout all the states containing holes
first tightly packed one next to the other, then fill the holes with
blocks that do not contain holes. This approach generally works well
since there will be much more fillers than the holes (YooLex uses a
variant trying to fit them better, but the speed is slower).


However, if you just implement the above, the resulting table is only 40
or 30% of the original Equivalent Class table. So there are additional
things to take care of.


3. If you look at the Equivalent Class table (for Flex, use -Ce -Cf to
generate the table), you will notice that there are lots of negative
values (the values themselves are not important; the negative sign is.
In YooLex, I just give them 0 instead of negative #). Usually, these
negative values tend to occur on one side of the table a lot and thus
create very big holes. This is not good since the block fitting approach
above does not like holes. Also, there tends to be a lot of error states
in the table. If one could somehow remove the error states (assuming that
they are just the same as the next most repeated value in the table), one
could suddently compress the table >>10 times (depending the # of eqvalent
classes and # of error states). The problem is how to compress the error
matrix. The paper listed above used the following stepwise approach:
a. negative the matrix (since error matrix is not sparse, then
the negative of it must be).
b. compress the similar rows and columns (i.e. if there are
several rows and columns having the same value, just store
one copy and use some other scheme to index them. Flex uses
yy_meta to index the columns, the row eqvalent indexing is not
necesssary if tricks are used).
c. instead of storing the error matrix table as binary numbers
(since the matrix is binary), we could use the same procedure
in step2 to compress it. (Using the binary number don't
necessarily buy us some space, and it has some other things
must be taken cared of).


YooLex currently has the first 2 steps implemented and I am still writing
the code for the third part. So until I get it fully implemented, my
intepretation could be false. Go read the paper first.


Good Luck.


Heng Yuan


On 4 Nov 2001, Ralph Boland wrote:


> I've been searching the literature for methods of compressing LR(1)
> (or LALR(1)) parse tables but I haven't found very much.


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.