Re: Facts about the Java class file format

Tim Harris <tlh20@cam.ac.uk>
21 Oct 1998 01:38:33 -0400

          From comp.compilers


From: Tim Harris <tlh20@cam.ac.uk>
Newsgroups: comp.compilers
Date: 21 Oct 1998 01:38:33 -0400
Organization: University of Cambridge Computer Laboratory
References: 98-10-108
Keywords: Java, comment

Markus Pilz <pilz@ifi.unizh.ch> wrote:
> o The theoretical minimum average number of bits needed to encode the
> opcode is 4 bits instead of the 8 or 16 used today.


If my understanding of section 4.3 of your report is correct then this
figure has been calculated from the mean, across the 4016 classes
which you studied, of the mean number of bits of information in each
opcode. This is 3.46 bits, so if I recall it correctly, a corollary
of the source coding theorem says that a prefix code can be
constructed with binary code words of (at most) mean length 4.
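
To make the bound concrete, here is a minimal sketch that computes the
entropy of an opcode distribution and the mean code word length of a
Huffman code built for it. The opcode counts are invented for
illustration and are not taken from the report. The form of the bound
I am sure of is H <= L < H + 1 for an optimal symbol-by-symbol prefix
code, where H is the entropy and L the mean code word length; getting
closer to H than that generally requires coding blocks of symbols.

```python
import heapq
import math
from collections import Counter

def entropy_bits(counts):
    """Shannon entropy, in bits per symbol, of a table of counts."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def huffman_lengths(counts):
    """Return {symbol: code word length in bits} for a Huffman code."""
    if len(counts) == 1:
        return {sym: 1 for sym in counts}
    # Heap entries are (weight, tiebreak, {symbol: depth so far}).
    heap = [(c, i, {sym: 0}) for i, (sym, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def mean_code_length(counts, lengths):
    """Mean code word length in bits, weighted by symbol frequency."""
    total = sum(counts.values())
    return sum(counts[s] * lengths[s] for s in counts) / total

# Hypothetical opcode counts for a single class (not measured data).
opcode_counts = Counter({
    "aload_0": 40, "getfield": 25, "invokevirtual": 15,
    "iload": 10, "ireturn": 6, "goto": 4,
})

H = entropy_bits(opcode_counts)
lengths = huffman_lengths(opcode_counts)
L = mean_code_length(opcode_counts, lengths)
print(f"entropy = {H:.2f} bits, Huffman mean length = {L:.2f} bits")
```

The lengths returned here are for a code fitted to this one
distribution, which is the per-class tailoring discussed next.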


However, I am not clear about how useful this result is in practice
since (assuming I understand your report correctly) the mean length of
4 would be achieved by tailoring the encoding to each class file,
rather than being a fixed encoding which is used for all class files.
Given that many classes have been observed to be small and do not
contain many bytecode operations, this seems to raise the problem of
how to distribute or describe the encoding used in a particular case.
Defining the encoding explicitly before the bytecode data would
offset the benefits of the more compact representation that it
provides. Similarly, the start-up effects of a scheme like LZ or the
compressed parse trees used with slim binaries may limit how closely
they can approach the theoretical minimum. I would be interested
to see how close a practical scheme can come!
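
A rough sketch of that trade-off follows. It assumes, purely for
illustration, that the per-class code is shipped as a table costing 12
bits per distinct opcode (8 bits to name the opcode plus 4 bits for
its code word length); that figure and the function names are
hypothetical, not something taken from the report.

```python
import math

def encoded_bits(n_ops, mean_len, n_distinct, table_bits_per_entry=12):
    """Total bits: the code table plus the opcodes coded at mean_len each."""
    return n_distinct * table_bits_per_entry + n_ops * mean_len

def break_even_ops(mean_len, n_distinct, fixed_bits=8, table_bits_per_entry=12):
    """Smallest opcode count at which the tailored code beats fixed_bits.

    Returns None if the tailored code never wins.
    """
    saving_per_op = fixed_bits - mean_len
    if saving_per_op <= 0:
        return None
    table_bits = n_distinct * table_bits_per_entry
    return math.floor(table_bits / saving_per_op) + 1

# A class using 30 distinct opcodes, coded at a mean of 4 bits each.
print(break_even_ops(mean_len=4, n_distinct=30))
print(encoded_bits(50, 4, 30), "bits tailored vs", 50 * 8, "bits fixed")
```

Under these assumptions a class needs more than 90 opcodes before the
tailored code pays for its own table, so a class with only 50 opcodes
would come out larger than with plain 8-bit opcodes.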


A (clearly less thorough!) examination of about fifty class files
seemed to indicate that 6 bits could be required when the same
encoding was used for all of the files (presumably as a consequence of
the larger total number of distinct operations seen and the fact that
the frequently used bytecodes differ somewhat between classes).
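
The effect can be seen in a toy example. The sketch below uses two
invented class files of equal size whose frequent opcodes do not
overlap; the counts are hypothetical. Each file on its own has an
entropy of 1.5 bits per opcode, but the pooled distribution that a
single shared encoding has to serve comes to 2.5 bits.

```python
import math
from collections import Counter

def entropy_bits(counts):
    """Shannon entropy, in bits per symbol, of a table of counts."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Two hypothetical class files with different dominant opcodes.
file_a = Counter({"aload_0": 8, "getfield": 4, "ireturn": 4})
file_b = Counter({"iload": 8, "iadd": 4, "istore": 4})

per_file = [entropy_bits(file_a), entropy_bits(file_b)]
mean_per_file = sum(per_file) / len(per_file)

# Pooling the counts models one encoding shared by both files.
pooled = entropy_bits(file_a + file_b)

print(f"mean of per-file entropies: {mean_per_file:.2f} bits")  # 1.50
print(f"entropy of pooled counts:   {pooled:.2f} bits")         # 2.50
```

Real class files share many more opcodes than these two do, so the gap
would be smaller in practice, but the direction is the same: the pooled
entropy is never below the size-weighted mean of the per-file figures.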


Something else that came to mind while reading the report is whether
there is much to be gained by analyzing the values which occur as
operands to bytecode operations -- for example whether low values
usefully dominate among the operands of aload/iload/etc.
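
A minimal sketch of such an analysis is given below. It assumes the
bytecode has already been decoded into (mnemonic, operand) pairs, with
the one-byte forms such as iload_0 .. iload_3 expanded to their
explicit index; that input format and the function names are
hypothetical, and decoding a real class file is not shown.

```python
from collections import Counter

# Instructions whose operand is a local variable index.
LOCAL_VAR_OPS = {
    "iload", "lload", "fload", "dload", "aload",
    "istore", "lstore", "fstore", "dstore", "astore",
}

def operand_histogram(instructions):
    """Count how often each local variable index is used as an operand."""
    return Counter(
        operand for mnemonic, operand in instructions
        if mnemonic in LOCAL_VAR_OPS
    )

def fraction_below(hist, limit):
    """Fraction of counted operands whose value is less than limit."""
    total = sum(hist.values())
    if total == 0:
        return 0.0
    return sum(c for value, c in hist.items() if value < limit) / total

# Hypothetical decoded instruction stream (not measured data).
sample = [
    ("iload", 1), ("aload", 0), ("aload", 0),
    ("iload", 5), ("invokevirtual", 12), ("astore", 2),
]

hist = operand_histogram(sample)
print(hist)
print(fraction_below(hist, 4))
```

The limit of 4 matches the indices that the existing one-byte load and
store forms cover; other limits would show how quickly the
distribution tails off.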


tim
[It also occurs to me that small size is important when you're
transferring a Java app, but less important when you're running it.
Netscape ships their Java code in zip files, where it's typically
compressed by about 50%. How much better than that is anyone likely
to do and still have a format that's useful for execution? -John]

