Re: Java compiler optimizations

David Chase <chase@naturalbridge.com>
19 Jan 2001 23:20:39 -0500

From comp.compilers

Related articles
Java compiler optimizations? suganya@cdacb.ernet.in (suganya) (2001-01-04)
Re: Java compiler optimizations? iank@idiom.com (2001-01-05)
Re: Java compiler optimizations? eliotm@pacbell.net (Eliot Miranda) (2001-01-09)
Re: Java compiler optimizations? iank@idiom.com (2001-01-11)
Re: Java compiler optimizations? eliotm@pacbell.net (Eliot Miranda) (2001-01-18)
*Re: Java compiler optimizations chase@naturalbridge.com (David Chase)* (2001-01-19)**
Re: Java compiler optimizations dimock@deas.harvard.edu (Allyn Dimock) (2001-01-26)

| List of all articles for this month |

From:	David Chase <chase@naturalbridge.com>
Newsgroups:	comp.compilers
Date:	19 Jan 2001 23:20:39 -0500
Organization:	Compilers Central
References:	01-01-012 01-01-018 01-01-050 01-01-066 01-01-087
Keywords:	Java, optimize
Posted-Date:	19 Jan 2001 23:20:39 EST

[ I hope this doesn't come off sounding too much like advertising,
      but the system I work on provides a useful counterexample to many
      sweeping claims about static and dynamic compilation, so it gets
      mentioned several times below, usually in a favorable way. ]

Eliot Miranda wrote:
  > ... If the function is applied to
  > the variable sufficiently often then one can improve performance by
  > compiling a special version of the code sequence which assumes the
  > variable is in fact a constant.

...

  > Since static optimizations can still be applied in a JIT context its
  > reasonable to assume that JIT optimization is more widely applicable.
  > That this is not the case currently merely reflects the immaturity of
  > the field, its lack of familiarity (not much taught, etc) and lack of
  > good architectural support (cheap user-level icache flushing, etc).

I have, perhaps, not been paying sufficiently close attention, but in
I have not seen measurements demonstrating that good dynamic
compilation is a clear win versus good static compilation. Dynamic
compilation is clearly better than interpretation, and it is
apparently better for languages like Smalltalk (where, for instance,
"+" can be sent to numbers of all sorts, whether small integer,
floating point, rational, or bignum) for which there are no static
compilers available for comparison. But, I have not seen anyone
kicking the tar out of static C++ compilers with a dynamic C++
compiler (and before someone claims that Java is not at all like C++,
please read to the bottom).

In theory, yes, there are interesting optimizations that can be
performed dynamically, but:

- cheap user-level icache flushing is often not available
      In particular, it is NOT an architectural feature that it
      be cheap, but an implementation feature; I believe that
      there are some instances of the Sparc chip that implement
      the cache flush instruction with a trap to the OS.

- at the same time, many modern chips include features designed
      to reduce the penalty of indirect branching (one thing that
      on-the-fly-compilation is intended to avoid) which reduces
      the net benefit of one of the more-frequently implemented
      JIT optimizations.

- static compilers seem to be able to perform a lot more inlining
      than dynamic compilation proponents give them credit for. In
      particular, keep in mind that type propagation can be used
      to determine bounds on the type of variables, and that those
      type bounds can in turn be used to devirtualize methods,
      which allows inlining, which allows more type propagation,
      etc.

- the very sort of things that JITs are supposed to optimize
      well (very common cases) are also treated well by modern
      hardware. A conditional branch that always falls one way
      or the other is not that expensive.

- JITs have to pay for measurement, compilation, and (in the
      case of conditioned optimization) condition testing. For
      example, a single indirect call might be replaced by a
      conditional test and a direct call. The two simpler
      branching operations might not be any cheaper than a
      single indirect branch; it depends very much on how the
      chip is put together.

- static compilers can, in a bit of a cheat, reap many of the
      benefits of dynamic compilation simply by running a measurement
      pass before compilation, identifying most of the special cases
      that JITs identify on the fly.

One thing I have not seen mentioned much is one of the
non-theoretical, practical downsides of JIT compilation; Just-In-Time
compiler bugs. Think, for a moment, about all the exciting
optimization and reoptimization that goes on -- how do you know that
it is all done correctly? JIT compiler writers are no better and no
worse than static compiler writers, and static compilers often (to be
charitable) contain bugs. I have found such bugs in every single
release of the Microsoft C++ compilers that I have used, known of bugs
in every single release of Sun's compilers that I have used or worked
on, seen long-latent bugs reveal themselves in the static compiler
that I am working on now, seen compiler bugs in every single release
of gcc that I have worked with, and seen compiler/VM bugs in every
single Java Virtual Machine that I have used, whether from Sun or from
any of our competitors.

In short, anyone who says that their compiler lacks bugs is making an
extraordinary claim. For static compilers, however, it is possible to
assert that the bugs sit still. They are there, or they are not.
They may require interesting inputs to tweak them, but they are at
least there. Simple code coverage is enough to ensure that the
always-exploding bugs are not present.

In contrast, the more a JIT compiler does the things that are said to
make dynamic compilation wonderful, the harder it is to be sure that
all possible bugs have been tested for and exhibited. Simple code
coverage analysis is hardly sufficient -- the old Symantec JIT
required at least 31 executions before it would compile code, and at
least one version of HotSpot would wait at least 10,000 executions, at
least on some machines. I know these numbers because I tested
measured them with a piece of code that contained a non-fatal bug (the
compiler would use the Intel FSIN instruction, contrary to the Java
specification -- this has since been fixed).

  > Its not clear that the original poster had a focus on static
  > optimization. Java, unlike C++, is a dynamic language.

Java is only a slightly dynamic language. Unlike Smalltalk and Lisp,
I can tell you exactly what values an "int" is permitted to have, and
how much storage it will need. With the exception of dynamic loading
(and there's a lot less truly dynamic loading than most JIT
aficionados claim), Java is only about as dynamic as Modula-3 or C++.

  > Agreed. Recent PLDIs have good papers on Java exceptions,
  > synchronized methods and thread-safety.

Java exceptions are easy. Java synchronization turns out to be
somewhat more interesting, but the best work I know of (ours, of
course :-) is unpublished, and will remain that way for some time.

  > The dynamic facilities of Java (class loading) and its reflective
  > facilities put it in a different class along dynamic languages like
  > Smalltalk.

NO. Unless Smalltalk is a lot more static than I think it is, Java's
dynamism is very much unlike Smalltalk's. Java's reflection is
completely possible in a statically compiled language -- we have
supported it in a statically-compiled Java system for years. It does
limit some of the optimizations that you could do in C++, but it
limits both static and dynamic compilers in the same way. (You'd like
to think that "private" meant something, but with the possibility of
reflective access, it really does not. Surprisingly many research
papers have been written about optimizations that are not the least
bit correct in the presence of reflective access to fields and
methods.)

Truly dynamic class loading in Java is much less necessary (that is to
say, a static compiler can fake most instances of "dynamic loading"
behind the curtain while providing identical apparent semantics) than
people seem to think it is. We are adding dynamic loading to our
system, but we have been able to run very many Java programs (and most
of the benchmarks, whoopee) for years without trouble. That says, to
me, that while people could write incredibly dynamic programs in Java,
most of them do not. I can think of one very good reason why not,
which is that the debugging facilities for Java are simply not up to
the task of dealing with truly dynamic code. (Anything that sits on
the disk before execution, where a debugger or static compiler can
look at it, is not truly dynamic.) I suppose this is a market
opportunity for someone :-).

  > There JITs provide a sound basis for optimization since
  > JITs can handle changes in the code while a program is running.

For Java, at least, the changes are only additive, except when
unreferenced classes are garbage-collected. This is an
extraordinarily important difference; incremental algorithms are
(generally) much easier if they do not have to deal with arbitrary
deletion and modification. GC-deletion is a special case, since the
purpose of the GC is to prove that deletion doesn't matter. (The GC
may well be somewhat in bed with the compiler, too, to ensure that
"doesn't matter" really means doesn't matter.)

Again, very important -- the (observable) changes are only additive.
Existing, in-use classes are never modified.

  > This is something C++ doesn't have to address.

I don't know that this is true either. People have been playing with
C++ and dynamically loaded compiled libraries for some time now. I
don't see how this is substantially different, at a semantic level,
from the sort of dynamic loading that Java does. Java has reflective
access to the code that gets loaded, but that's just a simple matter
of data structures.

So, to hit the high points, in the hierarchy of dynamic languages, I
think Java ranks pretty low.

- Java reflection is irrelevant to dynamism. It is not
      possible to manipulate (for example) blocks and closures
      as reflective objects.

- Referenced classes never change; new classes appear.

- Static type checking, and tightly-constrained primitive types,
      remove much of the input- and context-dependence of programs
      written in languages like Smalltalk.

- Control flow is very boring; it's almost all branches,
      integer-indexed indirect branches, and exception handling.

David Chase
chase@naturalbridge.com

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Java compiler optimizations

David Chase <chase@naturalbridge.com>19 Jan 2001 23:20:39 -0500

David Chase <chase@naturalbridge.com>
19 Jan 2001 23:20:39 -0500