From: haahr@netcom.com (Paul Haahr)
Newsgroups: comp.compilers
Date: 7 Feb 1997 23:31:08 -0500
Organization: NETCOM On-line services
References: <01bbfca0$a284a6f0$041b6682@tecel> 97-01-120 97-01-139 97-01-225 97-02-016
Keywords: architecture, Java
Norman Ramsey <nr@adder.cs.virginia.edu> wrote:
> Within ten days we've heard that Java bytecodes are so much like
> modern machines that it's easy to generate machine code on the fly,
> and so much like the source code that source-level analyses are easy.
My categorization is that Java bytecodes are close enough to real
hardware that it's easy to generate poor-quality machine code quickly,
and close enough to source that a high-quality bytecode-to-native
compiler has to do approximately as much work as a native compiler
for a source language like Java does after parsing.
The question I would raise is ``What information is lost, what is
preserved, and what is added when compiling from Java source text to
JVM classfiles?''
Off the top of my head, what is lost is:
- comments
- line numbers and local variable names (though compilation for
debugging leaves these in)
- structured control flow
What is added is:
- assignment of stack and local variable indices to local variables,
which may merge variables (similar to register assignment for
conventional compilers)
- depth of stack and number of local variable slots are made concrete
- operators (+) are made specific (iadd)
- overloaded function calls are resolved to specific signatures
- class names are resolved to fully qualified names
Almost everything else is preserved from the source. (Future
compilers might change the information more by, say, doing more
aggressive common subexpression elimination or loop unrolling than
found in Sun's javac.)
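A minimal sketch of the "operators are made specific" and "overloaded
function calls are resolved" items above. The class and method names are
hypothetical, and the instruction sequences in the comments are what a
javac-like compiler typically emits, not something guaranteed for every
compiler or version.

```java
public class LoweringSketch {
    // Three overloads. The compiler picks one per call site from the
    // argument's static (compile-time) type and records that choice as a
    // method descriptor in the class file.
    static String describe(int x)    { return "int"; }
    static String describe(long x)   { return "long"; }
    static String describe(Object x) { return "Object"; }

    static int addInts(int a, int b) {
        // Typical output: iload_0, iload_1, iadd, ireturn
        return a + b;
    }

    static double addDoubles(double a, double b) {
        // Typical output: dload_0, dload_2, dadd, dreturn
        // (each double occupies two local variable slots, hence index 2)
        return a + b;
    }

    public static void main(String[] args) {
        Object boxed = Integer.valueOf(1);
        System.out.println(describe(1));      // bound to describe(I)
        System.out.println(describe(1L));     // bound to describe(J)
        System.out.println(describe(boxed));  // bound to describe(Ljava/lang/Object;)
        System.out.println(addInts(2, 3));
        System.out.println(addDoubles(0.5, 0.25));
    }
}
```

The same source-level `+` becomes iadd or dadd depending on operand type,
and the call with `boxed` goes to the Object overload even though the
value is an Integer at run time, because the choice was fixed when the
class file was written.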
Significantly, type information is completely preserved. I'd cite
this as the fundamental reason why Java decompilers are quite as
plentiful as they are and C decompilers are relatively uncommon
beasts.
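One way to see how much type information survives compilation is to ask
a loaded class for its own declarations. The sketch below uses the
standard java.lang.reflect API; the Point class and the helper names are
hypothetical, chosen only for illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class TypeInfoSketch {
    // Hypothetical class used only for illustration.
    static class Point {
        int x;
        double[] weights;
        String label(long id, Object tag) { return id + ":" + tag; }
    }

    // Declared type of a field, recovered from the loaded class alone.
    static String fieldType(String name) {
        try {
            Field f = Point.class.getDeclaredField(name);
            return f.getType().getSimpleName();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("no such field: " + name, e);
        }
    }

    // Return and parameter types of the first declared method with this name,
    // or null if there is none.
    static String methodSignature(String name) {
        for (Method m : Point.class.getDeclaredMethods()) {
            if (m.getName().equals(name)) {
                StringBuilder sb = new StringBuilder(m.getReturnType().getSimpleName());
                sb.append(' ').append(name).append('(');
                Class<?>[] params = m.getParameterTypes();
                for (int i = 0; i < params.length; i++) {
                    if (i > 0) sb.append(", ");
                    sb.append(params[i].getSimpleName());
                }
                return sb.append(')').toString();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(fieldType("x"));
        System.out.println(fieldType("weights"));
        System.out.println(methodSignature("label"));
    }
}
```

Field types, return types, and parameter types all come back exactly as
declared. A decompiler reads the same information from the class file
directly instead of going through reflection.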
Note that the replacement of Java's high-level control structures (if,
while, try, etc) with branches and exception tables probably makes
good compilation of bytecode a little more difficult than compilation
of source. That is, many optimizations are easier when dealing with
structured control flow. For example, Brandis & Mössenböck's
method for generating the static single assignment form of structured
programs is much more pleasant than the classic Cytron, Ferrante, et
al. approach or Sreedhar & Gao's DJ-graph code.
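To make the loss of structured control flow concrete, here is a rough
sketch of one loop in two forms: as written in source, and flattened
into blocks joined by explicit jumps. The second form only imitates the
shape of branch-based code in Java itself; the `pc` variable and the
block numbering are assumptions made for illustration, not actual
bytecode.

```java
public class ControlFlowSketch {
    // Structured form, as written in source: the loop is a single construct.
    static int sumStructured(int n) {
        int s = 0;
        int i = 0;
        while (i < n) {
            s += i;
            i++;
        }
        return s;
    }

    // The same computation with the loop flattened into numbered blocks and
    // explicit jumps, roughly the shape a bytecode consumer has to work from.
    static int sumFlattened(int n) {
        int s = 0;
        int i = 0;
        int pc = 0;                      // hypothetical "program counter"
        while (true) {
            switch (pc) {
                case 0:                  // loop test: conditional branch
                    pc = (i < n) ? 1 : 2;
                    break;
                case 1:                  // loop body, then jump back to the test
                    s += i;
                    i++;
                    pc = 0;
                    break;
                default:                 // exit block
                    return s;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(sumStructured(5));
        System.out.println(sumFlattened(5));
    }
}
```

In the first form a compiler knows where the loop begins, ends, and
exits just from the syntax. In the second it has to rediscover the loop
from the jump targets before loop-oriented optimizations can apply.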
So, despite bytecodes being ``closer'' to real machine code, source
may be easier for a high-quality compiler to work from. On the other
hand, techniques for structuring goto-based code are at least two
decades old, and still found in the scientific programming world.
Since there's a presumption that JVM code comes from a structured
language, trying those techniques would probably be profitable.
> My cursory look at the JVM spec reminded me an awful lot of Smalltalk,
The bytecodes may be, but the amount of type information present in
the class file is distinctly un-Smalltalkish.
> so I'm not ready to swallow either claim, but I would love to be
> convinced.
Since this is mostly an issue of definition and perspective, trying to
convince is mostly an issue of rhetoric and not all that interesting
technically.
--