Re: Bytecode an intermediate language?

rkrayhawk@aol.com (RKRayhawk)
27 Sep 2003 14:31:23 -0400

          From comp.compilers

From: rkrayhawk@aol.com (RKRayhawk)
Newsgroups: comp.compilers
Date: 27 Sep 2003 14:31:23 -0400
Organization: AOL http://www.aol.com
References: 03-09-077
Keywords: interpreter, practice
Posted-Date: 27 Sep 2003 14:31:23 EDT

rajnichugh@yahoo.com (Rajni) asks:

<<
Actually I am having a different doubt. From what I can understand
from previous discussions on intermediate representations is that the
representation to be chosen depends on the optimization techniques to
be employed in compiler. But I can't get any concrete answer with
respect to different representations. Does the choice of intermediate
representation also depend on source language and portability. Does
bytecode also serves the same purpose as intermediate language. Please
reply.


>>


Byte code, as popularized by Java, had two characteristics that other
projects may also require.


First, the designers were originally targeting very small devices, so
encoding each instruction compactly, with its opcode in a single byte,
was a literal design criterion.
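To make the size point concrete, here is a minimal sketch of a byte-coded instruction stream for a hypothetical stack machine. The opcodes PUSH, ADD, MUL and HALT and the one-byte operand are assumptions for illustration, not the JVM's actual instruction set, although real JVM opcodes do also fit in a single byte. Each instruction costs one byte, plus one more for PUSH's operand.

```python
# Hypothetical one-byte opcodes for a toy stack machine (not the JVM's).
PUSH, ADD, MUL, HALT = range(4)
NAMES = {PUSH: "PUSH", ADD: "ADD", MUL: "MUL", HALT: "HALT"}
HAS_OPERAND = {PUSH}  # PUSH carries a one-byte operand (0..255)


def assemble(program):
    """Encode (opcode, *operands) tuples as a compact byte string."""
    out = bytearray()
    for op, *args in program:
        out.append(op)              # the opcode itself fits in one byte
        if op in HAS_OPERAND:
            out.append(args[0])     # bytearray.append rejects values outside 0..255
    return bytes(out)


def disassemble(code):
    """Decode the byte string back into readable mnemonics."""
    listing, i = [], 0
    while i < len(code):
        op = code[i]
        if op in HAS_OPERAND:
            listing.append(f"{NAMES[op]} {code[i + 1]}")
            i += 2
        else:
            listing.append(NAMES[op])
            i += 1
    return listing


# (2 + 3) * 4
program = [(PUSH, 2), (PUSH, 3), (ADD,), (PUSH, 4), (MUL,), (HALT,)]
code = assemble(program)
print(len(code), disassemble(code))
```

The six-instruction program encodes to 9 bytes in total.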


Second, they expected the target devices to vary considerably at the
hardware-architecture level, so portability was a high priority.


That latter aspect coincided with a completely different marketplace
requirement: making certain portions of web pages conform to the
developers' goal of "write once, run anywhere." Portability, together
with the way web pages are distributed, made Java explode in
popularity like nothing before it.


So this points toward an answer to your general question: your
objectives determine the best intermediate representation. I think
that means your objectives, not the definition of your source language.


You have two fundamental structures as you parse a valid program in
your language. The first represents the relationships you detected
among the pieces of the input (typically a parse tree or abstract
syntax tree).
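A minimal sketch of that first structure, assuming a toy expression language with names, `+`, `*` and parentheses: a recursive-descent parser (one common technique, not the only one) that records the detected relationships as a tree. The `Name`/`BinOp` classes and the grammar are assumptions for illustration.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass
class Name:
    id: str


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Name, BinOp]


def tokenize(text):
    # Names and the four punctuation tokens; spaces are simply skipped.
    return re.findall(r"[A-Za-z_]\w*|[+*()]", text)


def parse(text):
    tokens, pos = tokenize(text), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():  # expr := term ('+' term)*
        node = term()
        while peek() == "+":
            take()
            node = BinOp("+", node, term())
        return node

    def term():  # term := atom ('*' atom)*, so '*' binds tighter than '+'
        node = atom()
        while peek() == "*":
            take()
            node = BinOp("*", node, atom())
        return node

    def atom():  # atom := NAME | '(' expr ')'
        tok = take()
        if tok == "(":
            node = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or tok in "+*)":
            raise SyntaxError(f"unexpected token {tok!r}")
        return Name(tok)

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("a + b * c"))
```

This prints `BinOp(op='+', left=Name(id='a'), right=BinOp(op='*', left=Name(id='b'), right=Name(id='c')))`: the tree records that `b * c` is an operand of the addition, which is exactly the "relationship amongst the pieces" of the input.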


The second structure can be built up incrementally as you stream out
your intermediate result, or you can accumulate the whole thing in
memory before streaming out any intermediates. The second structure
corresponds to a representation of what should occur at execution
time.
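The second structure can be sketched by flattening such a tree into a stream of three-address instructions, in the order they must run. This assumes the tree is nested `(op, left, right)` tuples with variable names as leaves; the `t1`, `t2`, ... temporary names are a hypothetical convention.

```python
import itertools


def lower(tree):
    """Flatten an expression tree into three-address instructions.

    Leaves are variable names (str); interior nodes are (op, left, right)
    tuples. Each instruction is (dest, op, arg1, arg2).
    """
    code = []
    counter = itertools.count(1)

    def walk(node):
        if isinstance(node, str):
            return node                # a variable is already a value
        op, left, right = node
        a = walk(left)                 # operands are computed first...
        b = walk(right)
        dest = f"t{next(counter)}"
        code.append((dest, op, a, b))  # ...then the operation that uses them
        return dest

    result = walk(tree)
    return code, result


code, result = lower(("+", "a", ("*", "b", "c")))
for dest, op, a, b in code:
    print(f"{dest} = {a} {op} {b}")
```

For `a + b * c` this prints `t1 = b * c` and then `t2 = a + t1`. The multiplication comes first because it must happen first at execution time, even though it sits lower in the tree.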


So your intermediate is not necessarily the executable, but its
structure corresponds to the relationships among the things that will
happen at execution time.


Your target is a machine that has a finite set of states, and each
operation can change its state. At some level of detail or
generalization, your intermediate represents each action that will
step the machine through its states, along with the relationships
among those actions.
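One way to picture this is to model the machine's state as a program counter plus a set of variable bindings, and each instruction as a function from one state to the next. The instruction set below (`const`, `add`, `addi`, `jz`, `jmp`, `halt`) is hypothetical, and since Python integers are unbounded this toy is not strictly finite-state; it is only meant to show instructions stepping a machine through its states.

```python
def step(program, state):
    """Apply one instruction, mapping the state (pc, env) to the next state."""
    pc, env = state
    instr = program[pc]
    kind = instr[0]
    env = dict(env)  # copy, so every state in the trace stays a distinct value
    if kind == "const":            # dst := value
        _, dst, value = instr
        env[dst] = value
    elif kind == "add":            # dst := a + b
        _, dst, a, b = instr
        env[dst] = env[a] + env[b]
    elif kind == "addi":           # dst := src + immediate
        _, dst, src, imm = instr
        env[dst] = env[src] + imm
    elif kind == "jz":             # jump to target if var == 0
        _, var, target = instr
        if env[var] == 0:
            return target, env
    elif kind == "jmp":            # unconditional jump
        return instr[1], env
    elif kind == "halt":           # no next pc: the machine has stopped
        return None, env
    else:
        raise ValueError(f"unknown instruction {instr!r}")
    return pc + 1, env


def run(program, env=None):
    """Step from pc 0 until halt, recording every state visited."""
    state = (0, dict(env or {}))
    trace = [state]
    while state[0] is not None:
        state = step(program, state)
        trace.append(state)
    return state[1], trace


# sum := i + (i - 1) + ... + 1, counting i down to zero
SUM_TO_N = [
    ("const", "sum", 0),         # 0
    ("jz", "i", 5),              # 1: leave the loop when i == 0
    ("add", "sum", "sum", "i"),  # 2
    ("addi", "i", "i", -1),      # 3
    ("jmp", 1),                  # 4: back to the loop test
    ("halt",),                   # 5
]

env, trace = run(SUM_TO_N, {"i": 4})
print(env, len(trace))
```

Starting from `i = 4`, the machine ends with `sum` at 10, and the trace records all 20 states it passed through, including the starting state.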


The classic things are:
- do something to the state of the machine
- conditionally do something
- loop
- conditionally loop
- end a loop
- end the whole process


Those are, roughly, the kinds of things to do. The relationships are
really just pointers. To change data, the representation of the change
action points to the data. To control a loop conditionally, the loop's
representation points to the control item.
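Here is a minimal sketch of that idea, using Python object references as the pointers: each statement node points at the `Var` objects it reads or writes, and the `While` and `If` nodes point at the variable that controls them. The node classes are assumptions for illustration; ending a loop is implicit at the end of its body, and the whole process ends when the top-level statement list is exhausted.

```python
import operator
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass(eq=False)
class Var:
    """A data item; IR nodes hold references (pointers) to these objects."""
    name: str
    value: int = 0


@dataclass
class Assign:              # do something to the state of the machine
    target: Var            # pointer to the data being changed
    op: Callable[..., int]
    operands: List[Var]    # pointers to the data being read


@dataclass
class If:                  # conditionally do something
    cond: Var              # pointer to the control item
    then: List["Stmt"]


@dataclass
class While:               # conditionally loop; the loop ends with its body
    cond: Var
    body: List["Stmt"]


Stmt = Union[Assign, If, While]


def execute(stmts: List[Stmt]) -> None:
    """Walk the IR; the process ends when the statement list runs out."""
    for s in stmts:
        if isinstance(s, Assign):
            s.target.value = s.op(*(v.value for v in s.operands))
        elif isinstance(s, If):
            if s.cond.value:
                execute(s.then)
        elif isinstance(s, While):
            while s.cond.value:
                execute(s.body)


# acc := n!, looping while n is nonzero
n, acc, one = Var("n", 5), Var("acc", 1), Var("one", 1)
program = [
    While(n, [
        Assign(acc, operator.mul, [acc, n]),
        Assign(n, operator.sub, [n, one]),
    ]),
]
execute(program)
print(acc.value)
```

This prints 120. Because `Var` uses identity equality (`eq=False`), two nodes that mention the same variable genuinely share one data item, just as two pointers to the same location would.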


So you have basic actions that change the state of the machine, and
the data they use can simply be referenced by pointers in the
intermediate representation. The only shift you need is to realize
that the intermediate represents things at execution time, whereas the
preceding structures in your system represented the detected structure
of the input code.


Your hope and theory is that there is a semantic correspondence
among three things: the structure of the input code so detected, the
structure of the final result that the intermediate code will enable
at execution time, and lastly the algorithm the coder actually had in
mind when they thought it out in your 'language'.


Let's clarify a few other issues. Byte code is not portable because of
any supposed portability of the Java source language. I am not
quibbling over words here, just clarifying: Java source code may well
be portable now, but byte code itself was an achievement of very wide
portability in its own right. Note also that a compiler can take code
through many intermediate representations; some compilers have
intermediates _between_ Java source and JVM byte code. When folks
encourage you to plan an intermediate representation, they are not
necessarily saying that it will be written to a file.


Byte code functions almost as machine code for the JVM, but it can
also be slammed through a JIT (just-in-time) compiler to generate yet
another form that is platform-specific and, supposedly, efficient.
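To illustrate translating byte code into another form once and then running that form directly, here is a sketch that turns stack byte code for a hypothetical instruction set into a Python function by simulating the operand stack symbolically. This is compile-to-source, not a real JIT: a JIT would emit native machine code, and it would also have to handle branches, which this straight-line sketch does not.

```python
# Hypothetical one-byte opcodes (not the JVM's); LOAD and PUSH take one operand byte.
LOAD, PUSH, ADD, MUL, RET = range(5)


def translate(code, nargs):
    """Translate straight-line stack byte code into a Python function.

    The operand stack is simulated with expression strings instead of
    values, so the result is a single expression evaluated at call time.
    """
    stack, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if op == LOAD:                    # push argument number code[pc + 1]
            stack.append(f"a{code[pc + 1]}")
            pc += 2
        elif op == PUSH:                  # push a small constant
            stack.append(str(code[pc + 1]))
            pc += 2
        elif op in (ADD, MUL):
            right, left = stack.pop(), stack.pop()
            sym = "+" if op == ADD else "*"
            stack.append(f"({left} {sym} {right})")
            pc += 1
        elif op == RET:
            body = stack.pop()
            break
        else:
            raise ValueError(f"bad opcode {op} at {pc}")
    else:
        raise ValueError("byte code has no RET")
    params = ", ".join(f"a{i}" for i in range(nargs))
    source = f"def compiled({params}):\n    return {body}\n"
    namespace = {}
    exec(compile(source, "<translated>", "exec"), namespace)
    return namespace["compiled"], source


# (a0 + a1) * 3
code = bytes([LOAD, 0, LOAD, 1, ADD, PUSH, 3, MUL, RET])
fn, source = translate(code, nargs=2)
print(source)
print(fn(2, 5))
```

The generated body is `return ((a0 + a1) * 3)`, and `fn(2, 5)` returns 21 without dispatching on each opcode again at call time; the per-opcode work was paid once, during translation.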


Focus on the structure of activity that will occur at execution time,
and try to imagine how you will represent it in some intermediate
form. It is a step-by-step representation. The optimizations that
others have mentioned in general terms can detect redundancy and
inefficiency in this long, basic intermediate representation. If you
know what those optimizations are, you might shape the intermediate
representation so that the optimizer can more easily find
opportunities for improvement before your final representation is
streamed out.
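As one example of finding redundancy in that step-by-step form, here is a sketch of local common-subexpression elimination over a single basic block of three-address instructions. It assumes no name is assigned more than once in the block (so an earlier result stays valid) and that operands are variable names; the `(dest, op, a, b)` tuple layout is a hypothetical convention.

```python
COMMUTATIVE = {"+", "*"}


def eliminate_common_subexpressions(block):
    """Local CSE over one basic block of (dest, op, a, b) instructions.

    Assumes no name is assigned twice within the block, so a value
    computed earlier is still valid when a later instruction repeats it.
    Returns the reduced block and a map from dropped names to survivors.
    """
    available = {}  # (op, a, b) -> name that already holds that value
    rename = {}     # dropped dest -> name that replaces it
    out = []
    for dest, op, a, b in block:
        a, b = rename.get(a, a), rename.get(b, b)  # use surviving names
        key = (op, a, b)
        if op in COMMUTATIVE and b < a:
            key = (op, b, a)  # canonical order, so y*x matches x*y
        if key in available:
            rename[dest] = available[key]  # redundant: reuse earlier result
        else:
            available[key] = dest
            out.append((dest, op, a, b))
    return out, rename


block = [
    ("t1", "*", "x", "y"),
    ("t2", "*", "y", "x"),
    ("t3", "+", "t1", "t2"),
]
out, rename = eliminate_common_subexpressions(block)
for dest, op, a, b in out:
    print(f"{dest} = {a} {op} {b}")
```

The pass keeps `t1`, drops `t2` as a duplicate (since `*` is commutative), and rewrites `t3` to `t1 + t1`. This is the kind of opportunity that is easier to spot when the intermediate keeps each operation as a separate, explicitly named step.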


Best Wishes,
Bob Rayhawk

