Re: Survey of intermediate program representations?

Hans-Peter Diettrich <>
25 Sep 2005 12:50:07 -0400


From: Hans-Peter Diettrich <>
Newsgroups: comp.compilers
Date: 25 Sep 2005 12:50:07 -0400
Organization: Compilers Central
References: 05-09-014 05-09-062 05-09-076 05-09-099
Keywords: analysis
Posted-Date: 25 Sep 2005 12:50:07 EDT

Ganny wrote:

> Approaches in Language Translation
> Few languages provides options for both interpretation and
> compilation and can be used depending on the requirement. For
> example, for development and debugging, Microsoft Visual Basic uses
> an interpreter and for deployment, it employs a compiler.

IMO VB is a bad example of compilation. The compiled code for
traditional VB does little more than call the same functions the
interpreter calls. Up to VB3 the "compilation" only removed comments
and symbol tables, but left everything else in the internal token
format used by the interpreter. Later VB versions suffer from the
OLE/ActiveX, Variant and String types, which don't benefit from
compilation. A compilation with "optimizations" mostly skips the
safety checks (overflow, bounds) of the runtime system, and other
constructs can result in corrupt code. VB.NET suffers from similar
problems, because some legacy constructs (On Error...) are not
supported by the CLR, and the conforming part of the language can be
interpreted just as any other .NET language can.

> To illustrate, in traditional compilation approach, for N high-level
> languages and M target machines, N * M translators are required. With
> intermediate language approach, where the IL is capable of supporting N
> high-level languages and M target platforms, then the number of
> translators required effectively reduces to just N + M.

Most multi-/cross-platform compilers follow this approach. The
front-end converts the HLL into an internal intermediate
representation, which is compiled into target-specific code by
dedicated back-ends. The supported languages are restricted to the
language model implemented in the compiler and runtime system, so
that e.g. only x# languages compile to IL.

> Using another High Level Language!
> Instead of creating a new intermediate language, this approach is
> towards using a high-level language as a intermediate language that is
> already available, highly portable, efficient and sufficiently
> low-level. As you would have guessed, C is an interesting high-level
> language with ability to write low-level code. Owing to this property,
> it is sometimes referred to as a portable assembler or 'middle-level'
> language. The beauty of C is that, in spite of its low-level abilities,
> the code is highly portable....

Due to its low-level nature, C code is only as portable as the
compatible runtime libraries that exist for it. That doesn't seem to
be the case nowadays; at least the C gurus insist on using automake
and other tools to prepare and distribute "portable" C code. This way
portability is restricted to the platforms known to the developer, so
that he can implement and conditionally select all required
workarounds. Most such code simply fails to *compile* for Windows,
because that platform is widely disregarded by GNU people, to put it
mildly ;-)

I for my part don't understand this problem, because IMO it would be
no big deal to implement one common C library interface, with all the
workarounds residing in the target-specific libraries. But the C
language also lacks some important data types, like strings, so that
most C code includes a number of (inevitable?) unsafe and unportable
typecasts and pointer operations. So far I cannot see any reason for
using C for portable code, starting with the horrible declaration
syntax and not ending with the lack of really portable data types.

> Stack-Based Machines
> This representation assumes the presence of a run-time stack and
> generates code for that. It uses the stack for evaluation of
> expressions by making use of stack itself to store the intermediate
> values. Thus the underlying concept is very simple and that is its main
> strength. Also, least assumptions are made about the hardware and
> support available in the target architecture. When a virtual machine
> (runtime interpreter) simulates a stack-based machine and provides
> necessary resources, this approach can effectively provide
> platform-independence.

IMO stack-based code converts easily into register-based code, so
that optimizations for any target platform are possible on the fly or
during (JIT) recompilation. Register-based code, in contrast, doesn't
scale very well to differently equipped processors.

> Implementations Based on Stack Approach
> .NET Architecture
> .NET architecture addresses an important need - language
> interoperability, the concept that can change the way programming is
> done and is significantly different from what Java offers. If Java came
> as a revolution providing platform independence, .NET has language
> interoperability to offer.

As mentioned above, .NET languages can have different syntax, but are
restricted in their semantics.

> Whereas with .NET, we can write a code in
> COBOL, extend it in VB and deploy it as a component.

In practice this means either "sharpening" the source code or
wrapping the language- and target-specific (unmanaged) code.
Converting legacy code into the corresponding x# language essentially
means converting it into C#, adapting it to the common type and
runtime (GC, object...) model, and then seeing what remains reusable
from the existing code. Where people formerly could "think C, write
Pascal", they now have to stop thinking in their well-known language,
because the CLR imposes a strict data and control model, as do other
virtual machines.

> In .NET, the unit of deployment is the PE (Portable Executable) file -
> a predefined binary standard (similar to class files of Java). It is
> made up of collection of modules, exported types and resources and is
> put together as an .exe file. It is very useful in versioning,
> deploying, sharing, etc. The modules in PE file are known as
> assemblies.

It should be mentioned that .NET assemblies use almost nothing of the
original PE format; it would have been much better to give them their
own unambiguous file format. This abuse of the native Microsoft PE
format disallows the use of e.g. open-source VMs, like DotGNU,
because the Windows PE file loader links immediately to the
OS-supplied VM and libraries. This misbehaviour is acceptable with
regard to the VM, which is sufficiently well specified, but
unacceptable as long as the standard libraries are not specified
exactly enough by ECMA or some other (open) consortium. Just as the
interpretation of HTML code depends heavily on the browser, a .NET
program currently will behave differently with every other library
implementation.

> It should also be noted that each language is versatile and have unique
> features and peculiarities of their own. ...

This simply is not true :-(

Really language-specific features are not supported by the CLR, apart
from the built-in C# features, and the workarounds for implementing
other features are a mess (see VB.NET).

> Summary
> The benefits of using intermediate languages are well known, and the
> current trend is towards achieving fully portable code and language
> interoperability. To understand the concept of portable code
> compilation, the understanding of the conventional translation
> technology and their advantages/disadvantages needs to be known. There
> are many issues involved in intermediate code design and understanding
> them shall enable us to appreciate the various efforts to generate
> portable code.

Intermediate languages are either stuck with simple (almost purely
arithmetic) data types and operations, or they have to define a
specific object model. Every known object model has certain
advantages and disadvantages, and it fits only languages with no (or
no different) object model. The garbage-collected memory management
alone, which is used in newer IL models for good reasons, is
incompatible with most traditional languages. Converting existing
code to e.g. .NET or Java requires comparable effort, since all
applicable compilers are free only in their choice of language
syntax, but absolutely restricted in the language semantics.

In anticipation of a comment from our honorable mod: .NET is JAU
(Just Another UNCOL), with a (consequently) questionable future. It's
a nice try, but IMO it needs some more time to ripen into something
worth mentioning. As can be seen from this article, the problems of
intermediate languages are not normally discussed in public; instead
they are covered up by hype and false promises.

[IBM had a successful pair of PL/I compilers in the late 1960s, the
interpreted checkout compiler and the native code F compiler. They
seemed to be reasonably successful. Re a common C library, that's
what POSIX is. What's missing? Windowing stuff? Re JAU, common
intermediate codes can work OK in limited domains, e.g. Fortran-like
languages running on Windows, I can believe it'd work. -John]
