Re: language independent intermediate representation

David Chase <chase@world.std.com>
12 May 1997 00:18:07 -0400

          From comp.compilers


From: David Chase <chase@world.std.com>
Newsgroups: comp.compilers
Date: 12 May 1997 00:18:07 -0400
Organization: Compilers Central
Keywords: analysis, optimize

Amir Michail wrote:


> I am working on a project where I need to perform various analyses on
> a language independent intermediate format. I was looking into the
> gnu RTL and parse tree structures and I am not sure what to use. I
> will probably need to convert the structure into a program dependence
> graph or something similar. I will also need to convert the result
> back into source code (in a fixed language independent of the original
> source language).


You might look at ILOC, or whatever it has mutated into, which was/is
used at Rice University (I think it has been in use for about ten years
now). It is a low-level, RISC-like intermediate code. I've worked on a
couple of compilers now, and that is generally the way to go, EXCEPT:


1. you'll need a primitive of some sort for your constant-case-switch
      statements (common to many languages, inscrutable if translated).


2. "structures" are going to give you grief. When going in and out of
      a high-level language, where the target language is C, structures
      can be a pain. If possible, forget they ever existed, and simply
      do pointers and offsets.


3. use an infinite register set. Again, structures are a pain: do
      they have "value" status, meaning that they are loaded into and
      stored from "wide" registers? There are two reasons to preserve
      structures: one is that they can simplify your aliasing analysis a
      little (MAYBE), and the other is that it is nice to use block copies
      to move them around in the generated code. If you can write a
      general-purpose recognizer for the structure movement idiom in your
      code generator, you'd be better off doing that than preserving
      structures.
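
To make the three caveats above concrete, here is a minimal sketch in C of
what such a low-level instruction record might look like. Every name in it
(VReg, Instr, OP_SWITCH, field_load, switch_target) is made up for
illustration and is not ILOC's actual instruction set. It assumes virtual
registers are plain integers handed out by a counter, a structure field is
reached as a base register plus a byte offset, and a constant-case switch
stays one primitive carrying a table of (value, target) pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Virtual register number: the "infinite" register set is just a counter. */
typedef uint32_t VReg;

/* Hypothetical opcodes, chosen only for this illustration. */
typedef enum { OP_LOAD, OP_STORE, OP_SWITCH } Opcode;

/* One arm of a constant-case switch: case value and target block id. */
typedef struct { int64_t value; int target; } SwitchCase;

typedef struct {
    Opcode op;
    VReg dst;                 /* LOAD: register receiving the value        */
    VReg src;                 /* STORE: value; SWITCH: selector register   */
    VReg base;                /* LOAD/STORE: register holding the address  */
    size_t offset;            /* LOAD/STORE: byte offset from base         */
    const SwitchCase *cases;  /* SWITCH only: case table                   */
    size_t ncases;            /* SWITCH only: number of table entries      */
    int default_target;       /* SWITCH only: block taken when none match  */
} Instr;

/* Hand out the next unused virtual register. */
static VReg fresh_vreg(VReg *counter) { return (*counter)++; }

/* A structure field read becomes a load from base + offset. */
static Instr field_load(VReg dst, VReg base, size_t offset)
{
    Instr in = { OP_LOAD, dst, 0, base, offset, NULL, 0, -1 };
    return in;
}

/* Resolve a switch primitive for a known selector value. */
static int switch_target(const Instr *in, int64_t selector)
{
    assert(in->op == OP_SWITCH);
    for (size_t i = 0; i < in->ncases; i++)
        if (in->cases[i].value == selector)
            return in->cases[i].target;
    return in->default_target;
}
```

Under these assumptions, reading p->y for a C struct point would be emitted
as field_load(t, p, offsetof(struct point, y)), with no structure type
surviving into the intermediate code.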


A second choice is a sort of cleaned up abstract syntax tree. This
makes more sense if you wish to preserve more of the structure of the
input program; it's been used in the Rice vectorizer (and whatever it
mutated into) as well as some subsequent compilers written by ex-Rice
people (the Dana/Ardent/Stardent Fortran compiler, for one). There
are some troubles using ASTs with C, because the language is not
quite as block-structured as it appears (e.g., Duff's device).
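
For readers who have not seen it, Duff's device is roughly the following,
shown here in a simplified form that copies between two buffers (the
function name duff_copy is just a label for this sketch). The case labels
sit inside the body of the do-while loop, so the switch jumps into the
middle of a loop, which is what a strictly block-structured tree has
trouble representing.

```c
#include <assert.h>
#include <stddef.h>

/* Copy count bytes, eight per pass; count must be greater than zero. */
static void duff_copy(char *to, const char *from, size_t count)
{
    size_t n;

    assert(count > 0);
    n = (count + 7) / 8;            /* number of passes through the loop */

    switch (count % 8) {            /* enter the loop body part-way in   */
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

The fall-through between cases is deliberate. An AST that models switch as
a list of case arms, each owning its own statement block, has no natural
place for a do-while that begins in one arm and ends in another.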


I don't know if ANDF is a good intermediate format; the acronym stands
for Architecture Neutral DISTRIBUTION Format. The Java byte codes are
also a distribution format, and are not suited to analysis in that
form (they can be translated, of course).
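
As a rough illustration of the kind of translation meant here, the sketch
below turns straight-line code for a tiny stack machine into
three-address form over virtual registers by simulating the operand stack
at translation time. The opcodes (S_LOAD, S_ADD, S_MUL, S_STORE) are
invented for the example and are not real Java byte codes. It also assumes
no branches, so stack depth is known at every instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stack-machine opcodes (not the JVM's). */
typedef enum { S_LOAD, S_ADD, S_MUL, S_STORE } StackOp;
typedef struct { StackOp op; int local; } StackInstr;

/* Three-address form over virtual registers.
 * T_FROM_LOCAL: dst = register, a = local variable index.
 * T_TO_LOCAL:   dst = local variable index, a = register.
 * T_ADD/T_MUL:  dst, a, b are all registers. */
typedef enum { T_FROM_LOCAL, T_ADD, T_MUL, T_TO_LOCAL } TacOp;
typedef struct { TacOp op; int dst, a, b; } TacInstr;

enum { MAX_DEPTH = 64 };

/* Returns the number of instructions written, or -1 on stack
 * underflow, stack overflow, or a full output buffer. */
static int stack_to_tac(const StackInstr *in, size_t n,
                        TacInstr *out, size_t cap)
{
    int stack[MAX_DEPTH];   /* which register holds each stack slot */
    size_t sp = 0, k = 0;
    int next_reg = 0;

    for (size_t i = 0; i < n; i++) {
        if (k == cap)
            return -1;
        switch (in[i].op) {
        case S_LOAD:
            if (sp == MAX_DEPTH)
                return -1;
            out[k++] = (TacInstr){ T_FROM_LOCAL, next_reg, in[i].local, -1 };
            stack[sp++] = next_reg++;
            break;
        case S_ADD:
        case S_MUL: {
            if (sp < 2)
                return -1;
            int b = stack[--sp];
            int a = stack[--sp];
            TacOp op = (in[i].op == S_ADD) ? T_ADD : T_MUL;
            out[k++] = (TacInstr){ op, next_reg, a, b };
            stack[sp++] = next_reg++;
            break;
        }
        case S_STORE:
            if (sp < 1)
                return -1;
            out[k++] = (TacInstr){ T_TO_LOCAL, in[i].local, stack[--sp], -1 };
            break;
        }
    }
    return (int)k;
}
```

Once every value has a named register, the usual dataflow analyses apply
directly, which is awkward to arrange while operands are implicit
positions on a stack.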


David Chase
[Last time I looked at ANDF, it was getting to be an awful lot like
obfuscated C, since they wanted to be able to use per-platform stdio.h
and the like. Ugh. -John]



