From: Andreas Ames <Andreas.Ames@Tenovis.com>
Newsgroups: comp.compilers
Date: 20 Jun 2000 02:41:40 -0400
Organization: Tenovis GmbH u. Co. KG
References: 00-06-038 00-06-058
Keywords: summary, parse
Hi,
I had some further questions for Mr. Clark and sent them by mail. In
his answer (thank you) he asked me to post our exchange to the
group. The whole beast has become a little lengthy, so here we
go (I hope the format is readable):
Here's my question:
-------------------------------------------------------------------------------
Hi,
thanks for your answer, which leads me to some further questions.
> The internal representation that most compilers use for data and
> control flow analysis typically goes by a different name. Usually
> it is called the "Intermediate Language (IL)" or "Intermediate
> Representation (IR)"--although there are probably as many variations
> on the name as there are multi-language compilers (i.e. dozens ;-)).
I'm very interested in intermediate languages other than ASTs that are
applicable to a wide range of programming languages and independent of
any particular compiler backend. In my case this interest is not about
optimizing but about reengineering. Such an IL could be a very useful
foundation for an analyzer tool/suite that is independent of the
programming language, and even for a translator from one programming
language to another. Some time ago (about a year or so) I asked
exactly this question in comp.compilers: are there resources on such
multi-language ILs? I asked because the dragon book seems to say that
experiments with such ILs were not all that successful. Unfortunately
I got no answers then.
I also thought about RTL, the IL used by gcc. Unfortunately it seems
to depend heavily on the particular backend, so the same program
could produce different RTL representations depending on the backend
used, which doesn't seem very useful for me. Also, I don't know of a
formal semantic or even syntactic description of it.
> It is worth noting the "bytecodes" and "virtual machines" (as in the
> JVM) are closely related to IL's. So are the members of the Forth
> family (such as Postscript). The other term worth tracking down is
> SSA (Static Single Assignment form), as it is essentially an IL with
> some nice properties.
I also thought about (other) assembly languages. Bytecodes and
assembly languages as ILs would have the advantage that the IL has a
well-defined syntax and even semantics (so it could possibly be used
to simulate a program). But I guess there would also be
disadvantages, depending on the application domain of such a
compiler?
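To make the "simulate a program" point concrete, here is a minimal sketch
of a stack-style bytecode with an interpreter, written in Python. The
opcode names (PUSH, LOAD, STORE, ADD, MUL) are made up for this
illustration and are not the instruction set of the JVM, Forth, or any
other real system. The point is only that every instruction has a
precisely defined effect, so the IL itself can be executed.

```python
# Hypothetical mini bytecode: each instruction is a tuple (opcode, operand...).
# The opcode names are invented for this sketch; they are not real JVM or
# Forth instructions.

def run(program, env=None):
    """Interpret a list of stack-machine instructions; return the variables."""
    env = dict(env or {})
    stack = []
    for instr in program:
        op = instr[0]
        if op == "PUSH":        # push a constant
            stack.append(instr[1])
        elif op == "LOAD":      # push the value of a variable
            stack.append(env[instr[1]])
        elif op == "STORE":     # pop the top of stack into a variable
            env[instr[1]] = stack.pop()
        elif op == "ADD":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
        elif op == "MUL":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs * rhs)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return env

# Bytecode for the source statement:  a = b + c * 2
PROGRAM = [
    ("LOAD", "b"),
    ("LOAD", "c"),
    ("PUSH", 2),
    ("MUL",),
    ("ADD",),
    ("STORE", "a"),
]

result = run(PROGRAM, {"b": 3, "c": 4})
print(result["a"])  # 3 + 4 * 2 = 11
```

A likely disadvantage also shows up here: the bytecode no longer says
that the source was a single assignment with a nested expression, which
is the kind of structure a reengineering tool may want to keep.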
> Note, this was actually a timely question for me, as the next
> edition of my "Practical Parsing Patterns" column in Sigplan Notices
> is on this exact topic--although I haven't written more than a
> couple of paragraphs of it yet, so I can't forward a preprint of it.
> Also, anyone who wants to disabuse me of my misconceptions
> (e.g. argue that bytecodes or Forth are not like ILs at all) would
> do me a favor by posting (or emailing) in response to this, lest the
> same errors get set down in print. (The relevant caveat is: One man
> tells a lie, a thousand others repeat it as true.)
I would be very interested in your article but unfortunately have no
idea how to get the "Sigplan Notices" you mention. Can you tell me how
to get it, e.g. an ISBN or something? Sorry for being so clueless
:-(. Do you know of other publications specializing in ILs, or other
resources where I can learn about the ILs you mentioned above? (I had
never heard of any of them besides bytecode and Forth, and what I
know of those two is not related to ILs.)
> Hope this helps,
Very much, thank you. Andreas
> -Chris
> *****************************************************************************
> Chris Clark                    Internet   :  compres@world.std.com
> Compiler Resources, Inc.       Web Site   :  http://world.std.com/~compres
> 3 Proctor Street               voice      :  (508) 435-5016
> Hopkinton, MA 01748 USA        fax        :  (508) 435-4847  (24 hours)
> -----------------------------------------------------------------------------
-------------------------------------------------------------------------------
And this is what Mr. Clark answered:
-------------------------------------------------------------------------------
You wrote:
> In my case this interest is not about optimizing but about
> reengineering.
The basic thrust of an IL is (should be) representing faithfully the
semantics of the program being operated on. That should be
independent of the target activity--although different activities
might prefer higher or lower (more or less abstract) representations,
and may concentrate on different aspects.
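As a rough illustration of "higher or lower" representations, the sketch
below holds the same statement, a = b + c * 2, first as a nested
expression tree and then lowers it to a flat list of three-address
instructions. The tuple layouts, the "copy" pseudo-operation and the
function names lower and lower_assign are assumptions made up for this
example; they are not taken from the TSI or ucode ILs.

```python
import itertools

def lower(expr, code, temps):
    """Lower a nested-tuple expression to three-address instructions.

    expr is a variable name (str), an int constant, or (op, left, right).
    Instructions are appended to code as (dest, op, arg1, arg2).
    Returns the name or constant that holds the value of expr.
    """
    if isinstance(expr, (str, int)):
        return expr                      # leaves need no instruction
    op, left, right = expr
    arg1 = lower(left, code, temps)
    arg2 = lower(right, code, temps)
    dest = f"t{next(temps)}"             # fresh temporary for this result
    code.append((dest, op, arg1, arg2))
    return dest

def lower_assign(target, expr):
    """Lower 'target = expr' and return the instruction list."""
    code = []
    temps = itertools.count(1)
    value = lower(expr, code, temps)
    code.append((target, "copy", value, None))
    return code

# Higher-level form: the tree still shows the nesting of the expression.
HIGH_LEVEL = ("+", "b", ("*", "c", 2))

# Lower-level form: evaluation order and temporaries are now explicit.
LOW_LEVEL = lower_assign("a", HIGH_LEVEL)
for instr in LOW_LEVEL:
    print(instr)
```

Both forms describe the same computation. A reengineering tool would
probably favor the tree, while an optimizer would probably favor the
flat list.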
> Some time ago (about a year or so) I asked exactly this question
> in comp.compilers: are there resources on such multi-language ILs?
> I asked because the dragon book seems to say that experiments with
> such ILs were not all that successful. Unfortunately I got no
> answers then.
There have been multi-language IL's. I worked on two different
compiler suites--the TSI compilers and the MIPS ucode compilers. The
TSI compilers had front ends for PL/I (several dialects), Pascal,
FORTRAN, COBOL, Algol 60, and Modula-2. There was even an RPG front
end, but it used only a small fraction of the IL as an RPG program has
a static structure. The MIPS ucode system supported C, C++, Pascal,
Ada, and FORTRAN.
However, for a language translation effort the level of success is
debatable. For example, in the TSI system there were some distinct IL
operations that were unique to each language (and others that did
different things based on some hidden language flags). The ucode
system was less language specific, but it dealt with a more tightly
constrained set of languages.
----------------------------------------------------------------------
One way to get some info on the ucode system is to order Fred Chow's
PhD thesis from Stanford. It is called "A Portable Machine
Independent Optimizer, Design and Measurements". (At one time, you
could even order the sources to the ucode system.) Most of the thesis
is about optimization, but I believe it also discusses ucode.
You should be able to find other information about ucode by searching
for info on pcode, which is ucode's predecessor and used as the IL in
a variety of Pascal compilers. (There was even an article once
comparing several pcode dialects.)
A modern system that is trying to reproduce the success of the ucode
project is "SUIF". That is part of the "national compiler
infrastructure project", a consortium of universities that have pooled
their compiler research efforts together.
----------------------------------------------------------------------
The place where most multi-language IL's fall down is that the larger
the spread of languages they attempt to cover, the more difficult the
task of finding common semantics becomes. Eventually the IL gets
Balkanized into myriad fragments, each tailored to only one or two
languages. John, the comp.compilers moderator, calls this "heat
death".
This happens because the original designer omits one detail of the
semantics because the language allows some implicit interpretation
that covers it. Later, when the language evolves or is extended, or
when a different language is added to the IL, it is impossible to make
that detail manifest as too many parts depend on the implicit
assumptions. As a result the IL begins to diverge and develop special
cases. At Prime Computer, where I worked on the TSI compilers, we had
a saying (with apologies to the manufacturer of Arpege perfume):
"Promise them anything, but give them the Common Backend." The meaning
of that saying was that your programming language could have any
semantics you wanted, provided that they were the same as the
semantics of a language we already supported.
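A small, purely hypothetical sketch of how such an implicit detail turns
into a special case: suppose an IL has an array-indexing operation that
was designed when every source language used the same array lower
bound, so the bound was never made an operand. The function names and
the "language" flag below are invented for this illustration and do not
describe the actual TSI or ucode operations. The only outside facts
assumed are that C arrays start at index 0 and that FORTRAN arrays
start at index 1 by default.

```python
def index_implicit(base_addr, i, elem_size, language="C"):
    """Hypothetical IL op whose meaning depends on a hidden language flag.

    The lower bound is not an operand; it is implied by the source language.
    Every tool that reads this IL has to know the table below.
    """
    lower_bound = {"C": 0, "FORTRAN": 1}[language]
    return base_addr + (i - lower_bound) * elem_size

def index_explicit(base_addr, i, elem_size, lower_bound):
    """Same operation with the lower bound made manifest as an operand.

    A new language with different bounds needs no new flag or special case.
    """
    return base_addr + (i - lower_bound) * elem_size

# The same IL text "index A, 3" means two different addresses:
print(index_implicit(1000, 3, 4, "C"))        # 1000 + 3 * 4 = 1012
print(index_implicit(1000, 3, 4, "FORTRAN"))  # 1000 + 2 * 4 = 1008

# With the bound explicit, an array declared with bounds -5..5 just works:
print(index_explicit(1000, -2, 4, -5))        # 1000 + 3 * 4 = 1012
```

Retrofitting the explicit operand later is the hard part, since every
existing front end and analysis pass already relies on the implied
bound.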
> I would be very interested in your article but unfortunately have no
> idea how to get the "Sigplan Notices" you mention. Can you tell me
> how to get it, e.g. an ISBN or something?
Sigplan Notices is the monthly publication of the ACM Special Interest
Group on Programming LANguages. (The columns are only included in
months when the magazine isn't reprinting conference proceedings,
which is about 4-6 times a year.) The easiest way to get it is to
join the ACM and order it. I believe it is also possible to join just
the SIGPLAN group, but I don't know the details on that. The ACM also
has a digital library that contains the text of many of its
publications online (and the SIGPLAN articles are scheduled to be
included).
Hope this helps, -Chris
P.S. If you don't mind having your questions answered in a public
place, I would appreciate it if you would forward this to the group as
it has info that I believe answers questions others might have. You
may edit it to remove sections that you don't want posted.
*****************************************************************************
Chris Clark                    Internet   :  compres@world.std.com
Compiler Resources, Inc.       Web Site   :  http://world.std.com/~compres
3 Proctor Street               voice      :  (508) 435-5016
Hopkinton, MA 01748 USA        fax        :  (508) 435-4847  (24 hours)
-------------------------------------------------------------------------------
cu andreas