Re: Translator design decisions

Hans-Peter Diettrich <>
Sun, 20 Jan 2008 16:07:13 +0100

          From comp.compilers


From: Hans-Peter Diettrich <>
Newsgroups: comp.compilers
Date: Sun, 20 Jan 2008 16:07:13 +0100
Organization: Compilers Central
References: 08-01-050
Keywords: UNCOL
Posted-Date: 20 Jan 2008 19:39:42 EST

Mario Suvajac wrote:

> The translator has virtually unlimited number of input languages and
> translation is done in between them, so every input language can be an
> output language. The languages are Pascal, BASIC, NC like. I was
> thinking about making an assembler like Intermediate language in
> between, so basically I need to make TO and FROM Intermediate language
> translations for each language.

I have a similar dream: a decompiler that can produce the high-level
source code in the most appropriate language. In this case the input
modules are restricted to binary executable or library code, and I've
implemented several for 68k, x86, Java, .NET and other bytecode
virtual machines. But even with that limitation it's hard to find a
common internal representation of the code, for expressions, statements
and procedures.

Then I thought about extending that model for source code input, with
further complications. There I ended up with language specific
constructs and data structures, like For-C, For-Pascal, Switch-C,
Switch-Pascal, which describe a statement as found in the input. The
details, required for a translation into another language, can be
retrieved from these objects, e.g. that the cases in a C switch fall
through, whereas a Pascal case statement comes closer to a multiway If.
When an output module cannot represent such a statement of a different
language, the code structure shall be expanded into more and more basic
instructions, until the code can be translated into the target language.
Of course there are fundamental limitations that prevent some
translations, e.g. of SEH into Basic or other languages with a different
error/exception handling model, or of local subroutines in Pascal code. I
have also never tried to translate classes across languages with different
object models.
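
To make the idea concrete, here is a minimal sketch of such a
language-specific statement object. It is not my actual code; the names
Case, SwitchC, falls_through and lower are made up for illustration, the
statements are plain strings, and default cases are left out. The object
keeps the C-specific detail (fall-through) and can expand itself into
basic compare-and-jump instructions when a target has no matching
construct.

```python
from dataclasses import dataclass


@dataclass
class Case:
    value: int                    # the case label
    body: list                    # statements, kept as plain strings here
    ends_with_break: bool = True  # False means control falls into the next case


@dataclass
class SwitchC:
    """A switch statement as found in C input: cases may fall through."""
    selector: str
    cases: list

    def falls_through(self) -> bool:
        # Falling out of the last case just leaves the switch, so it does not count.
        return any(not c.ends_with_break for c in self.cases[:-1])

    def lower(self) -> list:
        """Expand into compare-and-jump instructions plus labels."""
        out = [f"if {self.selector} == {c.value} goto L{i}"
               for i, c in enumerate(self.cases)]
        out.append("goto Lend")   # no case matched
        for i, c in enumerate(self.cases):
            out.append(f"L{i}:")
            out.extend(c.body)
            if c.ends_with_break:
                out.append("goto Lend")
            # Without a break, control simply runs into the next label.
        out.append("Lend:")
        return out
```

A Pascal output module could ask falls_through() first: if the answer is
no, the switch maps directly onto a case statement, otherwise it has to
work from the lowered form.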

Next come the attributes, e.g. "const" in C, which sometimes can get
lost in a translation without affecting the program operation. Even the
framework introduces problems, e.g. which header files must be included
into C source files.

> For the project at first I wanted to
> use Lex+Yacc but then I found out about TXL and now think about using
> TXL instead. Everything else will be done in C++. Also, speed is not
> important.

I'd allow for multiple parser generators, as appropriate for a source
language. Most of my parsers are hand-written, LL or PEG like, and their
output (tree) is very language dependent. Of course there exist base
classes for modules, functions, statements, expressions, and for data
types and variable and constant declarations, but the nasty details are
encapsulated in language specific derived classes. It is then up to the
source classes to break a construct down into simpler constructs
(statements...), and the target modules may or may not know how to
translate a specific source class into target source code.
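
The division of labour between source classes and target modules could
look roughly like the following sketch. All names (Node, Stmt, While,
ForC, PascalTarget, Untranslatable) are hypothetical, and expressions are
assumed to be strings already written in the target syntax. The target
emits what it knows directly and otherwise asks the source class to
break itself down.

```python
class Node:
    def lower(self):
        """Return a list of simpler nodes, or None if no simpler form exists."""
        return None


class Stmt(Node):
    def __init__(self, text):
        self.text = text


class While(Node):
    def __init__(self, cond, body):
        self.cond, self.body = cond, body


class ForC(Node):
    """C-style for loop: init; cond; step."""
    def __init__(self, init, cond, step, body):
        self.init, self.cond, self.step, self.body = init, cond, step, body

    def lower(self):
        # for (init; cond; step) body  ->  init; while (cond) { body; step; }
        return [Stmt(self.init), While(self.cond, self.body + [Stmt(self.step)])]


class Untranslatable(Exception):
    pass


class PascalTarget:
    """Hypothetical target that knows plain statements and while loops only."""
    def emit(self, node):
        if isinstance(node, Stmt):
            return [node.text + ";"]
        if isinstance(node, While):
            lines = [f"while {node.cond} do", "begin"]
            for child in node.body:
                lines += ["  " + line for line in self.emit(child)]
            return lines + ["end;"]
        # Unknown source class: let it expand itself and try again.
        simpler = node.lower()
        if simpler is None:
            raise Untranslatable(type(node).__name__)
        return [line for n in simpler for line in self.emit(n)]
```

Note that this particular lowering is only valid for loop bodies without
"continue", since the step would be skipped; that is exactly the kind of
nasty detail the source class has to know about.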

> [A good first step would be to learn about the many many times this
> has been tried and failed in the past, as far back as the 1950s. You
> can start with this 1991 article I posted to comp.compilers:
> -John]

Thanks for that collection :-)

I agree that a translation between programming languages is impossible
in general, and not always feasible even in specific cases, but with
some restrictions it might be possible to produce at least code snippets
of easily translatable parts. I could also imagine a mixed-language
output, where the desired target language determines at least the output
file structure, and the not yet translatable parts are left in comments,
in the source language form. [Like assembly code embedded into HLL code]
[With the known problem of incompatible comment delimiters]. This would
be a first test for the transformation of some source code into the
internal representation, and back into the source language.

[Just to save you time, the so-far inevitable trajectory of an UNCOL
project is that they try a couple of semantically similar source
languages, and a couple of semantically similar targets, it seems to
work OK, and wild enthusiasm ensues. Then as they add more sources
and more targets, it becomes apparent that each one requires a bunch
of new special case hacks in the intermediate language, which rapidly
overwhelms whatever common stuff they thought they had. After a
while, the project quietly disappears. Heard from ANDF lately? -John]
