Re: Third party compiler middle and back-end

"BGB / cr88192" <>
Sun, 10 Oct 2010 11:39:05 -0700

          From comp.compilers

References: 10-10-010
Keywords: code, tools
Posted-Date: 11 Oct 2010 00:34:22 EDT

"Daniel Zazula" <> wrote in message
>I want to write a compiler that generates assembly, but I do not know
> assembly, I've already started studying the FASM, but it will take
> much time to learn everything I need to know in order to write a
> decent back-end.

this is what I did (although, initially, I knew ASM, mostly...).

I wrote my own assembler, mostly because I wanted an assembler which
ran at runtime (basically, all assembly and linking would be done
within the running program, so that it would be more useful for JIT
and similar), and I had concluded at the time, after looking at the
source, that NASM would have been a bit of a hassle to make work for
my uses (I had used a NASM-style syntax, and did not know of YASM or
FASM, which would have also been options).

originally, it did everything all at once, but was later split
(internally) into a separate assembler and linker stage (typically
with COFF used between them).
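as a rough illustration of the assembler / linker split (a sketch only, not the actual implementation, which is not shown in this post): the assembler emits code plus a symbol table and a list of unresolved references, and the linker lays the modules out in one image and patches the references. ObjectModule, assemble, link, and the three-instruction subset are hypothetical names made up for this sketch, and ObjectModule merely stands in for a COFF object. the byte encodings used (E8 for call rel32, C3 for ret, 90 for nop) are real x86 encodings.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class ObjectModule:
    """Hypothetical stand-in for a COFF object: code, symbols, relocations."""
    code: bytearray = field(default_factory=bytearray)
    symbols: dict = field(default_factory=dict)  # name -> offset within code
    relocs: list = field(default_factory=list)   # (offset of rel32 field, name)


def assemble(source: str) -> ObjectModule:
    """Assembler stage: encode instructions, record labels and unresolved calls."""
    obj = ObjectModule()
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            obj.symbols[line[:-1]] = len(obj.code)
        elif line == "ret":
            obj.code += b"\xC3"
        elif line == "nop":
            obj.code += b"\x90"
        elif line.startswith("call "):
            obj.code += b"\xE8"
            # The target may live in another module, so leave a placeholder
            # and let the linker fill in the displacement.
            obj.relocs.append((len(obj.code), line[5:].strip()))
            obj.code += b"\x00\x00\x00\x00"
        else:
            raise ValueError(f"unsupported instruction: {line}")
    return obj


def link(modules):
    """Linker stage: concatenate modules, then patch every rel32 field."""
    image = bytearray()
    symbols = {}
    pending = []
    for mod in modules:
        base = len(image)
        image += mod.code
        for name, offset in mod.symbols.items():
            if name in symbols:
                raise ValueError(f"duplicate symbol: {name}")
            symbols[name] = base + offset
        pending += [(base + offset, name) for offset, name in mod.relocs]
    for site, name in pending:
        if name not in symbols:
            raise KeyError(f"undefined symbol: {name}")
        # rel32 is relative to the end of the 4-byte displacement field.
        struct.pack_into("<i", image, site, symbols[name] - (site + 4))
    return bytes(image), symbols


main = assemble("main:\n  call helper\n  ret\n")
lib = assemble("helper:\n  nop\n  ret\n")
image, symbols = link([main, lib])
print(image.hex(), symbols)
```

keeping the two stages separate means a module can be assembled once and linked later against whatever else happens to be loaded, which is the property that matters for in-process use.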

this assembler has been working fairly well for my uses for about 5 years now.

I later wrote a JIT, but this fell into disuse (it later mutated into my
codegen). however, this codegen is a big, ugly, awkward, and buggy mess, and
I have yet to find a good way to escape this (a few times I have started
trying to write something clean, but thus far these efforts have almost
invariably failed).

granted, I never really "designed" the codegen, rather it emerged originally
from an interpreter (which was modified into being the JIT) and was modified
and extended by a large number of alterations and hacks (made to handle C
code and use native calling conventions, then later to support x86-64 and
the Win64 and SysV/AMD64 calling conventions, ...).
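to give an idea of why supporting both x86-64 conventions takes real work in a codegen, here is a small Python sketch of how scalar arguments get assigned to registers under each. the register lists are the documented ones for Win64 and SysV/AMD64; the assign_args function and the "int" / "float" classification are hypothetical simplifications for this sketch, which ignores structs, varargs, shadow space, and stack layout.

```python
WIN64_INT = ["rcx", "rdx", "r8", "r9"]
WIN64_FLT = ["xmm0", "xmm1", "xmm2", "xmm3"]
SYSV_INT = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
SYSV_FLT = [f"xmm{i}" for i in range(8)]


def assign_args(kinds, convention):
    """Map a list of "int" / "float" argument kinds to registers or "stack"."""
    for kind in kinds:
        if kind not in ("int", "float"):
            raise ValueError(f"unsupported argument kind: {kind}")
    locations = []
    if convention == "win64":
        # Win64 is positional: argument N uses slot N, whichever bank it is in.
        for slot, kind in enumerate(kinds):
            if slot < 4:
                bank = WIN64_INT if kind == "int" else WIN64_FLT
                locations.append(bank[slot])
            else:
                locations.append("stack")
    elif convention == "sysv":
        # SysV counts integer and floating-point arguments separately.
        n_int = n_flt = 0
        for kind in kinds:
            if kind == "int" and n_int < len(SYSV_INT):
                locations.append(SYSV_INT[n_int])
                n_int += 1
            elif kind == "float" and n_flt < len(SYSV_FLT):
                locations.append(SYSV_FLT[n_flt])
                n_flt += 1
            else:
                locations.append("stack")
    else:
        raise ValueError(f"unknown convention: {convention}")
    return locations


print(assign_args(["int", "float", "int"], "win64"))  # ['rcx', 'xmm1', 'r8']
print(assign_args(["int", "float", "int"], "sysv"))   # ['rdi', 'xmm0', 'rsi']
```

the same call signature lands in different registers under the two conventions, so the argument-passing logic cannot simply be shared with a different register table swapped in.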

> So I though about using a third party back-end, I would write the
> front-end that parse the language into a intermediate code and leave
> the rest to the back-end. I gave a look at GCC but it is too big, too
> vast and too complex for what I want. Microsoft's Phoenix also don't
> work for me since it generates CIL.

sadly, there is a lack of "good" options.

LLVM exists, and is technically the 'best' option, although admittedly I
don't much like its architecture (IMO, it is both unorthodox and a bit
overly centralized and implementation-centric).

JBC (Java ByteCode) also exists, although I am not as much of a fan of the
JVM either (the JVM is awkward/nasty on the C side of things).

CIL isn't totally unworkable though, since Portable.NET and DotNetAnywhere
also exist (although both are interpreters by my definition, so this may be
a detractor). Mono could be better, but is IMO a fairly poor implementation
(Mono could have been much nicer, if their coding practices were maybe a
little less nasty...).

would be better if better alternatives to MS's .NET existed...

I am personally trying to find a good solution to all this at this point (no
ideal solutions though).

recently I am left considering an older idea of using a modified form of JBC
and a different VM architecture from the JVM (I wrote a lot of the code 1-2
years ago, but didn't do much with it since then). the main alteration was
mostly to add more opcodes (mostly to add better support for C-style and
dynamic typesystems, and something more similar to the P/Invoke mechanism,

a recent hack was also to allow having JBC with a COFF-based container
(rather than ".class" files), and an image-based rather than serial
structure (so, it would be a little more like native code). it also uses a
different signature system (it uses my VM's signatures, rather than JVM
signatures).

I am half imagining linking them into PE/COFF images, or loading the COFF
objects directly (even on Linux, loading a bytecoded PE/COFF image is still
likely to be easier and faster than loading/processing a JAR, with the idea
here being to assume that these PE/COFF images are CPU-neutral). an ELF
mapping is also possible, but not within current/immediate plans (this would
mostly make sense for Linux mixed-code images or similar).

for the moment, I will probably stick to using ECJ (Eclipse Compiler for
Java) for any Java compilation (as my own Java + C# compiler is a bit far
from completion, and if JBC remains in use, I may as well, for the time
being...). (possibly, a tool could also exist to directly link the class
files into a DLL...).

I am not entirely sure what direction I will go with all this.

sadly, for all its merit, CIL doesn't mix well with my overall architecture
(the way it approaches metadata is largely incompatible with my existing
VM), and I have no real reason to compete against MS's .NET implementation.

note, in all of these cases, the bytecode would be either interpreted or
JIT'ed as needed.
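one possible reading of "interpreted or JIT'ed as needed" is the usual mixed-mode scheme: interpret a function until it has been called often enough, then compile it once and use the compiled form from then on. the Python sketch below shows only that control flow. the opcode set, the Function class, and HOT_THRESHOLD are hypothetical, and compile_to_native is a stand-in that generates a Python lambda rather than machine code.

```python
def interpret(code, args):
    """Execute stack bytecode given as a list of (opcode, operand) pairs."""
    stack = []
    for op, operand in code:
        if op == "arg":
            stack.append(args[operand])
        elif op == "push":
            stack.append(operand)
        elif op in ("add", "mul"):
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b if op == "add" else a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


def compile_to_native(code):
    """Stand-in for a JIT: translate the bytecode to one expression, once."""
    exprs = []
    for op, operand in code:
        if op == "arg":
            exprs.append(f"args[{operand}]")
        elif op == "push":
            exprs.append(repr(operand))
        elif op in ("add", "mul"):
            b = exprs.pop()
            a = exprs.pop()
            exprs.append(f"({a} {'+' if op == 'add' else '*'} {b})")
        else:
            raise ValueError(f"unknown opcode: {op}")
    return eval(f"lambda args: {exprs.pop()}")


class Function:
    HOT_THRESHOLD = 3  # assumed value, chosen only for the example

    def __init__(self, code):
        self.code = code
        self.calls = 0
        self.compiled = None

    def __call__(self, *args):
        if self.compiled is not None:
            return self.compiled(args)
        self.calls += 1
        if self.calls >= self.HOT_THRESHOLD:
            self.compiled = compile_to_native(self.code)
            return self.compiled(args)
        return interpret(self.code, args)


f = Function([("arg", 0), ("arg", 1), ("mul", None), ("push", 1), ("add", None)])
print([f(2, 3) for _ in range(5)])  # [7, 7, 7, 7, 7]
```

rarely-called code never pays the compilation cost, while hot code pays it once.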

a direct AST -> ASM JIT was also considered, but this strategy has a few
drawbacks related to large statically-compiled libraries: directly
converting from ASTs to ASM would be expensive, and could hurt startup times
if a lot of library code is managed by the VM (even if binary code is cached
between runs).

also, ASTs make it too easy to reverse a lot of the process (recreating an
analogue of the original source code), as well as placing some limits on
flexibility and semantics (adding new semantics would mean having to update
the backends, which is easier to gloss over with most bytecode formats).

however, an AST based JIT does make more sense for dealing with dynamic
script loading and interactive entry / eval (and, in any case, a means to
compile ASTs to ASM is needed...).
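for the interactive entry / eval case, the path from source text to something executable can be quite short. the Python sketch below parses an arithmetic expression into an AST and lowers it in a single walk. note that it targets a toy stack bytecode rather than ASM, purely to keep the example small and runnable; it borrows Python's own ast module as the parser, and lower, run, and eval_source are hypothetical names.

```python
import ast
import operator

BINOPS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul"}
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def lower(node, out):
    """Post-order walk: emit operands first, then the operator."""
    if isinstance(node, ast.Expression):
        lower(node.body, out)
    elif isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        out.append(("push", node.value))
    elif isinstance(node, ast.BinOp) and type(node.op) in BINOPS:
        lower(node.left, out)
        lower(node.right, out)
        out.append((BINOPS[type(node.op)], None))
    else:
        raise ValueError(f"unsupported syntax: {type(node).__name__}")
    return out


def run(code):
    """Evaluate the emitted stack code."""
    stack = []
    for op, operand in code:
        if op == "push":
            stack.append(operand)
        else:
            b = stack.pop()
            a = stack.pop()
            stack.append(OPS[op](a, b))
    return stack.pop()


def eval_source(text):
    """Interactive-style entry point: text -> AST -> stack code -> result."""
    return run(lower(ast.parse(text, mode="eval"), []))


print(lower(ast.parse("1 + 2 * 3", mode="eval"), []))
print(eval_source("(4 - 1) * 5"))  # 15
```

a real AST-to-ASM backend would replace the emit step with instruction selection and register allocation, but the tree walk has the same overall shape.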

sadly, all of this is more personal experimentation and development, and is
*far* from being a generally usable solution at this point (implementation
holes and bugs abound, and even the large-scale architecture is far from
being settled, as I am using more of a bottom-up / piecewise approach to
designing the overall architecture, namely, whatever seems to work out best
in the competition between possible strategies is what wins out).

I prefer to try to keep things flexible where possible so that hopefully
parts and designs can be changed / swapped as needed, but this is easier
said than done (as well as the whole matter of getting stuff done, and
eliminating the worst of the holes and bugs, ...).

> The back-end that I'm looking for needs to generate at least x86
> Assembly (although I prefer Amd64), I don't mind if it generates other
> assemblies as well. I prefer back-ends written in C# or Object Pascal,
> but I will also accept C/C++ ones.

any real solution at this point should support at least both (x86 and
x86-64).

at the moment, I would probably have to mention LLVM as the more usable /
mature option.
it is written in C++.

or, one can try to put up with the JVM, because at least it works...

> Any suggestions?

yep, above.
