Re: Third party compiler middle and back-end

"BGB / cr88192" <cr88192@hotmail.com>
Mon, 18 Oct 2010 11:32:04 -0700

From comp.compilers

From:	"BGB / cr88192" <cr88192@hotmail.com>
Newsgroups:	comp.compilers
Date:	Mon, 18 Oct 2010 11:32:04 -0700
Organization:	albasani.net
References:	10-10-010 10-10-013 10-10-019 10-10-022 10-10-026
Keywords:	code, translator
Posted-Date:	18 Oct 2010 23:32:27 EDT

"George Neuner" <gneuner2@comcast.net> wrote in message
> On Wed, 13 Oct 2010 13:46:50 -0700, "BGB / cr88192"
> <cr88192@hotmail.com> wrote:
>
>>for GCC, I much prefer the overall process, since it is more solidly
>>divided
>>into layers and representational stages (nevermind if one wants to
>>understand how the code itself works, as least they know that it takes a
>>certain input and produces a certain output). takes GIMPLE, produces GAS
>>(nevermind what happens internally...).
>>
>>LLVM seems to be a bit more winding and a bit more object-centric from
>>what
>>I have seen (so, it is more like dealing with a lot of object-plugging,
>>rather than dealing with layered data filters/transforms, and one has to
>>look at a bunch of different classes to understand how things fit
>>together).
>
> Well, GCC and LLVM have very different philosophies. The developer
> interacts with LLVM at a much lower level than is typical with GCC.
>
> GCC mainly is designed to be a common back-end. It provides
> monolithic modules which essentially are meant to be used (or not) "as
> is". Most projects using GCC only ever replace the front-end parser
> or back-end code generator. GHC is the only project I know of which
> made extensive modifications to the middle.

GCC's design made some sense to me, and so my framework has an overall
similar process and architecture in many respects.

one difference though is that my framework provides no direct analogue of
BFD, but then again, at the moment I don't really need it (the machinery for
dealing with object files is mostly located in the assembler and linker, or
spread around "here and there").

> In contrast, LLVM is designed as a toolkit. It provides common IR
> data structures and methods to work with them, and a collection of
> library modules, each of which implements just one high level
> "function", and which are meant to be composed and strung together as
> needed by the compiler developer. LLVM was meant to be extended: it
> provides skeleton code for creating new modules.

well, my effort has some similarity here as well, although my effort tends
to be much more coarse-grained than LLVM (I deal more with components along
the lines of codegens or assemblers, rather than individual classes
representing individual actions/tasks/... or similar).

so, it is assembled more at the level of what could be compared to
car-engines and transmissions and similar, rather than at the level of
individual screws and bolts.

my view though is that major components should still be replaceable,
rather than building the entire thing into a monolith. for example, this can
be done via standardizing on aspects of the data representations and
external APIs, such that one can "unbolt" one major component and drop a
functionally analogous one in its place.

for example, one can swap out the codegen for a new one, or make a new
frontend which targets the internal IL in use, ... (or, at least
theoretically, as more often I have just ended up hacking new functionality
onto prior components until they develop a thick layer of cruft, or just do
wholesale copy/paste/edit processes...).
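
a minimal sketch of the "unbolt and replace" idea, in Python for brevity. everything here is hypothetical and for illustration only (the `Codegen` interface, the two codegen classes, and the toy "IL" as a list of strings are assumptions, not anything from an actual framework): the driver only knows the agreed-upon external interface, so either codegen can be dropped in.

```python
from typing import List, Protocol


class Codegen(Protocol):
    """Hypothetical external API: takes a toy IL, returns assembly text."""

    def compile(self, il: List[str]) -> str: ...


class StackCodegen:
    """One black box: emits naive push-style pseudo-assembly."""

    def compile(self, il: List[str]) -> str:
        return "\n".join(f"push {op}" for op in il)


class RegisterCodegen:
    """A drop-in replacement with a different internal strategy."""

    def compile(self, il: List[str]) -> str:
        return "\n".join(f"mov r{i}, {op}" for i, op in enumerate(il))


def build(il: List[str], codegen: Codegen) -> str:
    # the driver depends only on the agreed interface, never on internals
    return codegen.compile(il)
```

calling `build(il, StackCodegen())` or `build(il, RegisterCodegen())` differs only in which box is bolted in; nothing else in the driver has to change.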

but, IMO the very fine-grained class-based nature of LLVM is confusing to
try to follow or make sense of, in what attempts I have made at looking at
it...

> IMO the way GCC is structured limits its usefulness ... the middle IR
> section is designed for procedural languages and it does not handle
> very well any language that isn't mainly procedural. The developers
> of Haskell rewrote large sections of the middle from scratch to
> support Haskell's type model and lazy functional semantics. GHC still
> is the only project I know of that has done this.

in my strategy, ideally the whole middle-end could be swapped out for a new
one if needed.

> Turing equivalence guarantees that any language can be emulated in C,
> and so technically GCC can support any language ... but the results
> can be disappointing. It's been my observation that most compilers
> for declarative languages - functional, logic, dataflow, etc. - have
> not been based on GCC. And, AFAIK, no compiler for any pure OO
> language is based on GCC.

yes, but then again one can argue that in most cases, these non-standard
language designs are either limited to smaller domain-specific languages, or
tend to be academic / research languages with little hope of gaining
widespread acceptance even if they did have good compiler and tool support.

AFAICT, the only languages with much hope of gaining widespread use are
those with both an at least vaguely familiar syntax and core semantics,
comparable to what most people are most familiar with (otherwise, people will
be like "gasp... what is this horror?...", and simply walk away).

this largely limits things to OO-ish languages with a procedural core, maybe
hints of FP style, and a conventional syntax (so people are not scratching
their heads wondering just what sort of bizarreness they are left looking
at...).

like, not everyone needs to wear business suits, but the coffee should still
taste like coffee...

and, if someone does need something oddball, there is little harm in writing
the machinery themselves.

personally though this is why it is probably better to support at least
coarse-grain modularity, such that any general purpose components can be
reusable, although it is IMO not generally useful to try to go to the
extreme of typical "toolbox" APIs, which tend to be confusing, awkward to
use, and violate the principle of maintaining abstraction and opaqueness
(or, the "black box" mindset, as it were).

IMO a toolbox should not provide more than what one can do without or
abstract away. at the point where something becomes a dependency, it also
becomes a liability.

>>side note:
>>
>>I had looked at LLVM some before a few years back, but at the time
>>it didn't do what I wanted or how I wanted things done (basically,
>>converting directly from IR to machine code and putting machine code
>>into executable buffers). there was no dynamic relinking, but
>>instead a process of recompiling chunks of IR and patching up the
>>prior versions of functions with jumps to the new version.
>
> I haven't played very much with the JIT generators so I can't comment
> much on this ... thus far I've mostly used LLVM for stand-alone
> compilers. However, it permits quite low level, pass-by-pass control
> of the compilation process - so I suspect that what you want to do is
> possible.

I think it is, but likely by using it to produce output in a usable form
(say, COFF modules) and linking them with my own technology.

at the time I looked into it, this was before the addition of the ability to
export or link object files or deal with textual ASM, which I had personally
regarded as fairly fundamental (and I felt little need to abandon my own
technology for the sake of this).

it really doesn't help much that LLVM is code/implementation centric, rather
than data or representation centric. I am much more comfortable dealing with
things defined in terms of particular data serializations, than in terms of
its code.

code is something we throw away and rewrite as needed to serve whatever
particular uses have come up, and it is the standardization of representation
and external interface that allows code to be swapped out relatively cleanly
(ideally absent huge amounts of pain or internal re-engineering).

like, the ever important black box:
no one needs to know or care what is inside, and if they want something
different, they can write a new black box which fits into a similar-shaped
hole (but may do something different internally, or accept different input
or produce different output).

OO "can" be used as a means of implementing black boxes, but more often
people do something altogether different: overriding the implementation and
mucking around in the internals (IOW: they don't implement black boxes, they
implement white boxes).

if the unit of abstraction is too small (individual classes, rather than
subsystems), this also destroys abstraction, as it may be reasonable to
clone a subsystem and give it a different internal structure, but it is kind
of a waste of time to unnecessarily clone a class hierarchy, as by this point
the unit of abstraction is too small to be of much use (and it has become a
matter of a specific implementation).

similarly, data serializations should also be conservative about what and
how much of the implementation they expose (raw data dumping is preferably
avoided as this has very little abstraction, and data formats should be
open-ended and minimalistic in their core elements).

for example, it is bad design to try to engineer every possible use case
into a single data format, but rather a format should (ideally) remain
agnostic about both its intended use and its payload (I will note the
general design of TCP/IP and many other networking protocols as an example
of this: a protocol is defined in terms of a number of layers each of which
performs its expected task and largely ignores other layers, hence
reasonably maintaining a certain level of abstraction).

or, computers in general, ...
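
a minimal sketch of a payload-agnostic, layered format, in Python. the layout (a 4-byte tag, a 4-byte little-endian length, then opaque payload bytes) and the names `pack_chunk` and `iter_chunks` are assumptions made up for illustration, not any particular existing format: the container layer can walk the data without knowing or caring what any payload means.

```python
import struct


def pack_chunk(tag: bytes, payload: bytes) -> bytes:
    # 4-byte tag, 4-byte little-endian length, then the opaque payload
    if len(tag) != 4:
        raise ValueError("tag must be exactly 4 bytes")
    return tag + struct.pack("<I", len(payload)) + payload


def iter_chunks(data: bytes):
    # walk the container layer without interpreting any payload
    pos = 0
    while pos < len(data):
        tag = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        start = pos + 8
        yield tag, data[start:start + size]
        pos = start + size
```

a reader that does not recognize a given tag can simply skip that chunk, so new payload types can be added later without touching the container layer.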

I would cite the JVM as being much weaker, since it provides relatively poor
abstraction between its layers (Java ByteCode is mixed up with many
details of the Java language, its underlying implementation, and the means
for organizing and accessing compiled code). this, combined with its heavy use
of early binding (it is necessary to resolve types and methods prior to
producing bytecode), notably compromises its overall flexibility (in fact, C,
x86, and PE/COFF are much more flexible in these regards, which is sad...).

> LLVM has changed and grown quite a bit in the last few years so it
> might be worth looking at it again.

yeah, I am aware of many of these changes...

the main issue is that at the moment it would be a terrible pain to rework
my framework to be able to really utilize LLVM, and my framework and LLVM
address different purposes.

I tend also to follow the path of least effort.

> The main point I'm trying to make here is that if you don't like how
> it handles something, IMO it is much easier to change and extend LLVM
> than it is GCC.

fair enough, but in my case, I don't actually use either of them, as I chose
instead to hand-roll most of my own code to serve my particular uses. hence,
I can write code to address pretty much any use I want to address without
having to deal as much with other people or their ways of doing things (and
discard and/or rewrite code if this seems like it will be less effort).

last time I ran into LLVM I also ran into... internal politics... and it was
fairly apparent that people are much more concerned with particular code
than about matters of engineering or abstracting between the interface and
implementation (which I suspect is a major point of conflict between the
"tool box" and "black box" design methodologies).

granted, a drawback is that the volume and complexity of the code involved
have slowed down my overall rate of progress.
