Re: Third party compiler middle and back-end

"BGB / cr88192" <cr88192@hotmail.com>
Tue, 19 Oct 2010 13:08:29 -0700

From comp.compilers

Related articles
[11 earlier articles]
Re: Third party compiler middle and back-end FredJScipione@alum.RPI.edu (Fred J. Scipione) (2010-10-13)
Re: Third party compiler middle and back-end danielzazula@gmail.com (Daniel Zazula) (2010-10-17)
Re: Third party compiler middle and back-end gneuner2@comcast.net (George Neuner) (2010-10-17)
Re: Third party compiler middle and back-end gneuner2@comcast.net (George Neuner) (2010-10-18)
Re: Third party compiler middle and back-end cr88192@hotmail.com (BGB / cr88192) (2010-10-18)
Re: Third party compiler middle and back-end redbrain@gcc.gnu.org (Philip Herron) (2010-10-19)
*Re: Third party compiler middle and back-end cr88192@hotmail.com (BGB / cr88192)* (2010-10-19)**
Re: Third party compiler middle and back-end gneuner2@comcast.net (George Neuner) (2010-10-22)
Re: Third party compiler middle and back-end bc@freeuk.com (BartC) (2010-10-22)
Re: Third party compiler middle and back-end bc@freeuk.com (BartC) (2010-10-23)

| List of all articles for this month |

From:	"BGB / cr88192" <cr88192@hotmail.com>
Newsgroups:	comp.compilers
Date:	Tue, 19 Oct 2010 13:08:29 -0700
Organization:	albasani.net
References:	10-10-010 10-10-024 10-10-025 10-10-027
Keywords:	code, C, translator
Posted-Date:	22 Oct 2010 01:07:52 EDT

"George Neuner" <gneuner2@comcast.net> wrote in message
> On Sun, 17 Oct 2010 04:37:12 -0700 (PDT), Daniel Zazula
> <danielzazula@gmail.com> wrote:

<snip>

>>C static nature won't allow me to use it as an intermediary language,
>
> In fact, probably it will.

I will second this.

Although C seems fairly static, it is really fairly easy to implement
dynamic typing in the language (although, the bigger difficulty is
designing the "ideal" interface WRT the
performance/ease-of-use/... space).

flexible OO is also fairly easy to implement (actually, it is easier to
implement "soft" OO strategies than it is to implement an analogue of more
traditional static class/instance OO).

> With the exception of your object model - which you haven't described
> much at all - the rest of your language is a simple "continuation
> passing style" (CPS) functional model. There are a few stack
> management tricks needed to efficiently implement CPS code in C, but
> it is not difficult to do.
>
> For inspiration you can investigate Scheme-> and Lisp-> compilers.
> There are a number of them that work by first transforming the program
> into a CPS form and then compiling that to C.

yeah.

a vaguely CPS-like strategy is also commonly used IME when implementing
interpreters, as it frees up having to retain a large amount of state
"hidden" in the interpreters' machinery, and also avoids internal recursion
(a 'call' opcode does not mean the interpreter itself needs to make a call).

hence, most of the program state is left folded up in a 'context', and
execution can be stopped or resumed wherever it is convinient.

it is a little more awkward to generate C code which uses this strategy, but
it can be done.
the main change is that, rather than producing code which proceeds linearly
and recursively, one breaks up the code into individual blocks, typically
giving each its own function (each block in this case lacks both internal
jumps/loops, as well as any direct calls or returns).

typically, a function/block will return a structure representing the next
block to be executed, or the function to be called, ... as well as possibly
the arguments to be passed (for now, we can ignore "native" argument
passing, instead all arguments can be passed as, say, an array of
"references").

typically, the main execution is done by a loop (often called a
"trampoline"), which simply calls into the blocks indicated by the current
continuation, with each call potentially returning a new block to be called
into. calls are a special case, usually involving special action, typically
"pushing" the old/next continuation, and creating a new one representing the
called function, which is then what is executed by the loop. returns are
also a special case, usually involving an action to send the final return
value to the caller, and "popping" the old continuation (which is used as
the next continuation). tail-calls are also a special case, in that one does
not need to push a continuation to return to, as effectively the callee
simply disappears.

note that continuations need not be a stack, but infact can be a linked
list.
the advantage of this is that one can implement somewhat more "unorthodox"
control-flow mechanisms, such as the call/cc mechanism from Scheme.

I also used a vaguely similar strategy to the above in an x86 interpreter
with good success, where essentially most of the interpretation was done in
terms of linked structures containing function pointers (only non-cached
instructions were decoded, which meant typically that loops/... would
operate purely in terms of the cache). the main advantage of this was
actually that it largely eliminated the use of very large "switch" blocks
which had been previously eating up lots of time (at least for common
instructions). at this point, most time (according the profiler), was going
into matters of resolving memory addresses, and performing memory and
register IO (namely, one has to resolve interpreter virtual addresses to the
actual memory pages, and get/set register values of certain widths). WRT
performance, it seemed "comprable" to a 486 (on modern HW), and with a CPU
featureset comparable to a Pentium 1 (never got around to fully implementing
MMX or SSE, but it does have CPUID/...). note: self-modifying-code (SMC)
will flush the instruction cache (in this case, the entire cache, with the
instruction decoder starting again from a clean slate).

what actually largely killed off the above effort was not so much
performance, but rather the inconvinience of broken/disjoint address spaces
(fouling up my intended usage plans, unless I was wanting to implement a
fake OS or something... I had originally underestimated the inconvinience of
the address-space issue).

infact, I had once considered the possiblilty of using a CPS-based ABI at
the machine-code level, however I ended up deciding against this due to a
problem:
it would interact poorly with code not using this ABI (most native code),
and mixed-frames would essentially destroy any real advantage. a secondary
problem was that raw performance would be worse with this ABI.

this left the much simpler, but IMO not as good, strategy of using stack
marking and copying (a continuation can be made by backing up the old stack,
and calling a continuation can be pulled off by copying it back). this can
be compared to a more advanced version of C's "longjmp()" feature (bit of a
warning: this form of continuation will *not* mix well with multithreading,
as a continuation will only work within the same thread which created it).

>>LLVM or GCC probably fall under the same category, I thinking about
>>modifying a Javascript interpreter or using it as is throught the eval
>>function. Maybe all this is too much for me to do alone, anyone here
>>interested in this challenge?
>
> Other than the fact that a lot of programmers find CPS "unnatural" to
> work with directly - meaning you might be the only one ever to use
> your language - what you appear to be proposing doesn't sound very
> difficult at all.

yeah, using CPS directly is somewhat awkward, hence I think it is mainly of
use in writing compilers and interpreters. I would personally not want to
write any more than small chunks of code in this style (much like how I
don't want to write more than small chunks in ASM, however, ASM remains as
one of my most heavily used scripting languages, however, much of this code
is also generated procedurally as well, so a small bit of writing ASM often
goes a long way...).

> Because C is a suitable target for most of your language - with the
> object model TBD, I'd expect you to have a first cut of your compiler
> running end-to-end inside of a month (part time).

well, a lot depends here on specifics.

but, yeah, C is a very capable language WRT getting stuff done (even despite
at the outset seeming like a fairly limited and rigid language, in fact it
is much less rigid than many other languages which would, at the outset,
seem to be more flexible...).

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Third party compiler middle and back-end

"BGB / cr88192" <cr88192@hotmail.com>Tue, 19 Oct 2010 13:08:29 -0700

"BGB / cr88192" <cr88192@hotmail.com>
Tue, 19 Oct 2010 13:08:29 -0700