Re: Source to Source compilation - targeting C?

"BGB / cr88192" <cr88192@hotmail.com>
Wed, 30 Dec 2009 12:28:04 -0700

          From comp.compilers

Related articles
Source to Source compilation - targeting C? marktxx@yahoo.com (Mark Txx) (2009-12-28)
Re: Source to Source compilation - targeting C? idbaxter@semdesigns.com (Ira Baxter) (2009-12-30)
Re: Source to Source compilation - targeting C? cr88192@hotmail.com (BGB / cr88192) (2009-12-30)
Re: Source to Source compilation - targeting C? Meyer-Eltz@t-online.de (Detlef Meyer-Eltz) (2009-12-31)

From: "BGB / cr88192" <cr88192@hotmail.com>
Newsgroups: comp.compilers
Date: Wed, 30 Dec 2009 12:28:04 -0700
Organization: albasani.net
References: 09-12-043
Keywords: C, translator
Posted-Date: 30 Dec 2009 23:31:48 EST

"Mark Txx" <marktxx@yahoo.com> wrote in message
> Anyone have suggestions for a way to take a fairly simple "custom"
> high level language (syntax and semantics) to target C as the output?
> (doesn't have to be readable C)
>
> By this I mean an existing backend that outputs C already exists for
> the "tool". The "tool" must be relatively easy to use assuming
> knowledge of BNF, grammars etc but not much in the way of code
> generation knowledge.
>
> Does the ROSE compiler framework do this?
> http://www.rosecompiler.org/
>


as for rose, dunno, not looked into it.




can't say much about this as-is, but here is my thought:
in this case, it may just be better to write it yourself, since what is
being described here is, essentially, one of the simpler approaches to HLL
creation (I will claim a certain amount of experience here).




you don't really need "code generation knowledge" to target C, it is not
that complicated, and (vs ASM), C is a very forgiving target.


if one has a general grasp of C, this should be plenty WRT targeting it.

misc note:
this may be one of those rare cases where it makes sense to forget any
aversion to "goto". goto is very useful when emitting code, since one can
then "decompose" most constructs (loops, conditionals, ...) into simpler
parts: tests, jumps, and labels.

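to make the goto point concrete, here is a small sketch (not from any
particular compiler; the function and label names are made up for
illustration). it shows a loop with a nested "if", first in structured form,
then roughly as an emitter might print it once each construct has been
reduced to a test, a jump, and a label:

```c
#include <assert.h>

/* the structured form, as it might appear in the source HLL */
int sum_structured(int n)
{
    int total = 0, i = 1;
    while (i <= n) {
        if (i % 2 == 0)
            total += i;
        i++;
    }
    return total;
}

/* the same logic as an emitter might print it: every construct is
   reduced to a test, a jump, and a label (label names are hypothetical) */
int sum_lowered(int n)
{
    int total = 0, i = 1;
L_while_top:
    if (!(i <= n)) goto L_while_end;
    if (!(i % 2 == 0)) goto L_if_end;
    total += i;
L_if_end:
    i++;
    goto L_while_top;
L_while_end:
    return total;
}

/* usage: both versions should agree */
void check_lowering(void)
{
    assert(sum_structured(10) == 30);   /* 2 + 4 + 6 + 8 + 10 */
    assert(sum_lowered(10) == sum_structured(10));
}
```

the emitter only ever needs to know how to print "if (!cond) goto label;",
"goto label;", and "label:", which is why nested constructs stop being a
problem.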



similarly, in contrast to how many people obsess over BNF and
parser-generator tools, IME, parsers are one of the simpler parts of a
compiler to write (or, at least once one gets past "trivial" stages).


for a very simple compiler or interpreter, a parser can be a bigger chunk,
but one will find if they go on to writing more advanced compilers (say, for
languages like C or Java), it is not the parser which is complicated
(rather, the "demons" like to hang out somewhere more around the register
allocator, low-level optimizer, and the ASM codegen...).


then again, I have nearly always used hand-written recursive descent, so
maybe parsers are complicated when using all these tools?... (ok, this is
partly satire...).




but, in this case, my thought is to try doing this one for oneself and maybe
learn something in the process.




now, as for a few "hints":
don't try to "parse" directly into your output, as this is awkward and
painful.
maybe take a brief look at LISP and Scheme. even if you don't intend to use
them as such, these languages have a general structure and facilities which
are very useful in compiler writing (even if the compiler itself is to be
written in C, similar facilities can be implemented in C as well).
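
as a rough illustration of "similar facilities in C", here is a minimal
sketch of cons cells and symbols, plus a printer for proper lists. all names
here (Obj, sym, cons, print_obj, NIL) are made up for this example, nothing
is ever freed, and dotted pairs are not handled:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { T_NIL, T_SYM, T_CONS } ObjType;

typedef struct Obj {
    ObjType type;
    const char *name;         /* T_SYM: symbol spelling (not copied) */
    struct Obj *car, *cdr;    /* T_CONS: head and rest of the list */
} Obj;

static Obj nil_obj = { T_NIL, NULL, NULL, NULL };
#define NIL (&nil_obj)

Obj *sym(const char *name)
{
    Obj *o = calloc(1, sizeof *o);
    assert(o != NULL);
    o->type = T_SYM;
    o->name = name;
    return o;
}

Obj *cons(Obj *car, Obj *cdr)
{
    Obj *o = calloc(1, sizeof *o);
    assert(o != NULL);
    o->type = T_CONS;
    o->car = car;
    o->cdr = cdr;
    return o;
}

/* append the printed form of o to the NUL-terminated string in out */
void print_obj(const Obj *o, char *out, size_t cap)
{
    size_t used = strlen(out);
    if (o->type == T_SYM) {
        snprintf(out + used, cap - used, "%s", o->name);
        return;
    }
    snprintf(out + used, cap - used, "(");
    for (const Obj *p = o; p->type == T_CONS; p = p->cdr) {
        print_obj(p->car, out, cap);
        if (p->cdr->type == T_CONS) {
            used = strlen(out);
            snprintf(out + used, cap - used, " ");
        }
    }
    used = strlen(out);
    snprintf(out + used, cap - used, ")");
}
```

building a form is then just nested calls, e.g.
cons(sym("if"), cons(sym("z"), cons(cons(sym("foo"), NIL), NIL))).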


the idea is that one can, in effect, parse into ASTs in an S-Expression-like
representation, and use this for basic high-level transformations and
for driving compiler logic.


usually, the idea is to construct ASTs which are a fairly direct
transcription of the input syntax (at the structural / semantic level, but
not necessarily at the lexical level).


"2*x+3" -> "(+ (* 2 x) 3)", for example.
or: "if(z) foo();" -> "(if z (foo))", ...
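
a hand-written recursive-descent parser for this kind of thing can be quite
small. the following is only a sketch under some assumptions: the grammar is
just + - * / with parentheses, leaves are alphanumeric runs, there is no
error reporting, nodes are never freed, and all names (Node, parse_to_sexp,
...) are invented for the example. it parses into a small tree and then
prints that tree in S-Expression form:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct Node {
    char op;                  /* '+', '-', '*', '/'; 0 for a leaf */
    char text[32];            /* leaf spelling, e.g. "2" or "x" */
    struct Node *lhs, *rhs;
} Node;

static const char *src;       /* cursor into the input being parsed */

static Node *new_node(char op, Node *lhs, Node *rhs)
{
    Node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->op = op;
    n->lhs = lhs;
    n->rhs = rhs;
    return n;
}

static void skip_ws(void) { while (isspace((unsigned char)*src)) src++; }

static Node *parse_expr(void);

/* primary := number | identifier | '(' expr ')' */
static Node *parse_primary(void)
{
    skip_ws();
    if (*src == '(') {
        src++;
        Node *inner = parse_expr();
        skip_ws();
        if (*src == ')') src++;
        return inner;
    }
    Node *leaf = new_node(0, NULL, NULL);
    size_t len = 0;
    while (isalnum((unsigned char)*src) && len + 1 < sizeof leaf->text)
        leaf->text[len++] = *src++;
    return leaf;
}

/* term := primary (('*' | '/') primary)* */
static Node *parse_term(void)
{
    Node *left = parse_primary();
    for (;;) {
        skip_ws();
        if (*src != '*' && *src != '/') return left;
        char op = *src++;
        left = new_node(op, left, parse_primary());
    }
}

/* expr := term (('+' | '-') term)* */
static Node *parse_expr(void)
{
    Node *left = parse_term();
    for (;;) {
        skip_ws();
        if (*src != '+' && *src != '-') return left;
        char op = *src++;
        left = new_node(op, left, parse_term());
    }
}

/* append the S-Expression form of n to the string already in out */
static void to_sexp(const Node *n, char *out, size_t cap)
{
    size_t used = strlen(out);
    if (n->op == 0) {
        snprintf(out + used, cap - used, "%s", n->text);
        return;
    }
    snprintf(out + used, cap - used, "(%c ", n->op);
    to_sexp(n->lhs, out, cap);
    used = strlen(out);
    snprintf(out + used, cap - used, " ");
    to_sexp(n->rhs, out, cap);
    used = strlen(out);
    snprintf(out + used, cap - used, ")");
}

void parse_to_sexp(const char *text, char *out, size_t cap)
{
    assert(cap > 0);
    src = text;
    out[0] = '\0';
    to_sexp(parse_expr(), out, cap);
}
```

each grammar rule becomes one function, and operator precedence falls out of
which function calls which.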




so, for example:
HLL -> (Parser) -> S-Exps
S-Exps -> (High-level transforms) -> S-Exps
S-Exps -> (C Emitter) -> C




(note, at this moment, I use XML and not S-Exps internally, but this is a
side issue...).




"high-level transforms" is basically a kind of recursive-step expression
rewriting process, which would mostly be responsible for rewriting trivial
expressions into more-trivial expressions, such as converting "(+ 1 2)" into
"3", eliminating non-useful expressions, ...


for C, not that much is really needed here, since the C compiler is itself
smart enough to manage most of this in most cases (except maybe with
dynamically-typed languages).
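
for what it is worth, the rewriting step described above could look roughly
like this. this is a sketch only: the Expr type and the helper names are
hypothetical, only + and * are handled, and overflow is ignored. children are
rewritten first, then the node itself is either folded to a constant or, if
one operand is the identity element, replaced by the other operand:

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { K_NUM, K_VAR, K_ADD, K_MUL } Kind;

typedef struct Expr {
    Kind kind;
    long value;               /* K_NUM */
    char name;                /* K_VAR: single-letter variable */
    struct Expr *lhs, *rhs;   /* K_ADD, K_MUL */
} Expr;

static Expr *mk(Kind kind, long value, char name, Expr *lhs, Expr *rhs)
{
    Expr *e = malloc(sizeof *e);
    assert(e != NULL);
    e->kind = kind;
    e->value = value;
    e->name = name;
    e->lhs = lhs;
    e->rhs = rhs;
    return e;
}

Expr *num(long v) { return mk(K_NUM, v, 0, NULL, NULL); }
Expr *var(char c) { return mk(K_VAR, 0, c, NULL, NULL); }
Expr *bin(Kind k, Expr *l, Expr *r) { return mk(k, 0, 0, l, r); }

/* one recursive rewriting pass; returns the (possibly replaced) node */
Expr *fold(Expr *e)
{
    if (e->kind == K_NUM || e->kind == K_VAR)
        return e;

    e->lhs = fold(e->lhs);
    e->rhs = fold(e->rhs);

    /* (+ 1 2) -> 3 : both operands are constants */
    if (e->lhs->kind == K_NUM && e->rhs->kind == K_NUM) {
        long a = e->lhs->value, b = e->rhs->value;
        long r = (e->kind == K_ADD) ? a + b : a * b;
        free(e->lhs);
        free(e->rhs);
        e->kind = K_NUM;
        e->value = r;
        e->lhs = e->rhs = NULL;
        return e;
    }

    /* (+ 0 x) -> x and (* 1 x) -> x : drop a non-useful operation */
    long identity = (e->kind == K_ADD) ? 0 : 1;
    if (e->lhs->kind == K_NUM && e->lhs->value == identity) {
        Expr *keep = e->rhs;
        free(e->lhs);
        free(e);
        return keep;
    }
    if (e->rhs->kind == K_NUM && e->rhs->value == identity) {
        Expr *keep = e->lhs;
        free(e->rhs);
        free(e);
        return keep;
    }
    return e;
}
```

note that the caller has to use the returned pointer, since the node passed
in may have been replaced.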


the C emitter would likely be mostly a process of walking the produced
syntax tree and "unwinding" it into C-style syntax and semantics
(the exact details of this step will likely vary most with the structure and
semantics of the input HLL).
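
a tree-walking emitter of this sort might be sketched as follows. the Ast
type and function names are assumptions made for this example, calls take no
arguments, and expressions are always fully parenthesized so the emitter
never has to think about C precedence:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { E_NUM, E_VAR, E_BIN, E_CALL, E_IF } Tag;

typedef struct Ast {
    Tag tag;
    const char *text;         /* leaf spelling, operator, or callee name */
    const struct Ast *a, *b;  /* operands; for E_IF: condition and body */
} Ast;

/* append s to the NUL-terminated string in out (truncates if full) */
static void put(char *out, size_t cap, const char *s)
{
    size_t used = strlen(out);
    assert(used < cap);
    snprintf(out + used, cap - used, "%s", s);
}

static void emit_expr(const Ast *n, char *out, size_t cap)
{
    switch (n->tag) {
    case E_NUM:
    case E_VAR:
        put(out, cap, n->text);
        break;
    case E_BIN:               /* (op a b) -> (a op b) */
        put(out, cap, "(");
        emit_expr(n->a, out, cap);
        put(out, cap, " ");
        put(out, cap, n->text);
        put(out, cap, " ");
        emit_expr(n->b, out, cap);
        put(out, cap, ")");
        break;
    case E_CALL:              /* (foo) -> foo() ; no arguments here */
        put(out, cap, n->text);
        put(out, cap, "()");
        break;
    case E_IF:                /* a statement, handled in emit_stmt */
        break;
    }
}

void emit_stmt(const Ast *n, char *out, size_t cap)
{
    if (n->tag == E_IF) {     /* (if c body) -> if (c) { body } */
        put(out, cap, "if (");
        emit_expr(n->a, out, cap);
        put(out, cap, ") { ");
        emit_stmt(n->b, out, cap);
        put(out, cap, " }");
    } else {                  /* expression statement */
        emit_expr(n, out, cap);
        put(out, cap, ";");
    }
}
```

with this, a tree for "(if z (foo))" comes out as "if (z) { foo(); }", and
"(+ (* 2 x) 3)" as "((2 * x) + 3);".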




a note for producing C:
don't expect to (necessarily) be able to produce the whole C output file
sequentially. instead, it is advisable to allow splitting the generation
into a number of disjoint pieces (say, individual functions), which
are sewn together as the emitter process "unwinds".


it may be useful here to look into either text buffers or maybe "ropes".
if one already has support for cons cells, lists, and strings, then most of
this amounts to basic list operations, such as appending lists, ...
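
as one possible way to do the text-buffer variant (a sketch only; TextBuf,
Output, and the function names are invented here, and a rope or list of
strings would do the same job), the emitter can keep one growable buffer per
section, append to whichever one it needs at the moment, and join them at
the end:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct { char *data; size_t len, cap; } TextBuf;

/* printf-style append to a growable buffer */
void tb_printf(TextBuf *tb, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int need = vsnprintf(NULL, 0, fmt, ap);   /* measure only */
    va_end(ap);
    assert(need >= 0);

    if (tb->len + (size_t)need + 1 > tb->cap) {
        tb->cap = (tb->len + (size_t)need + 1) * 2;
        tb->data = realloc(tb->data, tb->cap);
        assert(tb->data != NULL);
    }
    va_start(ap, fmt);
    vsnprintf(tb->data + tb->len, tb->cap - tb->len, fmt, ap);
    va_end(ap);
    tb->len += (size_t)need;
}

/* hypothetical output unit: each section grows independently */
typedef struct { TextBuf protos, bodies; } Output;

/* emit one function; its prototype and body go to different pieces */
void emit_function(Output *o, const char *name, const char *body)
{
    tb_printf(&o->protos, "static int %s(void);\n", name);
    tb_printf(&o->bodies, "static int %s(void) { %s }\n", name, body);
}

/* sew the pieces together in the order C wants; caller frees the result */
char *finish(Output *o)
{
    TextBuf all = { NULL, 0, 0 };
    tb_printf(&all, "%s%s",
              o->protos.data ? o->protos.data : "",
              o->bodies.data ? o->bodies.data : "");
    free(o->protos.data);
    free(o->bodies.data);
    o->protos = o->bodies = (TextBuf){ NULL, 0, 0 };
    return all.data;
}
```

since all prototypes end up ahead of all bodies, the emitter can generate a
function that calls a helper it has not produced yet, which is exactly the
case where strictly sequential output gets awkward.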


this is most useful if the input HLL differs notably from C in terms of its
overall structure (if one were doing a LISP -> C compiler, for example).




note that it may also be possible to partition up the emitter logic so that
it, itself, produces the text sequentially, but in practice this process is
more awkward to work with IME.




well, ok, all this probably sounds a bit more complicated, oh well...


maybe look into different options (tools probably are an option, I just
don't have a good suggestion here), and see which best fits with
requirements and goals...


