RFC: idle thoughts for a C compiler

"cr88192" <cr88192@hotmail.com>
19 Mar 2007 15:36:24 -0400

          From comp.compilers


From: "cr88192" <cr88192@hotmail.com>
Newsgroups: comp.compilers
Date: 19 Mar 2007 15:36:24 -0400
Organization: Saipan Datacom
Keywords: C, design, question, interpreter
Posted-Date: 19 Mar 2007 15:36:24 EDT
X-RFC2646: Format=Flowed; Original

well, for context, I have a compiler and VM for a script language. in
general it uses a largely C-like syntax, but is technically a very different
language (it is based around soft-typing and prototype objects). the
language is compiled first to bytecode, then compiled into assembler, and
finally converted to machine code.


additionally, elsewhere in the project I have largely circumvented most of
the VM, allowing me to load and execute raw chunks of assembler. this gave
me the idea that it would be nice to be able to do similar with C.


Like most non-C languages, my script lang naturally doesn't interface very
well with C, and additionally a vast majority of my statically compiled code
is written in C, so I have been feeling tempted to make the jump to writing
a full-on C compiler (still partly joined with the script VM).


largely this is related to something:
if there is any language likely to interface nicely with C, it is C. the
biggest hurdle being the lack of usable system headers and a (good) way of
introspecting the host app's symbol table. otherwise, it is likely at least
to be a major improvement.
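
to illustrate the symbol-table hurdle, here is a minimal sketch of one possible workaround: the host app keeps a hand-maintained export table that runtime-loaded code can query by name. all names here (host_add, host_mul, host_lookup, struct host_symbol) are hypothetical, and a real system might read the executable's own symbol table instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host functions the app wants to expose to loaded code. */
int host_add(int a, int b) { return a + b; }
int host_mul(int a, int b) { return a * b; }

typedef int (*host_fn)(int, int);

/* One exported symbol: its name and its address. */
struct host_symbol {
    const char *name;
    host_fn     fn;
};

/* Hand-maintained export table (an assumption for this sketch; it stands
   in for real introspection of the executable's symbol table). */
static const struct host_symbol host_symbols[] = {
    { "host_add", host_add },
    { "host_mul", host_mul },
};

/* Return the function registered under 'name', or NULL if unknown. */
host_fn host_lookup(const char *name)
{
    size_t i;
    assert(name != NULL);
    for (i = 0; i < sizeof host_symbols / sizeof host_symbols[0]; i++) {
        if (strcmp(host_symbols[i].name, name) == 0)
            return host_symbols[i].fn;
    }
    return NULL;
}
```

a runtime compiler could call something like host_lookup() whenever loaded code references an external function it has no definition for. the cost is that every exported function has to be listed by hand.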


now, the big issues are like this:
does it make much sense to go this route?
and if so, how much should I invest, and how should I go about it?
how much emphasis do I put on exact compatibility between the JITed version
and statically compiled C?
....


at a basic level, generic code should be writable that works in both an
existing compiler (gcc in my case), and is loadable at runtime. I am
uncertain, for example, if I should consider allowing special "extensions"
as well, or stick to a more restrictive/standardized design?
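
one way to keep generic code working in both compilers while still leaving room for extensions is to guard them behind a predefined macro. the sketch below assumes the runtime compiler would predefine a macro called JIT_C_COMPILER; that name is made up for illustration, and gcc does not define it.

```c
#include <assert.h>

/* JIT_C_COMPILER is a hypothetical macro that the runtime compiler
   would predefine; gcc never defines it, so gcc takes the #else arm. */
#ifdef JIT_C_COMPILER
#define HAVE_JIT_EXTENSIONS 1
#else
#define HAVE_JIT_EXTENSIONS 0
#endif

/* Plain standard C: meant to behave the same under either compiler. */
int clamp(int v, int lo, int hi)
{
    assert(lo <= hi);
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

/* Reports which build this is, so callers can avoid extension-only paths. */
int built_with_jit_extensions(void)
{
    return HAVE_JIT_EXTENSIONS;
}
```

with this layout the restrictive/standardized subset stays the default, and anything extension-specific is confined to blocks that a static compiler simply skips.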


and so on...




the following derived from an email:
---


I partly beat together a parser for C.


the parser was mostly derived from my script lang's parser, and consisted
more of ripping out things than adding new things (ripping out lots of stuff
that would never really occur in C anyway). not really that much tested
yet (tested a few basic things, like variable definitions, prototypes,
trivial functions, ...).


partly changed around the operators and precedences, but this is likely
incomplete, ...




major issues:
how should I handle macros?
do I implement and pass contexts for everything, or just leave everything in
globals?
how should I structure the compile phase?
is it even worth the bother?
....


first issue, I could implement macros in one of 2 ways:
as a traditional preprocessor (hassle being that I need to add a whole other
stage just to process the source file into a new buffer).


as a hybrid or inline processing step (technically more similar to that of
lisp macros), which would be less effort, but would have the cost that more
"fundamental" transformations (such as a macro expanding to multiple
statements or a function or variable definition) would not be possible.
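
to make the trade-off concrete, the sketch below shows three macro shapes in ordinary C. the first expands to a single expression, which an inline expander working on parse trees could plausibly handle. the other two expand to a definition and to multiple statements, which are the "fundamental" cases described above. the macro and function names are illustrative only.

```c
#include <assert.h>

/* Expression macro: expands to a single expression. */
#define SQUARE(x) ((x) * (x))

/* Definition macro: expands to a variable definition plus a function
   definition, so the expansion is not an expression or a statement. */
#define DEFINE_COUNTER(name)                          \
    static int name##_count = 0;                      \
    int name##_next(void) { return ++name##_count; }

DEFINE_COUNTER(ticket)

/* Statement macro: expands to several statements, wrapped in
   do/while(0) so the expansion still acts as one statement. */
#define SWAP_INT(a, b) \
    do { int tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

int swap_demo(void)
{
    int x = 1, y = 2;
    SWAP_INT(x, y);
    assert(x == 2 && y == 1);
    return SQUARE(x) + y;   /* 2*2 + 1 */
}
```

a traditional preprocessor handles all three because it only rewrites tokens before parsing. DEFINE_COUNTER also relies on token pasting (##), which is a purely textual operation.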




likewise for contexts.
sadly, the C parser is stateful, which leads to several possibilities:
I use globals, probably cleared between different source files (least
effort);
I add contexts and rework much of the parser to make use of them.


I will assume that I will clear the state between (unrelated) source files,
mostly to avoid implicit transference, say, of macros, prototypes, or
typedefs.
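
as a rough sketch of the context-based option: one well-known piece of state a C parser has to carry is the set of typedef names, since whether an identifier names a type changes how a declaration or expression parses. the struct and function names below are hypothetical, and a real context would also hold macros and prototypes.

```c
#include <assert.h>
#include <string.h>

#define MAX_TYPEDEFS 64

/* Hypothetical per-source-file parser state, passed explicitly
   instead of living in globals. Names are stored by pointer, so
   the strings must outlive the context. */
struct parse_ctx {
    const char *typedef_names[MAX_TYPEDEFS];
    int         n_typedefs;
};

/* Start (or restart) a source file with empty state. */
void ctx_reset(struct parse_ctx *ctx)
{
    assert(ctx != NULL);
    ctx->n_typedefs = 0;
}

/* Record a typedef name; returns 0 on success, -1 if the table is full. */
int ctx_add_typedef(struct parse_ctx *ctx, const char *name)
{
    if (ctx->n_typedefs >= MAX_TYPEDEFS)
        return -1;
    ctx->typedef_names[ctx->n_typedefs++] = name;
    return 0;
}

/* Nonzero if 'name' should be parsed as a type in this context. */
int ctx_is_typedef(const struct parse_ctx *ctx, const char *name)
{
    int i;
    for (i = 0; i < ctx->n_typedefs; i++)
        if (strcmp(ctx->typedef_names[i], name) == 0)
            return 1;
    return 0;
}
```

with globals, the equivalent of ctx_reset() has to be called between unrelated source files by convention. with contexts, two files simply get two separate structs, so nothing can leak between them.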




as for compilation, 2 major approaches exist thus far:
I do a largely quick-and-dirty compiler (more or less directly from parse
trees to assembler code);
I compile from parse trees to an intermediate representation (a kind of
bytecode or treecode), and then to assembler;
....


the former is less effort, but is more likely to run into implementation
limits and produce poorly optimized code;
the latter (a little closer to the current script VM) is more effort, but is
more likely to allow stronger optimizations.
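
for a sense of what the quick-and-dirty route looks like, here is a minimal sketch that walks an expression tree and emits assembler text directly, using the machine stack for every intermediate value. the node layout, the function names, and the 32-bit x86-style mnemonics are all assumptions for illustration, not the actual compiler.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parse-tree node: a constant leaf or a binary operator. */
enum node_kind { NODE_CONST, NODE_ADD, NODE_MUL };

struct node {
    enum node_kind kind;
    int value;                 /* used by NODE_CONST */
    const struct node *lhs;    /* used by NODE_ADD / NODE_MUL */
    const struct node *rhs;
};

/* Output buffer for the generated assembler text. */
struct asm_buf {
    char   text[1024];
    size_t len;
};

/* Append one line of assembler text, followed by a newline. */
static void emit(struct asm_buf *out, const char *line)
{
    size_t n = strlen(line);
    assert(out->len + n + 2 <= sizeof out->text);  /* room for '\n' and NUL */
    memcpy(out->text + out->len, line, n);
    out->len += n;
    out->text[out->len++] = '\n';
    out->text[out->len] = '\0';
}

/* Post-order walk: each node leaves its result on the machine stack. */
void gen_expr(struct asm_buf *out, const struct node *n)
{
    char line[64];
    if (n->kind == NODE_CONST) {
        snprintf(line, sizeof line, "push %d", n->value);
        emit(out, line);
        return;
    }
    gen_expr(out, n->lhs);
    gen_expr(out, n->rhs);
    emit(out, "pop ecx");      /* right operand */
    emit(out, "pop eax");      /* left operand */
    emit(out, n->kind == NODE_ADD ? "add eax, ecx" : "imul eax, ecx");
    emit(out, "push eax");
}
```

for 1 + 2*3 this emits eleven instructions, most of them pushes and pops, which is the "poorly optimized code" cost mentioned above. an intermediate representation would give a later pass the chance to fold the constants or keep values in registers.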




and then, I don't even know if there is a point. could just focus on the
script VM's FFI (or write more crap in assembler) for all it matters.




or something...

