misc project: a C compiler...

"cr88192" <cr88192@hotmail.com>
Fri, 13 Jul 2007 12:36:43 +1000

          From comp.compilers


From: "cr88192" <cr88192@hotmail.com>
Newsgroups: comp.compilers
Date: Fri, 13 Jul 2007 12:36:43 +1000
Organization: Saipan Datacom
Keywords: C, question, practice
Posted-Date: 15 Jul 2007 16:23:05 EDT

well, here is the project:
I went and wrote a C compiler (dynamic, primarily compiles and links at
runtime).




purpose:
for my larger projects, something like this is useful;
traditional scripting languages (including custom written ones) have some
limitations making them not very useful (poor integration with a large
pre-existing primarily-C codebase, inability to directly use existing
libraries, ...);
....


as such, the primary near-term goal is to be usable for app extensions, ...


other further-reaching possibilities would include using C reflectively, for
self-writing programs, ...


or, maybe even as the basis for a "new" style of dynamic script-like
languages: they would look and behave like script languages, but could wield
much of the full power of C (and be free of all the typical stub and
FFI/interface hassle, as they could directly use C's headers and type
constructions).




effort:
so, I started this effort (the C compiler proper) about 4/5 months ago
(around the end of march). though many important components (such as an
in-memory assembler) were written earlier (actually, the assembler prompted
the compiler, but the compiler has turned out to be a lot more work than
could have been imagined).


in general, I have taken a more bottom-up approach, where first I write the
pieces, and then I tack them together.


likewise, I largely started with the assembler, and worked upwards (the
assembler, followed by the lower compiler).
however, a lot of the parser and upper compiler code was reused, and sort of
"beaten into shape" from a previous project (a vaguely javascript-style
language).


major pieces:
preprocessor and parser;
upper compiler;
lower compiler;
assembler and linker.




note that the upper compiler in effect compiles the code into a new
language, sort of a pre-cooked vaguely forth-style beast (with lots of funky
rules so that the lower compiler does not get confused as to how and where
values are being used, ...).


for example:
it is invalid to leave any items on the stack prior to a jump or label (a
kind of "union"/"phi" operation is needed; note that by default, at present,
variable flow is "synchronized" prior to a jump or label);
operations are used to mark the locations of function arguments;
....


note that this language is not imperative: it is not run directly as
presented, but rather indicates how things are evaluated and
interconnected (value flow, not the values themselves, is what is being
worked with). all the magic going on in the x86 registers, ... is
silently hidden away.


so, some work goes on in the upper compiler, and different work in the lower
compiler.


I just chose a stack, rather than variables, as the conceptual model (I am
a lot more experienced with stack-based models than variable-based
ones...).




status:
it compiles and runs basic code fragments;
it can in general parse and use system headers;
....


so, in a basic form, it is usable, but being a really useful fully-featured
compiler is a little further off, as there are still a number of features
not yet implemented.


so, what I have:
basic statements, language constructions, ...
pointers, arrays, structs, unions, ...


however:
function pointer handling is a little newer, and is not really tested;
adding multidimensional array support required mutilating a lot of the code
in the lower compiler (and at present I am not certain it will work);
likewise, support for it is still incomplete (and is IMO rather hacky);
initialized variable support is incomplete (no initialized structs, ...).


support for features like static variables, ... is currently lacking.


never mind that most of this has not been well tested (I recently
discovered and fixed a number of potentially serious bugs, some of which
involved general memory thrashing and similar).


some of the memory thrashing bugs required me to implement a specialized
memory allocator to try to track them down (partly macro-based; it kept
track of where in the code any particular piece of memory was allocated,
scanned the heap searching for fouled up memory, ...).




lots of other issues...


the lame part, though, is that for the people I know, apart from status
updates, not a lot of interest can be said at present.




or something...

