From: BGB <cr88192@hotmail.com>
Newsgroups: comp.compilers
Date: Sat, 06 Aug 2011 14:10:12 -0700
Organization: albasani.net
References: 11-08-006
Keywords: courses
Posted-Date: 06 Aug 2011 20:59:50 EDT
On 8/6/2011 10:28 AM, amit karmakar wrote:
> I would like to have some suggestions as to what *new* and
> *innovative* project i can do which are based on compiler design.
> Also, considering the time i have to implement the compiler, i can
> think of cutting down work, like working on subset of a language. I
> would preferably not tend to work on only a specific part(phase) of
> compiler. It will be better if I implement a complete compiler for
> some architecture and see the executable running.
new+innovative and compilers don't often go together. another
problem is that terms like new/innovative/interesting/... depend
highly on who one is dealing with and their personal biases and
preferences (a cool idea for one person may be considered stale,
boring, unworkable, ... by another).
a few thoughts:
most traditional research into compilers has been about how to squeeze
as much performance as possible out of them. maybe one can look into
new and interesting features instead.
rather than work on a subset of a language, it may make more sense to
work with a simpler language design.
for example, Scheme is a fairly simple language (except for a few edge
cases), and a person can often throw together a working implementation
fairly quickly (at least IME with R5RS and earlier; I dunno about R6RS,
as I was mostly no longer dealing with Scheme by that point, and
R6RS at the time looked a bit strange vs what came before).
a slightly less simplistic, but still relatively simple language, is
ECMAScript (basic core language for JavaScript, ActionScript, ...).
probably not worth trying to implement up-front are languages like:
C or C++ (fairly complex languages to implement);
Java (a lot more hairy than it looks, syntax can be deceiving);
...
note that dynamic typing generally makes things much easier to implement
(static typing makes things faster, and is "closer to the metal", ...
but it doesn't make things easier).
a more recent language of mine uses a "soft typing" model, which
basically layers elements of static typing on top of an otherwise
dynamically-typed VM (potentially using types as optimization hints in
the codegen, but treating type-checking, behavioral semantics, and
optimization as separate issues).
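as a rough illustration of the soft-typing idea (not the actual
implementation from my language; the SoftEnv class and its method names
are made up for this sketch), variables stay dynamically typed unless a
type is declared, and a declared type is both checked on assignment and
available to a codegen as a hint:

```python
class SoftEnv:
    """Variables are dynamically typed unless a type is declared for them."""

    def __init__(self):
        self.values = {}
        self.declared = {}  # name -> Python type, only for annotated variables

    def declare(self, name, typ=None):
        # typ=None means "no annotation": the variable stays fully dynamic
        if typ is not None:
            self.declared[name] = typ

    def assign(self, name, value):
        typ = self.declared.get(name)
        if typ is not None and not isinstance(value, typ):
            raise TypeError(
                f"{name} declared as {typ.__name__}, got {type(value).__name__}")
        self.values[name] = value

    def has_type_hint(self, name):
        # a codegen could use this to pick a specialized code path
        return name in self.declared


env = SoftEnv()
env.declare("v")          # untyped: anything goes
env.declare("i", int)     # typed: checked, and usable as a hint
env.assign("v", "hello")
env.assign("v", 3)
env.assign("i", 42)
```

note that the check and the hint are separate questions here: dropping
has_type_hint would not change what the program means, only how fast a
codegen might be able to make it.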
personally, I like RPN / Stack-Machine style ILs (though I recently got
into a big argument over this with a person who, for whatever reason,
really dislikes stack-machine ILs, despite them being well proven in the
JVM, .NET, AVM2, ...).
examples of stack-machine languages would include Forth, PostScript,
Factor, ... (PostScript has had a notable influence on the design of my
ILs).
the upside of stack machines is that they are fairly easy to produce
code for (it is often very straightforward to unwind an AST into a stack
machine format), are themselves relatively simple, and are very capable
despite their relative simplicity.
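a minimal sketch of what "unwinding an AST into a stack machine format"
can look like (the tuple-based AST, the opcode names, and the helper
functions are all invented for this example, and are not taken from any
particular VM): a post-order walk emits the operands first and the
operator last, and a tiny interpreter loop runs the result.

```python
# hypothetical mini-AST: a number, a variable name, or (op, left, right)
def compile_expr(node, out):
    """Post-order walk: emit both operands, then the operator."""
    if isinstance(node, (int, float)):
        out.append(("push", node))
    elif isinstance(node, str):
        out.append(("load", node))
    else:
        op, lhs, rhs = node
        compile_expr(lhs, out)
        compile_expr(rhs, out)
        out.append((op, None))
    return out


BINOPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def run(code, env):
    """Interpret the stack code; env maps variable names to values."""
    stack = []
    for op, arg in code:
        if op == "push":
            stack.append(arg)
        elif op == "load":
            stack.append(env[arg])
        else:
            b = stack.pop()  # right operand was pushed last
            a = stack.pop()
            stack.append(BINOPS[op](a, b))
    return stack.pop()


ast = ("add", "x", ("mul", 2, 3))   # x + 2 * 3
code = compile_expr(ast, [])
# code: load x / push 2 / push 3 / mul / add
value = run(code, {"x": 4})         # 10
```

the whole compiler is one recursive function, and the output order is
simply the RPN form of the expression.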
a downside though is that they are relatively fussy about ordering
issues, and a general-purpose native codegen can get a bit hairy (mostly
due to ABI interfacing, for example, the SysV/AMD64 ABI is itself a
complex beast, and one has to effectively "pull a rabbit out of a hat"
to mesh it up directly with a stack machine IL). they are also far less
"du jour" with many people than are other options, such as TAC-SSA
(Three Address Code - Static Single Assignment).
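for comparison, a rough sketch of lowering an expression into
three-address code instead (the tuple AST, the to_tac helper, and the
t0/t1 temp naming are assumptions for illustration only). since each
temporary is assigned exactly once, straight-line code like this is
already in SSA form; real SSA additionally needs phi nodes where
control flow merges, which this sketch does not cover.

```python
import itertools


def to_tac(node, out, fresh):
    """Append (dst, op, a, b) tuples to out; return where node's value lives."""
    if not isinstance(node, tuple):
        return node  # constants and variable names are used directly
    op, lhs, rhs = node
    a = to_tac(lhs, out, fresh)
    b = to_tac(rhs, out, fresh)
    dst = f"t{next(fresh)}"  # a new temp per operation, assigned once
    out.append((dst, op, a, b))
    return dst


tac = []
result = to_tac(("add", "x", ("mul", 2, 3)), tac, itertools.count())
# tac:
#   t0 = mul 2, 3
#   t1 = add x, t0
# result names the temp holding the final value (t1)
```

here the operands are named explicitly rather than being implied by
stack position, which is the main difference from the stack-machine
form.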
granted, things should be much simpler if one doesn't want to go about
trying to directly call into native (statically-compiled) code, but
instead uses special functions to marshal the calls (I have later found
that this strategy can be fairly transparent as well).
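a sketch of the "special functions to marshal the calls" idea, under
loud assumptions: the (tag, payload) VM value layout, the "int" /
"double" signature strings, and make_marshaller are all hypothetical,
and the "native" function is a plain Python stand-in rather than real
statically-compiled code. the point is only the shape: unbox arguments
according to a declared signature, make the call, box the result.

```python
# VM values are (tag, payload) pairs; this layout is an assumption
UNBOX = {
    "int": lambda v: int(v[1]),
    "double": lambda v: float(v[1]),
}
BOX = {
    "int": lambda x: ("fixnum", int(x)),
    "double": lambda x: ("flonum", float(x)),
}


def make_marshaller(native_fn, arg_types, ret_type):
    """Wrap native_fn so it can be called with boxed VM values."""
    def call(*vm_args):
        if len(vm_args) != len(arg_types):
            raise TypeError("wrong argument count")
        raw = [UNBOX[t](a) for t, a in zip(arg_types, vm_args)]
        return BOX[ret_type](native_fn(*raw))
    return call


def native_scale(x, factor):
    # stand-in for a statically compiled function: double scale(double, int)
    return x * factor


vm_scale = make_marshaller(native_scale, ["double", "int"], "double")
boxed = vm_scale(("fixnum", 3), ("fixnum", 2))   # ("flonum", 6.0)
```

the VM side never sees the native calling convention; it only sees
another callable that takes and returns boxed values.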
also possibly useful is allowing for eval/...
also, in my case, I am working to make the C interface fairly
transparent (marshaling calls and data-types and similar in both
directions, ideally eliminating nearly all cases of manually-written
boilerplate code).
ideally, the time of isolated languages and frameworks, and of languages
which don't have features like eval, will soon be nearing an end (this
doesn't mean I want many of the existing languages to go away, but
ideally most should have eval as a relatively common library feature, ...).
for example, my language has:
"native import C.foo;"
which allows importing libraries from C land (foo is a library
name, and a tool is used to mine information from C headers/...).
"native package C.foo { ...body... }"
allows exporting the code ("...body...") to C land (in this case, the
boilerplate is written automatically by a tool).
granted, yes, none of this is really terribly new or original, as most
of this has been around for decades.
as for languages containing some interesting ideas:
Scheme (nice core language design);
Self (nice object system, partly carried over in a limited form into
JavaScript);
PostScript (relatively clean stack-machine model);
ECMAScript / JavaScript (simplistic yet conventional syntax);
ActionScript (like JavaScript but more "grown up");
Erlang (concurrent programming features);
...
granted, to be original, one needs to be, errm, original.
like maybe try to come up with some new/interesting language feature or
idea to try exploring, or something interesting to do at the
compiler/codegen level, ...