Re: Need an interesting topic for an undergraduate project on Compilers

Christophe de Dinechin <christophe.de.dinechin@gmail.com>
Sun, 4 Sep 2011 23:42:48 -0700 (PDT)

          From comp.compilers

Related articles
[10 earlier articles]
Re: Need an interesting topic for an undergraduate project on Compiler bumens@dingens.org (Volker Birk) (2011-08-31)
Re: Need an interesting topic for an undergraduate project on Compiler cr88192@hotmail.com (BGB) (2011-09-01)
Re: Need an interesting topic for an undergraduate project on Compiler gneuner2@comcast.net (George Neuner) (2011-08-31)
Re: Need an interesting topic for an undergraduate project on Compiler redbrain@gcc.gnu.org (Philip Herron) (2011-09-03)
Re: Need an interesting topic for an undergraduate project on Compiler cbergstrom@pathscale.com (=?ISO-8859-1?Q?=22C=2E_Bergstr=F6m=22?=) (2011-09-03)
Re: Need an interesting topic for an undergraduate project on Compiler gneuner2@comcast.net (George Neuner) (2011-09-03)
Re: Need an interesting topic for an undergraduate project on Compiler christophe.de.dinechin@gmail.com (Christophe de Dinechin) (2011-09-04)
| List of all articles for this month |

From: Christophe de Dinechin <christophe.de.dinechin@gmail.com>
Newsgroups: comp.compilers
Date: Sun, 4 Sep 2011 23:42:48 -0700 (PDT)
Organization: Compilers Central
References: 11-08-006
Keywords: courses

On Aug 6, 7:28 pm, amit karmakar <amit.codenam...@gmail.com> wrote:
> I would like to have some suggestions as to what *new* and
> *innovative* project i can do which are based on compiler design.


Innovation in compilers can happen at a number of levels :


1. Parsing techniques, grammars, etc. Very active research a while
back, considered (erroneously methinks) as dead by most today, who
happily use flex/bison and don't think twice about it.


2. Language design. One of the active areas these days is "domain
specific languages" or DSLs, i.e. languages designed for one specific
need. Often using "meta-programming" techniques (programs that
generate programs)


3. Type systems, proofs, program validation. Languages like Haskell
use type inference, so that you don't have to specify types yourself
most of the time. C++ recently gained the "auto" keyword for types.
DSLs pose a new class of interesting problems in that space.


4. Intermediate representations, code generation and optimization
frameworks. The king of this hill these days IMO is LLVM. But there
are a number of contenders. If you are interested in optimizations,
that's the right place to look at.


5. Runtime support : garbage collectors, just-in-time code generation,
parallel execution, use of new hardware such as GPUs, 


6. Support for innovative hardware, hardware generation, hardware/
software co-design, etc. If you are more into silicon, this is a very
interesting are to learn about.




My own pet project, XLR (http://xlr.sf.net) offers a number of
innovations in the first three of these areas. It is a language
designed to grow with the user, i.e. the objective is to make it as
easy to add language constructs as it is to add, say, functions or
classes in other languages.


Regarding parsing, it generates a parse tree made of exactly 8 nodes :
integer, real, text and name/symbol represent leaves of the tree,
infix, prefix, postfix and block represent inner nodes. This makes it
possible to write programs in a very natural-looking way, yet with an
internal program representation that is easy to manipulate. This is
the foundation of XL meta-programming / DSL capabilities.


To validate that, XLR has practically no built-in constructs. It has
constructs to connect to LLVM primitives, constructs to connect to C
code, and a pair of "rewrite" constructs, notably ->, to transform one
tree shape into another. For example :


        extern bool puts(text);
        (x:integer - y:integer):integer -> opcode Sub


        repeat 0, B -> true
        repeat N, B -> B; repeat N-1, B


        repeat 25,
                puts "Hello"


You can check the code generated for the above with xlr -tcode -O3
tests/09.Compiler/optimized-repeat-loop.xl. LLVM actually turns it
into a sequence of 25 calls to puts, you can hardly do better.


The most active area of research for XLR these days is its type
system. In order to generate efficient code, an Haskell-like type
inference mechanism is in place. But the standard type inference
algorithms must be extended, because there are a few additional
transformations compared to lambda calculus (not just "alpha" and
"beta"), and the closest there is to a type is the shape of a tree
(e.g. "if X then Y else Z").


Since it uses LLVM, it is also an interesting way to learn a little
about LLVM, but it's not intended as an LLVM tutorial.


So if you are interested in experimenting with "growing a language" in
a text-based framework, XLR is the right way to go. There are other
projects that are more advanced e.g. if you want to build the IDE at
the same time, see for example JetBrain's Meta Programming System. But
they are not as strong in language development per-se, I believe.



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.