Detailed Compiler Stages of a Statically Typed, Strict Functional Language?


          From comp.compilers

Related articles
Detailed Compiler Stages of a Statically Typed, Strict Functional Lang scottmcl@comcast.net (Scott) (2011-02-24)
Re: Detailed Compiler Stages of a Statically Typed, Strict Functional torbenm@diku.dk (2011-02-28)

From: Scott <scottmcl@comcast.net>
Newsgroups: comp.compilers
Date: Thu, 24 Feb 2011 19:33:16 -0800 (PST)
Organization: Compilers Central
Keywords: design, question, functional
Posted-Date: 26 Feb 2011 11:40:36 EST

I am seeking reasonably canonical information on the compiler stages,
perhaps repeatedly or recursively invoked, of a strict evaluation,
statically typed functional language compiler.


By "functional language" I mean one with:

- parametric polymorphism (or "generic" data types and functions);
- extensive use of higher-order functions (HOFs) such as map and
  foldl/foldr for computational control flow;
- function composition operators (combinators);
- algebraic data types and pattern matching;
- both tupled and curried function arguments, with partial
  application of each;
- selection of data representations for algebraic data types;
- reasonably extensive use of inlining (and alpha renaming) for
  acceptable performance of control-oriented HOFs (and direct
  application of functions passed to HOFs);
- algebraic transformations on term expressions (id x => x, and so
  very much more);
- deforestation/stream optimization on chained map, filter, and fold
  HOFs to eliminate allocation of needless intermediate data
  structures;
- lambda lifting and the associated selection of
  environment/closure/function representations;
- all consequent beta-reduction opportunities after the various
  transforms;
- "smart" compilation of complex pattern/guard case expressions (or
  equivalently styled function definitions);
- and then optimizations more commonplace in most languages: common
  subexpression elimination, control-flow "peephole" style
  optimizations, yada yada yada....
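
To make the deforestation point concrete, here is a rough sketch (in
Python purely for illustration; the term representation and names are
my own) of the classic map/map fusion rewrite, which collapses two
traversals into one and eliminates the intermediate list:

```python
# Toy term language: ("map", f, xs) maps f over xs.
# Functions are represented as Python callables for brevity.

def compose(f, g):
    return lambda x: f(g(x))

def fuse(term):
    """Rewrite map f (map g xs) => map (f . g) xs, recursively,
    so only one result list is ever allocated."""
    if isinstance(term, tuple) and term[0] == "map":
        _, f, inner = term
        inner = fuse(inner)
        if isinstance(inner, tuple) and inner[0] == "map":
            _, g, xs = inner
            return ("map", compose(f, g), xs)
        return ("map", f, inner)
    return term

def evaluate(term):
    if isinstance(term, tuple) and term[0] == "map":
        _, f, xs = term
        return [f(x) for x in evaluate(xs)]
    return term  # a literal list

prog = ("map", lambda x: x + 1, ("map", lambda x: x * 2, [1, 2, 3]))
fused = fuse(prog)
assert fused[2] == [1, 2, 3]                # a single map remains
assert evaluate(prog) == evaluate(fused) == [3, 5, 7]
```

A real compiler would of course do this on the IR, guarded by purity
and usage analysis, rather than on closures at run time.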


I am particularly interested in the IRs, the various intermediate
representations, starting from the lowly full parse tree (pretty
printing), to the initial AST (and module name resolution), to
simpler typed (or untyped), often extended, lambda-calculus
languages, to various potential normal forms, down to the lowest-level
(?) forms such as CPS or SSA.
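
For instance, the kind of CPS lowering I have in mind, sketched in
Python on a toy expression language (the representation and names are
invented, just for illustration): every intermediate result gets a
fresh name and is passed to an explicit continuation, fixing
evaluation order:

```python
import itertools

# Toy source terms: ("lit", n) | ("add", e1, e2)
# CPS target: ("let", name, ("prim+", a, b), body) | ("halt", r)

_fresh = itertools.count()

def gensym():
    return f"t{next(_fresh)}"

def cps(term, k):
    """Convert a term; k receives the atom holding the result."""
    tag = term[0]
    if tag == "lit":
        return k(term[1])
    if tag == "add":
        return cps(term[1], lambda a:
               cps(term[2], lambda b:
               (lambda t: ("let", t, ("prim+", a, b), k(t)))(gensym())))
    raise ValueError(tag)

# (1 + 2) + 3 becomes a chain of lets, one per addition:
prog = ("add", ("add", ("lit", 1), ("lit", 2)), ("lit", 3))
lowered = cps(prog, lambda r: ("halt", r))
assert lowered == ("let", "t0", ("prim+", 1, 2),
                   ("let", "t1", ("prim+", "t0", 3),
                    ("halt", "t1")))
```

The meta-level continuations here are the usual shortcut; a
production compiler would reify them as IR lambdas.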


What compiler transformations, expansions, name-resolution renamings,
representation selections, runtime-system support (stack maps for
precise garbage collection, and support for unboxed double floating
point values, are of special interest), inlining, beta reduction,
algebraic code transforms, etc. are associated with each particular
IR? And what attributes of each IR are necessary to accomplish these
transforms?
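
As an example of the "beta reduction after inlining" I keep coming
back to, here is a toy sketch (Python, representation invented for
illustration) that contracts the redexes inlining creates; it assumes
alpha renaming has already made all binders unique, so naive
substitution cannot capture variables:

```python
# Toy lambda terms: ("var", x) | ("lam", x, body) | ("app", f, arg).
# Assumes binders are globally unique (alpha renaming done earlier).

def substitute(term, name, value):
    tag = term[0]
    if tag == "var":
        return value if term[1] == name else term
    if tag == "lam":
        return term if term[1] == name else \
               ("lam", term[1], substitute(term[2], name, value))
    if tag == "app":
        return ("app", substitute(term[1], name, value),
                       substitute(term[2], name, value))
    return term  # literals etc.

def beta_reduce(term):
    """Contract every (app (lam x body) arg) redex, bottom-up."""
    if term[0] == "lam":
        return ("lam", term[1], beta_reduce(term[2]))
    if term[0] == "app":
        f, a = beta_reduce(term[1]), beta_reduce(term[2])
        if f[0] == "lam":
            return beta_reduce(substitute(f[2], f[1], a))
        return ("app", f, a)
    return term

# After inlining, (lam x. x y) (lam z. z) reduces all the way to y:
term = ("app", ("lam", "x", ("app", ("var", "x"), ("var", "y"))),
               ("lam", "z", ("var", "z")))
assert beta_reduce(term) == ("var", "y")
```

(Reducing under every redex unconditionally can duplicate work or
diverge in general; real compilers gate this on size and linearity.)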


I've read what seems like a gazillion academic papers on particular
stages, but have found no *very specific* information on how the
entire performance-oriented compilation of a fairly prototypical
strict functional language progresses from stage to stage,
representation to representation, with the optimizations, decisions,
and transformations occurring at each stage.


As I intend to build a "precise GC" based system (unboxed doubles and
full-word-sized integer values), I am mostly interested in
descriptions of compilation strategies that preserve type information
from stage to stage, for the eventual construction of stack/register
call-frame maps and data-structure trace/no-trace maps for use by
the GC.
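
Concretely, the kind of call-frame map I have in mind (the layout and
names are invented, purely for illustration) is a per-call-site table
telling the collector which stack slots hold traceable pointers;
unboxed doubles and raw integers simply never appear in the mask, so
the GC skips them:

```python
# Hypothetical frame maps: for each return address (call site),
# record the frame size and which slots contain heap pointers.

FRAME_MAPS = {
    # return_addr: (frame_size_in_slots, pointer_slot_indices)
    0x4010: (4, (0, 3)),   # slots 1-2 hold an unboxed double pair
    0x4038: (2, (1,)),
}

def trace_frame(return_addr, slots, mark):
    """Walk one frame, marking only the slots the map says are
    pointers; unboxed values are never handed to the collector."""
    size, ptr_slots = FRAME_MAPS[return_addr]
    assert len(slots) == size
    for i in ptr_slots:
        mark(slots[i])

marked = []
trace_frame(0x4010, ["p_a", 3.14, 2.72, "p_b"], marked.append)
assert marked == ["p_a", "p_b"]
```

The hard part, and the reason I want type information preserved
through every IR, is emitting such maps correctly once inlining,
CPS/SSA lowering, and register allocation have rearranged the frame.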


To repeat: some (much?) of the particulars are debated in detail
elsewhere, but I am looking for a fairly explicit description of the
entire stage-by-stage, IR-by-IR, transformation-by-transformation
structure of the whole process.


Thanks so very much for any and all advice, papers, (very) well
documented reference implementations, and other information anyone
can provide.


- Scott.

