Re: Intermediate forms (again?): worthwhile?

Barry Kelly <barry.j.kelly@gmail.com>
Mon, 24 Jan 2011 10:26:53 +0000

          From comp.compilers

Related articles
[11 earlier articles]
Re: Intermediate forms (again?): worthwhile? cr88192@hotmail.com (BGB) (2011-01-22)
Re: Intermediate forms (again?): worthwhile? ehog.hedge@googlemail.com (chris dollin) (2011-01-22)
Re: Intermediate forms (again?): worthwhile? nospam@myisp.net (Tony) (2011-01-22)
Re: Intermediate forms (again?): worthwhile? bc@freeuk.com (Bartc) (2011-01-23)
Re: Intermediate forms (again?): worthwhile? steshaw@gmail.com (Steven Shaw) (2011-01-24)
Re: Intermediate forms (again?): worthwhile? steshaw@gmail.com (Steven Shaw) (2011-01-24)
Re: Intermediate forms (again?): worthwhile? barry.j.kelly@gmail.com (Barry Kelly) (2011-01-24)
Re: Intermediate forms (again?): worthwhile? nospam@myisp.net (Tony) (2011-01-24)
Re: Intermediate forms (again?): worthwhile? thomas.mertes@gmx.at (tm) (2011-01-24)
Re: Intermediate forms (again?): worthwhile? eh@electrichedgehog.net (Chris Dollin) (2011-01-26)
Re: Intermediate forms (again?): worthwhile? bc@freeuk.com (Bartc) (2011-01-26)
Re: Intermediate forms (again?): worthwhile? cr88192@hotmail.com (BGB) (2011-01-26)
| List of all articles for this month |

From: Barry Kelly <barry.j.kelly@gmail.com>
Newsgroups: comp.compilers
Date: Mon, 24 Jan 2011 10:26:53 +0000
Organization: TeraNews.com
References: 11-01-082 11-01-088 11-01-096
Keywords: LLVM
Posted-Date: 26 Jan 2011 12:15:56 EST

BGB wrote:


(Apologies ahead of time for slightly reordering your quotes.)


> also, admittedly, it doesn't sit well with my design sensibilities. I am
> personally more of a fan of modularized black-box components,


> personally I have my doubts about LLVM, mostly:
> it is a not exactly small piece of code;


If you treat it as a black box, you can largely avoid this. Most of what
one relies on in modern computing is formed out of large pieces of code.
What's more important is how large the conceptual surface area is, and
how consistent and orthogonal the component concepts are.


> it is written in C++ and doesn't provide a plain-C API (and is written
> in what appears to me to be "traditional Java-style OOP", ...), which
> may have some additional costs in some cases;


LLVM does have a C API; it's used by the OCaml binding, for example. It
is limited, but it's not too much work (in my limited experience) to
extend that C API to access more bits of the C++ API that aren't
directly exposed.


> having to get code into SSA form is not exactly an easy task.


Don't get hung up on this; LLVM converts code into SSA for you where
it's relevant. Let me explain. There are two aspects to the SSA
transformation as I see it: the simple acyclic layout of an expression
tree or DAG; and then making sure control flow etc. works out, with phi
nodes etc.


If you have a simple expression (a + b * c), and view it as a tree:


(+ a (* b c))


and then label each node in the tree:


e1:(+ e2:a e3:(* e4:b e5:c))


and then create a linear list of instructions to "compose" this tree
(i.e. a trivial post-order recursive traversal):


e2 := load("a")
e4 := load("b")
e5 := load("c")
e3 := mul_op(e4, e5)
e1 := add_op(e2, e3)


what we have here is essentially SSA form for the expression. This is
how you describe expressions to LLVM; each SSA variable is essentially a
name for a node in the expression tree (or DAG).


That's expressions out of the way; it should be clear that they're
trivial to put into SSA form. For local variables, you can allocate them
with 'alloca'. This relieves you from the requirement to transform all
references to those local variables into SSA form yourself; you can
treat them as memory cells that you poke data in and out of via
dereferencing.


The nice thing is that LLVM has an optimization pass which converts
these alloca locals into SSA for you!


I have a toy Pascal language currently compiling into LLVM. Here's a
sample procedure:


procedure R;
var
    a, b, c: Integer;
begin
    a := 10;
    b := 20;
    c := 30;
    WriteInt(a + b * c);
end;


Unoptimized, it compiles into this in LLVM assembly:


define void @R() {
entry:
    %a = alloca i32
    %b = alloca i32
    %c = alloca i32
    store i32 10, i32* %a
    store i32 20, i32* %b
    store i32 30, i32* %c
    %0 = load i32* %a
    %1 = load i32* %b
    %2 = load i32* %c
    %3 = mul i32 %1, %2
    %4 = add i32 %0, %3
    call void @WriteInt(i32 %4)
    ret void
}


Trivial, right?


Pass it through the LLVM optimizer (llvm-opt -O3), and you get this:


define void @R() {
entry:
    tail call void @WriteInt(i32 610)
    ret void
}


OK, so maybe that's more optimization than you expected to see. Less
aggressively, just the mem2reg pass:


define void @R() {
entry:
    %0 = mul i32 20, 30 ; <i32> [#uses=1]
    %1 = add i32 10, %0 ; <i32> [#uses=1]
    call void @WriteInt(i32 %1)
    ret void
}


It's really quite simple.


-- Barry


--
http://blog.barrkel.com/



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.