ANN: TXR 191: Now with compiler.

Kaz Kylheku <157-073-9834@kylheku.com>
Thu, 12 Apr 2018 00:37:27 +0000 (UTC)

          From comp.compilers

Related articles
ANN: TXR 191: Now with compiler. 157-073-9834@kylheku.com (Kaz Kylheku) (2018-04-12)
| List of all articles for this month |

From: Kaz Kylheku <157-073-9834@kylheku.com>
Newsgroups: comp.compilers
Date: Thu, 12 Apr 2018 00:37:27 +0000 (UTC)
Organization: Aioe.org NNTP Server
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="55792"; mail-complaints-to="abuse@iecc.com"
Keywords: available, Lisp
Posted-Date: 11 Apr 2018 20:48:31 EDT

http://nongnu.org/txr


http://www.kylheku.com/cgit/txr


I'm pleased to announce that after 190 public releases of the TXR
language over the past 8.5 years, there is finally a compiler for the
TXR Lisp dialect, which produces code for a virtual machine.


TXR's Lisp library is now compiled, along with the compiler and assembler,
making it self-hosting.


The virtual machine is register-based rather than a stack machine, and
uses displays for representing nested environments. Display frames are
allocated on the native stack. When lexical closures are created with
the "close" instruction, frames are relocated to the heap,
transparently to the code.


Non-local control transfers and exception handling all works, as do
delimited continuations: some minor fixing was required to integrate
the VM with the existing native-stack-capture-based delimited
continuation implementation.


Examples:


compile-toplevel:


    Compile a form in a null lexical enviroment. Then invoke the resulting
    VM description as if it were a nullary function (though it is not of
    function type).


        1> (compile-toplevel '(list (succ x) (succ y)))
        ** warning: (expr-1:1) unbound variable x
        ** warning: (expr-1:1) unbound variable y
        #<sys:vm-desc: 849c150>
        2> (disassemble *1)
        data:
            d00: x
            d01: y
        funs:
                0: list
                1: succ
        code:
                0: 79000003 getv t03 d00
                1: 24010002 gcall t02 1 t03
                2: 00030001
                3: 79010004 getv t04 d01
                4: 24010003 gcall t03 1 t04
                5: 00040001
                6: 24020001 gcall t01 0 t02 t03
                7: 00020000
                8: 00000003
                9: 10000001 end t01
        instruction count:
                6
        #<sys:vm-desc: 849c150>
        3> (progn (defvar x 1) (defvar y 2))
        y
        4> (call *1)
        (2 3)


compile:


    Compile a function by name, or function object:


        1> (defun add (x y) (+ x y))
        add
        2> (compile 'add)
        #<vm fun: 2 param>
        3> (disassemble 'add)
        data:
            d00: add
        funs:
                0: +
        code:
                0: 9C00000B close t01 2 11 2 2 nil v0000 v0001
                1: 00020001
                2: 00020002
                3: 02010200
                4: 5C00000A block t02 d00 10
                5: 00020100
                6: 24020002 gcall t02 0 v0000 v0001
                7: 02000000
                8: 00000201
                9: 10000002 end t02
              10: 10000002 end t02
              11: 10000001 end t01
        instruction count:
                6
        entry point:
                3
        add


    Here, what was compiled is a (lambda (x y) (+ x y)) form which was then
    stored back into the add function binding. We get here the disassembly
    of that entire form; the entry point of the function itself is at
    offset 3.


    The compile function first compiled the lambda form and then invoked
    it, to obtain the function out of it. This was done by the close
    instruction. The close instruction created the closure, put it
    into register t01, and then branched to the instruction at 11 which
    terminates the machine, indicating t01 as the result value.


    At offset 3 is the map of argument locations v0001 and v0001, followed
    by the code of the function body.


    The gcall instruction is a variant of the function call which works
    with a table of cached global function bindings. It's invoking the
    function at index 0 (+), with args v0001 and v0002 (the function's
    own arguments).


compile-file: compile a Lisp file.


    If we put the add function into a file add.tl, and invoke
    (compile-file "add"), we get an add.tlo file containing this:


        (0 0)
        ((3 4 #b'0b00009c02000200 0200020000020102 0a00005c00010300 0300022401000002
                          0102000003000010 0300001001000224 0000000102000000 01000010'
            #(usr:add) #(sys:rt-defun usr:+)))


    The first expression gives major-minor version information.
    Then the compiled image follows: a list of compiled top-level forms.
    There is only one. The 3 4 numbers indicate number of registers and
    levels. Then there is a buffer object containing the virtual machine code.
    Then two vectors: the literal data vector and function table.


    The representation here is a little different because an entire
    (defun ...) form was compiled, not just a lambda. We can load it
    and disassemble it:


        1> (load "add")
        nil
        2> (disassemble 'add)
        data:
            d00: add
        funs:
                0: sys:rt-defun
                1: +
        code:
                0: 9C00000B close t02 2 11 2 2 nil v0000 v0001
                1: 00020002
                2: 00020002
                3: 02010200
                4: 5C00000A block t03 d00 10
                5: 00030100
                6: 24020003 gcall t03 1 v0000 v0001
                7: 02000001
                8: 00000201
                9: 10000003 end t03
              10: 10000003 end t03
              11: 24020001 gcall t01 0 d00 t02
              12: 01000000
              13: 00000002
              14: 10000001 end t01
        instruction count:
                7
        entry point:
                3
        add


    The situation is similar in that the form's logic begins by
    taking a closure, this time into register t02. It branches to
    11 like before. Here, function 0 is called, which is now sys:rt-defun,
    the run-time support routine for a compiled defun.
    Its arguments are d00 t02: the symbol add and the closure.
    The effect, of course, is that the function is defined.
    The return value of sys:rt-defun is captured into register t01
    whose value is returned. (That value is the function name.)


    Again, the disassembly listing for add shows this whole thing,
    indicating add's specific entry point into it.


So there we have a kind of high level walkthrough of what has been brewing in
TXR development lately.


I must pat myself on the back: good four weeks of spare-time work. I'm taking
a bit of a break now from the late nights and early mornings.


--
TXR Programming Lanuage: http://nongnu.org/txr
Music DIY Mailing List: http://www.kylheku.com/diy
ADA MP-1 Mailing List: http://www.kylheku.com/mp1


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.