Re: A handful of LISP questions

Gene <gene.ressler@gmail.com>
Tue, 19 Jun 2007 02:36:47 -0000

          From comp.compilers

Related articles
A handful of LISP questions tactics40@gmail.com (tactics) (2007-06-19)
Re: A handful of LISP questions gene.ressler@gmail.com (Gene) (2007-06-19)
Re: A handful of LISP questions nmh@t3x.org (Nils M Holm) (2007-06-19)
Re: A handful of LISP questions mcr@nerdware.org (Martin Rodgers) (2007-06-19)
Re: A handful of LISP questions mal@wyrd.be (Lieven Marchand) (2007-06-19)
Re: A handful of LISP questions ahz@snafu.de (Andreas Hinze) (2007-06-19)
Re: A handful of LISP questions tactics40@gmail.com (tactics) (2007-06-20)
Re: A handful of LISP questions gene.ressler@gmail.com (Gene) (2007-06-20)
| List of all articles for this month |

From: Gene <gene.ressler@gmail.com>
Newsgroups: comp.compilers
Date: Tue, 19 Jun 2007 02:36:47 -0000
Organization: Compilers Central
References: 07-06-033
Keywords: Lisp, parse
Posted-Date: 19 Jun 2007 10:24:35 EDT

On Jun 18, 8:29 pm, tactics <tactic...@gmail.com> wrote:
> Hello all~
>
> Recently, I have become interested writing a personal dialect of LISP
> in C. Over the last month, I actually managed to get fairly stable
> version working and I wrote a small word-guessing game in it and some
> common recursive functions (eg, I got it to print the primes between 2
> and 100). It was really fun to do, and I want to take it a step
> further. However, I have some questions and advice to ask here.


Yes. It's a great exercise.


> My first question, and the most important to me for now, I think, is
> what is the best way to write a parser for a LISP? It was only out of
> the grace of the C Gods that I got my current parser working. I have a
> nice method for breaking a raw string into tokens (which I was cute
> about, and instead of returning an array of tokens, it returns a
> cons'ed list of C structure LISP-objects).


For generality you're better off pulling tokens from the start of a
stream. The reader can need to handle very big data in a serious lisp
implementaiton.


> But once I have the tokens,
> I do some pretty bad black magic to get the final list structure. It's
> something like, whenever I come to a parenthesis, I search for the
> close, split the string, and then parse the stuff inside the paren and
> then the stuff outside the paren. It's really bad!


> I was wondering from a theoretical standpoint, what kind of grammar is
> a LISP grammar? Obviously, it's a straightforward CFG, but is it
> LL(1)? I took a class in Programming Languages in college, but nothing
> more than how to write a calculator in JavaCC. The web resources for
> LL and LR parsing methods are quiet pathetic too (someone needs to
> clean up their LL algorithm explanation really badly). Can anyone
> point me in the right direction here? What would be a good technique
> to look into for this job?


You might enjoy looking at xlisp. http://almy.us/xlisp.html . It's
pretty much what you are trying to do.
You can learn something about compiling a small lisp with
http://hedgehog.oliotalo.fi/


Lisp grammar is trivially LL(1). Every lisp reader I've ever seen
parses by recursive descent. The lexer returns atoms, parens, and
dots. The parser is something like this:


function read
    next_token = lex;
    if next_token = '(' then
        return read_list
    else if next_token = ')' then
        return ')' ;; only "legal" when returned to read_list
    else // atom
        return next_token;
end read


function read_list
    list = tail = nil
    loop
        next_atom = read;
        if next_atom = ')' return list
        if next_atom = '.'
            tail->cdr = read;
            if read /= ')' error "malformed dotted list"
            return list;
        end if
        ;; add new item to tail of list
        if tail = nil
            list = tail = cons(next_atom, nil);
        else
            tail->cdr = cons(next_atom, nil)
            tail = tail->cdr
        end if
    end loop;
end read_list


In some lisps, the "read_list" above is implemented as a macro
character function for the '(' character.


>
> My next question is about garbage collection. I have a nice mark and
> sweep algorithm written, but the big issue here is I don't know when
> to actually DO garbage collection. Right now, I just call it at the
> end of my program, but that is hardly useful. Do I just run it
> periodically in a separate thread? Do I set a recurring signal timer?
> In either case, what is the best way to ensure I still have access to
> the current environment?


The usual idea is to allocate an "arena", allocate from that until
it's full, thebn garbage collect. If the arena is still full after
collection, increase its size.


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.