Best way to parse simple statement grammar

"ed_davis2" <ed_davis2@yahoo.com>
4 Jun 2005 15:13:39 -0400

          From comp.compilers

Related articles
Best way to parse simple statement grammar ed_davis2@yahoo.com (ed_davis2) (2005-06-04)
Re: Best way to parse simple statement grammar DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2005-06-08)
| List of all articles for this month |

From: "ed_davis2" <ed_davis2@yahoo.com>
Newsgroups: comp.compilers
Date: 4 Jun 2005 15:13:39 -0400
Organization: http://groups.google.com
Keywords: parse, question

I'm trying to figure out which is the best way to parse the following
simple grammar, using a hand-written parser:


stmtseq = {stmt}
stmt = "if" expr "then" stmtseq ["else" stmtseq "endif"]
                | "while" expr "do" stmtseq "endwhile"
                | "repeat" stmtseq "until" expr


Perusing available literature and sources of compilers, I have seen
basically three styles (for simplicity, eof processing is ignored):


1) stmtseq loops until any token in stmt's follow-set is found:


sub stmtseq()
    while not (token in [t_endwhile, t_until, t_else, t_endif]) do
        if is_token(t_while) // consumes token when true
            expr()
            expect(t_do)
            stmtseq()
            expect(t_endwhile)
        elseif is_token(t_repeat)
            stmtseq()
            expect(t_until)
            expr()
        elseif is_token(t_if)
            expr()
            expect(t_then)
            stmtseq()
            if is_token(t_else)
                stmtseq()
            endif
            expect(t_endif)
        else // no matching stmt found
            error()
        endif
    endwhile
endsub


2) stmtseq loops until a token that isn't in stmt's first_set is
found:


sub stmtseq() {
    do forever
        if is_token(t_while) // consumes token when true
            expr()
            expect(t_do)
            stmtseq()
            expect(t_endwhile)
        elseif is_token(t_repeat)
            stmtseq()
            expect(t_until)
            expr()
        elseif is_token(t_if)
            expr()
            expect(t_then)
            stmtseq()
            if is_token(t_else)
                stmtseq()
            endif
            expect(t_endif)
        else // no matching stmt found, so terminate loop
            break
        endif
    enddo
endsub


3) Each statement searches for its own specific follow-set token:


sub stmtseq(end_tokens) {
    while not (token in end_tokens) do
        if is_token(t_while) // consumes token when true
            expr()
            expect(t_do)
            stmtseq(t_endwhile)
            accept(t_endwhile)
        elseif is_token(t_repeat)
            stmtseq(t_until)
            accept(t_until)
            expr()
        elseif is_token(t_if)
            expr()
            expect(t_then)
            stmtseq([t_else, t_endif])
            if is_token(t_else)
                stmtseq(t_endif)
            endif
            accept(t_endif)
        else // no matching stmt found
            error()
        endif
    endwhile
endsub


Which one of the above is best, and/or is there a better way, using a
hand-written parser?


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.