Best way to parse simple statement grammar

"ed_davis2" <ed_davis2@yahoo.com>
4 Jun 2005 15:13:39 -0400

From comp.compilers

Related articles
*Best way to parse simple statement grammar ed_davis2@yahoo.com (ed_davis2)* (2005-06-04)**
Re: Best way to parse simple statement grammar DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2005-06-08)

| List of all articles for this month |

From:	"ed_davis2" <ed_davis2@yahoo.com>
Newsgroups:	comp.compilers
Date:	4 Jun 2005 15:13:39 -0400
Organization:	http://groups.google.com
Keywords:	parse, question
Posted-Date:	04 Jun 2005 15:13:39 EDT

I'm trying to figure out which is the best way to parse the following
simple grammar, using a hand-written parser:

stmtseq = {stmt}
stmt = "if" expr "then" stmtseq ["else" stmtseq "endif"]
                | "while" expr "do" stmtseq "endwhile"
                | "repeat" stmtseq "until" expr

Perusing available literature and sources of compilers, I have seen
basically three styles (for simplicity, eof processing is ignored):

1) stmtseq loops until any token in stmt's follow-set is found:

sub stmtseq()
    while not (token in [t_endwhile, t_until, t_else, t_endif]) do
        if is_token(t_while) // consumes token when true
            expr()
            expect(t_do)
            stmtseq()
            expect(t_endwhile)
        elseif is_token(t_repeat)
            stmtseq()
            expect(t_until)
            expr()
        elseif is_token(t_if)
            expr()
            expect(t_then)
            stmtseq()
            if is_token(t_else)
                stmtseq()
            endif
            expect(t_endif)
        else // no matching stmt found
            error()
        endif
    endwhile
endsub

2) stmtseq loops until a token that isn't in stmt's first_set is
found:

sub stmtseq() {
    do forever
        if is_token(t_while) // consumes token when true
            expr()
            expect(t_do)
            stmtseq()
            expect(t_endwhile)
        elseif is_token(t_repeat)
            stmtseq()
            expect(t_until)
            expr()
        elseif is_token(t_if)
            expr()
            expect(t_then)
            stmtseq()
            if is_token(t_else)
                stmtseq()
            endif
            expect(t_endif)
        else // no matching stmt found, so terminate loop
            break
        endif
    enddo
endsub

3) Each statement searches for its own specific follow-set token:

sub stmtseq(end_tokens) {
    while not (token in end_tokens) do
        if is_token(t_while) // consumes token when true
            expr()
            expect(t_do)
            stmtseq(t_endwhile)
            accept(t_endwhile)
        elseif is_token(t_repeat)
            stmtseq(t_until)
            accept(t_until)
            expr()
        elseif is_token(t_if)
            expr()
            expect(t_then)
            stmtseq([t_else, t_endif])
            if is_token(t_else)
                stmtseq(t_endif)
            endif
            accept(t_endif)
        else // no matching stmt found
            error()
        endif
    endwhile
endsub

Which one of the above is best, and/or is there a better way, using a
hand-written parser?

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Best way to parse simple statement grammar

"ed_davis2" <ed_davis2@yahoo.com>4 Jun 2005 15:13:39 -0400

"ed_davis2" <ed_davis2@yahoo.com>
4 Jun 2005 15:13:39 -0400