Re: Best way to parse simple statement grammar

Hans-Peter Diettrich <DrDiettrich@compuserve.de>
8 Jun 2005 15:57:53 -0400

          From comp.compilers

Related articles
Best way to parse simple statement grammar ed_davis2@yahoo.com (ed_davis2) (2005-06-04)
Re: Best way to parse simple statement grammar DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2005-06-08)
| List of all articles for this month |

From: Hans-Peter Diettrich <DrDiettrich@compuserve.de>
Newsgroups: comp.compilers
Date: 8 Jun 2005 15:57:53 -0400
Organization: Compilers Central
References: 05-06-030
Keywords: parse, practice
Posted-Date: 08 Jun 2005 15:57:53 EDT

ed_davis2 wrote:
>
> I'm trying to figure out which is the best way to parse the following
> simple grammar, using a hand-written parser:
>
> stmtseq = {stmt}
> stmt = "if" expr "then" stmtseq ["else" stmtseq "endif"]
> | "while" expr "do" stmtseq "endwhile"
> | "repeat" stmtseq "until" expr
>
> Perusing available literature and sources of compilers, I have seen
> basically three styles (for simplicity, eof processing is ignored):
>
> 1) stmtseq loops until any token in stmt's follow-set is found:
...
> 2) stmtseq loops until a token that isn't in stmt's first_set is
> found:
...
> 3) Each statement searches for its own specific follow-set token:
...
> Which one of the above is best, and/or is there a better way, using a
> hand-written parser?


The best choice for hand-written parsers is (2), because it doesn't
require the (manual) determination of the follow sets. Hard-coded
follow sets, as used in (1), are a maintenance nightmare, consider
what will happen if you ever modify the grammar :-(


If you are free in the design of your grammar(s), you can use some
existing tool for the verification that the grammar is LL(1). Then you
can be sure that the first and follow sets are disjoint, and you are
free to write your parser as you like. You'll also need such a tool
for the calculation of the first and follow sets, if you ever intend
to use such sets in your code.


If the first and follow sets (of stmtseq...) are not disjoint, then
the grammar is not LL(1), and you may have to specify how an according
ambiguity should be solved. Typically a parser then will provide the
longest match, invalidating solution (1) - see the "dangling else"
problem.




IMO solution (2) is the most appropriate way to handcraft an top-down
parser. This approach makes it safe and easy, to find out which part
of the code is affected by a change to the grammar. In real code you
should use an switch instead of an else-if chain; a switch statement
is easier to maintain, in detail when the sub-rules can start with
more than a single specific token, and the compiler also will warn you
when the same token occurs more than once in the case list.




If you are really unexperienced in writing parsers, I strongly suggest
that you use an parser generator first, at least for the computation of
the first and follow sets, and to obtain at least a skeleton for an
correct parser.


DoDi


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.