Transparent recursive descent (Doug Merritt)
Mon, 1 Nov 1993 16:38:01 GMT

          From comp.compilers

Related articles
Transparent recursive descent (1993-11-01)
Re: Transparent recursive descent (1993-11-02)
Re: Transparent recursive descent (1993-11-04)
NON-recursive descent expression parsing (Marlin Prowell) (1993-11-04)
Re: NON-recursive descent expression parsing (1993-11-13)
Re: Transparent recursive descent (1993-11-16)
| List of all articles for this month |

Newsgroups: comp.compilers
From: (Doug Merritt)
Summary: Expression parsing made easy
Keywords: parse, code, comment
Organization: Netcom Online Communications Services
Date: Mon, 1 Nov 1993 16:38:01 GMT

In past years I've been disgruntled that the classic approach to recursive
descent parsing of arithmetic expressions is inherently complex, long, and
unintuitive compared with the conceptual simplicity of expressions.

I recently found a way around that, by writing the descent in its most
intuitive form, and avoiding ambiguity and backtracking by use of a
trivial form of memoization and operator precedence.

The resulting C code is only two core functions, comprising a few dozen
lines of code including basic error checking (roughly one printed page) to
implement the core parser, along with a number of conceptually trivial
support functions.

The code is *completely* obvious and transparent to read, which I regard
as the novel and interesting feature that may make this worthwhile to
post...correctness can almost be ascertained by casual reading. (I refrain
from claiming that it is bug-free, however. ;-) It also may be somewhat
faster than the classic approach, because fewer function calls will be
required to implement the parse.

It's terse, it's fast, it's intuitive -- I'm happy.

One or two error checks have been elided from the code below for (a)
simplicity and (b) to help make it correct for both immediate evaluation
as well as syntax tree building. It was lightly tested prior to editing
for those purposes. It does not currently handle left-associative and
non-associative operators, but those can be trivially added.

  * The following code was copyright 1993 Douglas R. Merritt.
  * I hereby release it to the public domain.
  * Transparent recursive descent expression parsing.
  * Functions included below:
  * expression() Calls expr() and checks for end-of-input.
  * expr() Parses an entire expression including operators
  * item() Parses a single operand (number or similar thing)
  * "Trivial" support functions required, but not included below:
  * is_eoi(c) Is 'c' an end-of-input token?
  * peek() Returns the next pending input character/token
  * consume() Returns value of current token, sets up for peek()
  * to read a new token. (Usually named "advance",
  * but this returns a value)
  * handle_eoi() Gets rid of all characters up to end of input
  * class_ident(c) Returns true if 'c' is identifier or constant.
  * class_unary(c) Returns true if 'c' is a unary operator
  * (e.g. -, !, ~)
  * class_binary(c) Returns true if 'c' is a binary operator
  * (e.g. +, -, *, /, etc)
  * greater_precedence(a, b)
  * Returns true if the precedence of operator b is
  * larger than that of a. (E.g. * is greater than +)
  * unary_op(op, expr)
  * Performs unary operation 'op' on expression 'expr'
  * (or builds a unary syntax tree node)
  * binary_op(expr1, op, expr2)
  * Performs binary operation 'op' on expressions
  * 'expr1' and 'expr2' (e.g. 5 * 3)
  * (or builds a binary syntax tree node)
  * error() Prints an error message.
  * Types:
  * TOKEN Either a #define for SYMBOL etc, or a single character
  * for simple things like +, *, <, >, etc.
  * VALUE If the grammar is used to evaluate constant
  * expressions, this is an "int" or a "double". Otherwise
  * this is a pointer to a node in a tree that represents
  * the operator and the one or two operands.
expression() /* This just calls expr() and handles any following garbage */
                TOKEN c;
                VALUE expr(), e;

e = expr(NULL, NULL);
                if (!is_eoi(c=peek()))
                                error("garbage after expression");
                handle_eoi(c); /* discard everything up to end of input */
                return e;

  * Parse a single operand: "(expr)", "symbol", "123", "-expr".
  * Co-recursive with expr().
                TOKEN t;
                VALUE e, expr();

                if ((t=peek()) == '(') {
                                e = expr(NULL, NULL); /* nothing memoized */
                                if (peek() == ')')
                                                error("Missing ending ')'");
                                return e;
                } else if (class_unary(t)) {
                                TOKEN oper = t;
                                e = expr(NULL, NULL); /* nothing memoized */
                                return unary_op(oper, e);
                } else if (class_ident(t)) {
                                return consume();
                error("Expected operand");
                return NULL;

expr(e1, op1)
                VALUE e1; /* The two parameters are memoization to avoid */
                TOKEN op1; /* backtracking */
                VALUE e2; /* read-ahead of a second operator and second */
                TOKEN op2; /* operand */

                if (!e1) {
                                e1 = item();
                                if (is_eoi(peek())) /* nothing else? */
                                                return e1;
                if (!op1) {
                                if (!class_binary(op1 = peek())) {
                                                error("Expected operator");
                                                return e1;
                e2 = item(); /* get second operand */
                if (class_binary(op2=peek())) {
* left- and non-associative operators could be trivally
* added here
                                if (greater_precedence(op2, op1))
                                                return binary_op(e1, op1, expr(e2, op2));
                                                return expr(binary_op(e1, op1, e2), op2);
                return binary_op(e1, op1, e2);
} /* expr */
Doug Merritt
[I can't say that this looks all that different from any other R.D.
expression parser I've seen, but it's nice to have the code for the
archives. -John]

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.