Expression Parsing (was: parsing equations)

Keith.Clarke@dcs.qmw.ac.uk
4 Apr 1996 00:23:21 -0500

          From comp.compilers

Related articles
Expression Parsing (was: parsing equations) Keith.Clarke@dcs.qmw.ac.uk (1996-04-04)
| List of all articles for this month |
From: Keith.Clarke@dcs.qmw.ac.uk
Newsgroups: comp.compilers
Date: 4 Apr 1996 00:23:21 -0500
Organization: Compilers Central
Keywords: parse

Anything to help a Mac programmer, and our esteemed moderator...


>I am looking for tools or examples to help implement a spreadsheet-like
>equation parser for an internal application we need to write. We need to
>handle arbitrary nesting and complexity... everything from
>
>A1 + B1 / C1
>
>to
>
>((0.75 + RAND() * 0.5) * AVERAGE($A3:$A19)) / (1 + LN( MAX(C3:C19) ) )
...
>Patrick Parker
>[It's certainly been done a thousand times, but it'd be nice to have a
>documented expression parser packaged up and ready to go. -John]


I have a demo expression parser written in C++, about 200 lines, but
essentially just C code. It uses the recursive descent precedence
method I posted in 1992 (sorry, I don't have a reference, but someone
reposted it in March this year).


It started as an example for teaching, & shows how to achieve various
effects that are occasionally asked for, such as juxtaposition, C-like
precedences, postfix ops, etc. It could easily handle constructs like
    expr?expr:expr
too, but I haven't bothered.


Keith Clarke
---
Code for the archive:


// This program reads an integer expression terminated by an equals sign,
// calculates its value and prints it.
// It allows spaces before each operator, number and '='.


// It uses a simple recursive precedence parser
// featuring prefix, postfix, infix operators; juxtaposition/function calls;
// left-associative infix ops; right-assoc infix ops.


// BNF
// expr = primary
// | expr ^ expr exponentiation
// | expr expr juxtaposition (subtraction, here!)
// | expr / expr division
// | expr * expr mult
// | expr + expr addition
// | expr - expr subtraction
// | expr & expr logical and
// | expr | expr logical or
// primary
// = ( expr )
// | digits
// | - expr integer negation
// | ! expr logical negation
// | expr ( expr ) division (bizarrely!)
//
// The ambiguities in the above BNF have been resolved to make the
// behaviour of this example C-like. Postfix beats prefix
// beats infix; usual C precedences except juxtaposition (same as subtraction);
// the usual function-call syntax is made to mean division, thus 100(5)(2)=10.
// I've chosen non-associative operations so it's easy to see that this code
// gives left-associative behaviour.
//
//
// There are comments in the code to show how other behaviours might be
// realised.
//
// Permission is given to use this source provided an acknowledgement is given.
// I'd also like to know if you've found it useful.


// The following Research Report describes the idea, and shows how the
// parsing method may be understood as an encoding of the usual family-of-
// parsing-procedures technique as used e.g. in Pascal compilers.
// @techreport{QMW-DCS-383-1986a,
// author ="Clarke, Keith",
// title ="The Top-Down Parsing of Expressions",
// institution ="Department of Computer Science, Queen Mary College,
University of London, England",
// year ="1986",
// month ="June",
// number ="QMW-DCS-1986-383",
// scope ="theory",
// abstractURL
="http://www.dcs.qmw.ac.uk/publications/report_abstracts/1986/383",
// keywords ="Recursive-descent parsing, expression parsing,
operator precedence parsing."
// }
// A formal proof of the algorithm was made, as part of his PhD thesis work,
// by A.M. Abbas of QMC, London, in the framework of Constructive Set Theory.
// copyright Keith Clarke, Dept of Computer Science, QMW, University of London,
// England. email keithc@dcs.qmw.ac.uk


#include <stream.h>
#include <assert.h>
#include <ctype.h>


void readprimary(int&);
void readexpr(int&,int);
void printnum(int);
void readnum(int&);
int fact(int);


// C-like highest-precedence postfix ops are handled in readprimary().
// Lower precedence postfix ops should be given a precedence in prio()
// and parsed by readexpr().
int postfix(char ch) {
        return ch=='(' || ch=='!';
}


// This function is used only if you want to use juxtaposition of two
// expressions with a precedence like a ordinary operator, such as using
// a+bc^d to mean a+(b*(c^d)) as in ordinary arithmetic.
// When an expr can be formed from one expression immediately followed
// by a phrase P, then any of the symbols in the first set of P should
// be given a precedence suitable for the binding power of the construct.
// For true juxtaposition this would be FIRST(expr); but I've made it only
// digits so I can illustrate C-like behaviour of "function calls" e.g. f(x)(y)
// in readprimary().
int firstjuxta(char ch) {
        switch(ch) {
                // case '(' : return 1;


                // return true for any digit, variable etc
                // to allow e.g. (f 3) for a function application,
                // or, in this example calculator (4 3) to mean the same as (4-3).
                default: return isdigit(ch);
        }
}


// Infix operators and maybe juxtaposition & postfix operators.
// This function tabulates the required behaviour of infix operators
// larger numbers bind more tightly.
// 0 means the symbol isn't an infix operator.
// It can be useful to give postfix ops a non-zero precedence - see readexpr()
int prio(char ch) {
        switch(ch) {
                case '|' : return 1;
                case '&' : return 2;
                case '+' : return 3;
                case '-' : return 3;
                case '*' : return 4;
                case '/' : return 4;
                case '^' : return 5;
                default:
                        if(firstjuxta(ch)) return 3;
                        else return 0;
        }
}


// Infix operators.
// True if it is, False if it isn't; doesn't matter if the symbol isn't
// even an infix op.
int rightassoc(char ch) {
        switch(ch) {
                case '^' : return 1;
                default: return 0;
        }
}


// END OF PRECEDENCE INFO


void skipspaces() {char ch; while(cin.get(ch),isspace(ch)) ; cin.putback(ch); }


int power(int n,int m) {
        int k=1;
        while(m>0) {k*=n; m--;}
        return k;
}


// Read an infix expression containing top-level operators that bind at least
// as tightly as the given precedence.
// Don't consume the first non-digit character after the last number.
// Complain if you can't even find the first number,
// or if there is an operator with no following number.
void readexpr(int& ans,int prec) {
        int number, thisprec;
        char ch;


        readprimary(ans);
        skipspaces();
        while ( cin >> ch, thisprec=prio(ch), thisprec>=prec ) {


                // We're here because prio(ch) is large enough,
                // but if firstjuxta(ch) is true, then ch is the first
                // character of the following expression, not an infix operator.
                if(firstjuxta(ch)) cin.putback(ch);


                // if ch is a symbol which is postfix not infix,
                // don't read the following expression here
                readexpr(number,rightassoc(ch)?thisprec:thisprec+1);
                switch(ch) {
                        case '+' : ans+=number; break;
                        case '*' : ans*=number; break;
                        case '-' : ans-=number; break;
                        case '/' : ans/=number; break;
                        case '^' : ans=power(ans,number); break;
                        case '0': case '1': case '2': case '3': case '4':
                        case '5': case '6': case '7': case '8': case '9':
                                // for illustration, I've made juxtaposition mean subtraction
                                // I've already made "juxtaposition expressions" only parse
                                // when the second one starts with a digit, in fn. firstjuxta()
                                ans-=number; break;
                        default: cout << "Panic: don't know the meaning of the operator "
                                                    << ch << endl;
                };
        }
        // We just read a character that wasn't an operator, and we don't
        // have any use for it - but the next input function might - put it back
        cin.putback(ch);


        // no return value
}


// read a single number; a whole expression in parentheses; prefix op
void readprimary(int& ans) {
        char ch;
        if(cin>>ch,ch=='(') {
                // must be an expression; we've consumed the opening parenthesis
                readexpr(ans,1);
                // and since readsum didn't care what followed the sum, the
                // closing parenthesis is still there waiting to be consumed
                if(cin>>ch,ch!=')') {
                        cout << "Missing right parenthesis\n";
                        // maybe we just consumed the = sign - put it back, just in case
                        cin.putback(ch);
                }
        } else if (isdigit(ch)) {
                // whoops, we've read the first digit of the number already...
                cin.putback(ch);
                readnum(ans);


        // illustrative prefix operators
        } else if (ch=='-') {
                readexpr(ans,1); ans = -ans;
        } else if (ch=='!') {
                readexpr(ans,1); ans = !ans;
        } else
                cout << "Number, left bracket or prefix operator expected\n";


        // Parsing postfix ops here gives them HIGHER precedence than any
        // infix operation; HIGHER precedence than prefix operators.
        // You can give a postfix op a lower prec
        // by handling it as though it were infix - see readexpr()


        // illustrative postfix operators
        while(cin>>ch,postfix(ch)) {
                if(ch=='!') {
                        ans=fact(ans);
                } else if(ch=='(') {
                        // make this "function call" syntax mean division
                        int number;
                        readexpr(number,1);
                        ans/=number;
                        if(cin.get(ch),ch!=')') {
                                cout << "Missing right parenthesis in function call\n";
                        }
                } else
                        cout << "Panic: don't know the meaning of postfix op "
                                  << ch << endl;
        }
        cin.putback(ch);
}


// print an integer in decimal, one char at a time
// For pedagogic purposes only!
void printnum(const int N) {
        if(N<10) { cout << N%10; }
        else {
                printnum(N/10);
                cout << N%10;
        }
}


int digitvalue(char ch) {
        return (ch - '0');
}


// Each function checks all of its phrase, even if the caller has already
// made sure the first symbol is OK
void readnum(int& N) {
        char ch;
        N=0;
        while(cin.get(ch),isspace(ch)) ;
        if(!isdigit(ch)) {
                cout << "Number expected\n";
                return;
        }
        while(isdigit(ch)) {
                N = N*10+digitvalue(ch);
                cin.get(ch);
        }
        cin.putback(ch);
}


int fact(int n) {
        return n==0?1:n*fact(n-1);
}


main() {
        int total; char ch;
        cout << "Type an integer expression terminated by an '=' character";


        // No doubt there's a better way to do this, but who cares...
        while(cout << "\n? ",cin.get(ch),cin.putback(ch),!cin.eof()) {
                readexpr(total,1);
                skipspaces();
                cin >> ch;
                if(ch=='=') {
                        printnum(total);
                } else {
                        cout << "Expected an '=' character after the last number";
                }
                while(cin.get(ch),ch!='\n') ;
        }
}
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.