Re: Parsing Expressions.

Scott Stanchfield <thetick@scruz.net>
30 Jun 1997 22:50:05 -0400

          From comp.compilers

Related articles
Parsing Expressions. storys@ma.ultranet.com (1997-06-24)
Re: Parsing Expressions. thetick@scruz.net (Scott Stanchfield) (1997-06-30)
| List of all articles for this month |

From: Scott Stanchfield <thetick@scruz.net>
Newsgroups: comp.compilers
Date: 30 Jun 1997 22:50:05 -0400
Organization: scruz-net
References: 97-06-096
Keywords: parse

Sorry to get picky...


A "parse tree" is basically an abstract concept -- it represents how a
source file was parsed, stating which rules in a grammar were used and
how the terminals (which represent "tokens", or groups of characters
in the source file) are grouped using those rules.


You can store all this information, but it's usually overkill.
(Unless you're writing a parser debugger like I have -- see my .sig)


The more-often used approach is an Abstract Syntax Tree (AST). In an
AST, each node represents a _terminal_ in the source file. Not all
terminals need to be represented, and sometimes "imaginary" nodes are
inserted to help grouping. (For example, an imaginary
"statement-list" node could be the parent of several statement
children. When I say imaginary, I mean that there is no real token in
the source file that it represents -- it's just a made-up token for
grouping.)


A simple AST might look like (in LISP-ish (parent child1 child2...)
notation)


(function f
        (param-list
                (var x int)
                (var y int))
        (statement-list
                (assign
                        (lhs (var x))
                        (rhs
                                (plus
                                        (mult
                                                (var x)
                                                (var y))
                                        (div
                                                (var q)
                                                (var w)))))
                (call
                        (function print)
                        (param-list
                                (var x))))


for a fictional programming language source


    function f(int x, int y) {
            x = x*y+q/w;
            print(x);
    }


Depending on what you store in each node, the extra dummy nodes like
"var" may not be necessary.


You may want to look at ANTLR 2.0 (formerly PCCTS). ANTLR 2.0 is
written in and generates Java source (PCCTS 1.33 is still available to
generate C or C++ code -- someone will write a C++ generator for ANTLR
2.0 soon I expect.)


ANTLR has some very nice features for automatic AST generation, and you
can generate tree walkers using grammar notations that are nearly
identical to a "normal" text-parsing grammar.


Have a look at
        http://java.magelang.com/antlr
for details, and see my .sig for a pointer to info on the Visual
Debugger I'm writing for it.


Take care!
Scott


> I am attempting to write a simple byte code BASIC compiler.
...
> [Most of us use parse trees. -John]


--
Scott Stanchfield Santa Cruz,CA
Get some info on ParseView - my Visual ANTLR debugger at
        http://www.scruz.net/~thetick/parseview
Visit my PCCTS Tutorial at http://www.scruz.net/~thetick/pcctstut
See my cute baby girl at http://www.scruz.net/~thetick/claire




--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.