introduction to BNF, concrete syntax trees

"markww" <markww@gmail.com>
25 Sep 2006 01:16:28 -0400

          From comp.compilers

Related articles
introduction to BNF, concrete syntax trees markww@gmail.com (markww) (2006-09-25)
Re: introduction to BNF, concrete syntax trees mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2006-09-25)
| List of all articles for this month |

From: "markww" <markww@gmail.com>
Newsgroups: comp.compilers
Date: 25 Sep 2006 01:16:28 -0400
Organization: Compilers Central
Keywords: parse, question

Hi,


I'm just starting to look at BNF. I have to take a simple expression
like:


          (A + B * C) * D


and create a concrete syntax tree of it. Before I start writing my
application to parse the above, I need to understand how to do this on
a piece of paper. My first thought is to tokenize (by whitespace) every
component of the statement, then label its type. I have 3 types to
choose from:


f = factor
t = term
e = expression


with the following rules:


expression ::= exp "+" exp |
                                            exp "-" exp |
                                            term


term ::= term "*" factor |
                          term "\" factor |
                          factor


factor ::= number |
                            identifier |
                            "(" expression ")"


so it must be possible to keep looping through the tokens, merging
adjacent pairs that follow the above rules until I end up with one node
right? (more than one node would indicate an error in the expression?).


So my first pass would look like this:


                          f
                          |
----------------------------------
| |
| f e f t f | t f
| | | | | | | | |
( A + B * C ) * D




now I have to pass through again to try merging nodes I suppose. So the
first thing to look at is if I can merge any adjacent terms with
factors which would give me:


                          f
                          |
----------------------------------
| t | t
| | \ | | \
| f e f t f | t f
| | | | | | | | |
( A + B * C ) * D




pass through again I guess and try merging adjacent factors and terms
again. I am a bit confused here though. The A is just sitting there
unable to merge with anyone. But it can simply itself down to a term
right? So do I just now label it a term or leave it alone?


                              ------------ t ---
                            / \
                          f \
                          | \
---------------------------------- |
| t | |
| / | | |
| / t | t
| / | \ | | \
| f e f t f | t f
| | | | | | | | |
( A + B * C ) * D


Well I guess I just keep looping until there are no factors and terms
to merge. Then I can try merging adjacent terms and expressions:




                              ------------ t ---
                            / \
                          f \
                          | \
---------------------------------- |
| | |
| - e - | |
| / \ | |
| / t | |
| | / | | |
| | / t | t
| | / | \ | | \
| f e f t f | t f
| | | | | | | | |
( A + B * C ) * D


So again I'm not sure what to do with factors like the A that just have
been sitting there this whole time. Do I just keep simplying themselves
if possible from f -> t -> e?


Any advice would be great. A simple walk through of an expression like
A + B * C would be great.


Thanks,
Mark



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.