From: | George Neuner <gneuner2@comcast.net> |
Newsgroups: | comp.compilers |
Date: | Sat, 19 May 2007 16:17:59 -0400 |
Organization: | Compilers Central |
References: | 07-05-067 |
Keywords: | parse, symbols |
Posted-Date: | 20 May 2007 22:07:32 EDT |
On Thu, 17 May 2007 21:43:38 -0700, Ryan Dary <iecc@ryandary.com>
wrote:
>Take a look at this source code, ...
>
>Sub do_something_else( byRef i As Integer )
> i = i * 10
>End Sub
>
>Function do_something( i As Integer ) As Integer
> Dim a As Integer = i + 10
> do_something_else i
> Return a * 5
>End Function
>
>My understanding of the syntax tree is that I'm not supposed to be
>worried about the "meaning" of the code, but rather the "structure" of
>the code. So, I'm not building a symbol tree at this phase... the
>problem with that seems to be that I'm unable to make heads or tails
>of the lines of code within the function declaration. For instance,
>as I parse the Dim statement (which is used to declare a variable), I
>am able to parse the components "Dim a As Integer = <exp>" where the
><exp> (expression) seems to be impossible to really parse without
>having a symbol table thus far in the parsing. I wouldn't know if "i"
>is a variable or a function or a constant, because I don't have any
>way of looking it up in a symbol table. So, should I be building the
>symbol table as I'm parsing the syntax tree from the tokens?
Strictly speaking, it is never *necessary* to build a symbol table
while parsing - it can always be done by a separate pass (or passes)
through the syntax tree. However, as a practical matter, there are
languages like C for which parsing is very difficult to perform
without the symbol information - it can be done but the result is very
messy and the work to disambiguate everything is more trouble than it
is worth. Regardless of the language, constructing the symbol table
as early as possible saves work later.
That said ... a VB like syntax is easy enough that you really could
leave symbol processing until later if you wanted to. While parsing
you only need to know the context in which the identifier is used -
declaration, expression, function argument, etc. You structure your
syntax tree to record the context but you leave the identifier
references generic.
Assuming your parser recognizes keywords for built-in types - and
without going into too much detail - your syntax tree might initially
look something like:
(program
(proc
( (id "do_something_else")
(params (id "i" (type (ref integer)))) )
(code
(expr (op =
(expr (id "i"))
(expr (op * (id "i") (const 10))))) ))
(func
( (id "do_something")
(type integer)
(params (id "i" (type integer)) )
(vars
(id "a"
(type array
(type integer)
(dim (expr (op + (id "i") (const 10)))) )))
(code
(call (id "do_something_else")
(args (expr (id "i"))))
(return (expr (op * (id "a") (const 5)))) ))
:
)
Since user types like records are constructed from built-in types or
recursively from previously declared user types, it is sufficient for
the parser to recognize the built-ins.
Then you walk the syntax tree, constructing your type and symbol
tables(s) as you determine what the identifiers mean. During this
pass you can check the code for undeclared and duplicate identifiers,
unfinished forward declarations, etc. If the language syntax requires
all symbols be declared before use then you can also perform some
usage checking in this pass. If your language allows identifiers to
be referenced before they are declared (e.g., function names) then
you'll have to wait until the entire symbol table is constructed
before checking usage.
Once your types and symbols are defined, you can rewrite the code
parts of the tree so the generic ID references point directly to the
symbol table entries.
Hope this helps.
George
Return to the
comp.compilers page.
Search the
comp.compilers archives again.