Implementing scripting language: nested scopes and other problems

Tero Laitinen <tero.laitinen@audioriders.fi>
17 Mar 2003 00:08:37 -0500

          From comp.compilers

From: Tero Laitinen <tero.laitinen@audioriders.fi>
Newsgroups: comp.compilers
Date: 17 Mar 2003 00:08:37 -0500
Organization: Elisa Internet customer
Keywords: symbols, design
Posted-Date: 17 Mar 2003 00:08:37 EST

I'm developing a scripting language that borrows parts of its syntax
from Python, C and Java. Just for fun, of course. No real use.


I have already generated a working lexical scanner using flex. The
parser still has a few things I need to figure out.


One of them is nested scopes. The following code illustrates the problem:


int num = 3


class MyClass
          int num = 3
          int GetNum()
                  return num


MyClass's private variable "num" should not affect the global variable
"num", and vice versa. I know in theory how nested scopes should be
implemented: push a new symbol table onto a stack when parsing an inner
block, so that its symbols do not exist in the same namespace as the
outer ones. But how do I actually implement it?


Should I keep variables as strings during parsing and assign symbols to
those strings only after the intermediate code has been generated? Or
is it easy to add multiple symbol tables and namespaces in bison? IMO,
the latter would be the more sensible approach. Currently, I have a
simple hash table that contains symbol structs.
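
For illustration, the scope-stack idea described above could be sketched
in C roughly as follows. This is only a sketch under assumptions: the
names Symbol, Scope, push_scope, pop_scope, declare, lookup and
scope_demo are hypothetical (they are neither bison features nor part of
the code in this post), and small fixed-size arrays stand in for the hash
table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; nothing here comes from bison. */
#define MAX_SYMS  32   /* per-scope capacity, kept tiny for the sketch */
#define MAX_DEPTH 16   /* maximum nesting depth */

typedef struct { const char *name; int line; } Symbol;

typedef struct {
    Symbol syms[MAX_SYMS];
    int    count;
} Scope;

Scope scopes[MAX_DEPTH];
int   depth = 0;                 /* number of currently open scopes */

/* Enter a block (class body, function body, ...). */
void push_scope(void) {
    assert(depth < MAX_DEPTH);
    scopes[depth++].count = 0;
}

/* Leave the innermost block; its symbols are no longer visible. */
void pop_scope(void) {
    assert(depth > 0);
    depth--;
}

/* Search a single scope only. */
Symbol *find_in(Scope *s, const char *name) {
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->syms[i].name, name) == 0)
            return &s->syms[i];
    return NULL;
}

/* Declare a name in the innermost scope; NULL means redeclaration. */
Symbol *declare(const char *name, int line) {
    assert(depth > 0);
    Scope *s = &scopes[depth - 1];
    if (find_in(s, name) != NULL)
        return NULL;
    assert(s->count < MAX_SYMS);
    s->syms[s->count] = (Symbol){ name, line };
    return &s->syms[s->count++];
}

/* Resolve a use: innermost scope first, then outward. */
Symbol *lookup(const char *name) {
    for (int d = depth - 1; d >= 0; d--) {
        Symbol *sym = find_in(&scopes[d], name);
        if (sym != NULL)
            return sym;
    }
    return NULL;
}

/* Mirrors the example: a global num, then MyClass with its own num.
   The line numbers 1 and 5 are made up for the demonstration. */
int scope_demo(void) {
    push_scope();                     /* global scope */
    declare("num", 1);
    push_scope();                     /* body of MyClass */
    declare("num", 5);
    int inner = lookup("num")->line;  /* the class member */
    pop_scope();
    int outer = lookup("num")->line;  /* the global again */
    pop_scope();
    return inner * 10 + outer;
}
```

In a bison grammar, push_scope() would presumably go in a mid-rule action
right after the class header, and pop_scope() in the action at the end of
the class rule. A real implementation would probably keep a class's scope
alive after leaving it instead of reusing the slot, since member access
needs that table later.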


The other problem is closely related.
Consider the following code:


int a


class Storage
          int a


class BigStorage
          Storage st1, st2


BigStorage st = new BigStorage()
a = 3
st.st1.a = a


The intermediate code tree currently looks like this:
STMT_LIST
      STMT_LIST
          STMT_LIST
              STMT_LIST
                  STMT_LIST
                      STMT_LIST
                          STMT_LIST
                              STMT_LIST
                                  STMT_LIST
                                      VAR_DEC
                                          TYPE_STMT int
                                          ID_LIST a (line 1)
                              CLASS_DEC Storage (line 2)
                                  CLASS_CONTENT_LIST
                                      VAR_DEC
                                          TYPE_STMT int
                                          ID_LIST a (line 4)
                      CLASS_DEC BigStorage (line 5)
                          CLASS_CONTENT_LIST
                              VAR_DEC
                                  TYPE_STMT Storage (line 2)
                                  ID_LIST
                                      ID_LIST st1 (line 6)
                                      ID_LIST st2 (line 6)
              VAR_DEC
                  TYPE_STMT
                  ID_LIST st (line 8)
                      SET_STMT st (line 8)
                          NEW_EXPR BigStorage (line 5)
          EXPR_STMT
              SET_STMT
                  SIMPLE_VAR a (line 1)
                  INTEGER_EXPR 3
      EXPR_STMT
          SET_STMT
              FIELD_VAR_EXPR
                  SIMPLE_VAR st (line 8)
                  FIELD_VAR
                      SIMPLE_VAR st1 (line 6)
                      SIMPLE_VAR a (line 1)
              FIELD_VAR_EXPR
                  SIMPLE_VAR a (line 1)


But the symbols do not get assigned properly: the "a" in "st.st1.a" is
bound to the global variable "a" (line 1), although it has nothing to do
with it.

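One way to picture the intended behaviour is that only the leftmost name
of a dotted path is looked up in the ordinary scopes, and every later
name is looked up among the members of the class of whatever precedes it.
The following C sketch shows that lookup for the classes above. All names
in it (Type, Member, find_member, resolve_path, field_demo) are
hypothetical, and the tables are built by hand instead of by a parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical data layout, for illustration only. */
typedef struct Type Type;

typedef struct {
    const char *name;
    const Type *type;
    int         line;        /* line of the declaration */
} Member;

struct Type {
    const char   *name;
    const Member *members;   /* NULL for built-in types such as int */
    int           nmembers;
};

/* Look a name up among the members of one class only. */
const Member *find_member(const Type *t, const char *name) {
    for (int i = 0; i < t->nmembers; i++)
        if (strcmp(t->members[i].name, name) == 0)
            return &t->members[i];
    return NULL;
}

/* Resolve the field names that follow the leftmost variable.
   base is the type of that variable; the global scope is never
   consulted for the names in this list. */
const Member *resolve_path(const Type *base,
                           const char *const *names, int n) {
    const Member *m = NULL;
    const Type *t = base;
    for (int i = 0; i < n; i++) {
        m = find_member(t, names[i]);
        if (m == NULL)
            return NULL;     /* no such field in this class */
        t = m->type;
    }
    return m;
}

/* The classes from the example, with the line numbers from the tree. */
const Type   int_type = { "int", NULL, 0 };
const Member storage_members[] = { { "a", &int_type, 4 } };
const Type   storage_type = { "Storage", storage_members, 1 };
const Member big_members[] = {
    { "st1", &storage_type, 6 },
    { "st2", &storage_type, 6 },
};
const Type   big_type = { "BigStorage", big_members, 2 };

/* st.st1.a: "st" has type BigStorage; resolve the two names after it. */
int field_demo(void) {
    const char *const path[] = { "st1", "a" };
    const Member *m = resolve_path(&big_type, path, 2);
    return m != NULL ? m->line : -1;
}
```

Here field_demo() yields the declaration line of the field inside
Storage, not that of the global "a". Since this lookup needs the type of
the left operand, it may be simplest to keep the name to the right of the
dot as a plain string in the parser and resolve it in a later pass; that
is a suggestion, not something verified against the full grammar.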

The grammar rules for parsing operator . and operator [ ]
cause 2 shift/reduce conflicts:


lvalue:
        var_id
        {
                $$ = create_node(SIMPLE_VAR); $$->sym = $1;
        }
        | lvalue T_INDEX lvalue
        {
                $$ = create_node2(FIELD_VAR, $1, $3);
        }
        | lvalue T_LBRACKET exp T_RBRACKET
        {
                $$ = create_node2(INDEX_VAR, $1, $3);
        }
;
var_id:
        T_ID
        {
                $$ = find_symbol(yylval.string, SYM_VAR);
                if ($$ == NULL)
                        $$ = create_symbol(yylval.string,
                                           SYM_VAR, _lineno);
        }
;


T_INDEX = .
T_LBRACKET = [
T_RBRACKET = ]
T_ID = some string


exp = any kind of expression, for example (3-a)*2


So, what's the proper way to process expressions like
a.b.c and a[2][3]?
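
One common way to handle such chains is to treat "." and "[ ]" as postfix
operators that each wrap the tree built so far, so that a.b.c nests as
(a.b).c and a[2][3] as (a[2])[3]. The small hand-written C parser below
only illustrates that tree shape. Its names (Node, new_node, free_tree,
read_id, parse_lvalue, show) are hypothetical, it is not generated by
bison, and it accepts only integer literals as subscripts.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout; the kinds reuse names from the grammar. */
typedef enum { SIMPLE_VAR, FIELD_VAR, INDEX_VAR } Kind;

typedef struct Node {
    Kind         kind;
    struct Node *base;      /* left operand; NULL for SIMPLE_VAR */
    char         name[32];  /* variable or field name */
    int          index;     /* subscript, for INDEX_VAR */
} Node;

Node *new_node(Kind kind, Node *base) {
    Node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->kind = kind;
    n->base = base;
    return n;
}

/* Free a node and everything it was built on. */
void free_tree(Node *n) {
    while (n != NULL) { Node *b = n->base; free(n); n = b; }
}

/* Read an identifier at *p into out (at most 31 characters). */
int read_id(const char **p, char *out) {
    int len = 0;
    if (!isalpha((unsigned char)**p) && **p != '_')
        return 0;
    while ((isalnum((unsigned char)**p) || **p == '_') && len < 31)
        out[len++] = *(*p)++;
    out[len] = '\0';
    return 1;
}

/* lvalue: ID ( '.' ID | '[' INT ']' )*
   Each postfix operator wraps the tree built so far. */
Node *parse_lvalue(const char *s) {
    const char *p = s;
    Node *n = new_node(SIMPLE_VAR, NULL);
    if (!read_id(&p, n->name)) { free_tree(n); return NULL; }
    while (*p == '.' || *p == '[') {
        if (*p == '.') {
            p++;
            n = new_node(FIELD_VAR, n);
            if (!read_id(&p, n->name)) { free_tree(n); return NULL; }
        } else {
            char *end;
            p++;
            n = new_node(INDEX_VAR, n);
            n->index = (int)strtol(p, &end, 10);
            if (end == p || *end != ']') { free_tree(n); return NULL; }
            p = end + 1;
        }
    }
    if (*p != '\0') { free_tree(n); return NULL; }
    return n;
}

/* Append the nesting to out, which must be large enough and start
   out as an empty string. */
void show(const Node *n, char *out) {
    char tmp[16];
    switch (n->kind) {
    case SIMPLE_VAR:
        strcat(out, n->name);
        break;
    case FIELD_VAR:
        strcat(out, "FIELD("); show(n->base, out);
        strcat(out, ","); strcat(out, n->name); strcat(out, ")");
        break;
    case INDEX_VAR:
        strcat(out, "INDEX("); show(n->base, out);
        snprintf(tmp, sizeof tmp, ",%d)", n->index); strcat(out, tmp);
        break;
    }
}
```

With this sketch, parse_lvalue("a.b.c") renders through show() as
FIELD(FIELD(a,b),c), and parse_lvalue("a[2][3]") as INDEX(INDEX(a,2),3).
In bison the same shape would presumably come from writing the field rule
as "lvalue T_INDEX T_ID" instead of "lvalue T_INDEX lvalue", since the
latter lets a.b.c be grouped in two different ways.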


If you have any ideas, I would appreciate your help.
Thank you in advance.


Best regards,


Tero Laitinen

