Re: C symbol table structures (Kenn Heinrich)
6 Jan 1999 04:07:21 -0500

          From comp.compilers

Related articles
C symbol table structures (1999-01-02)
Re: C symbol table structures (Basim Kadhim) (1999-01-04)
Re: C symbol table structures (1999-01-06)
Re: C symbol table structures (Dave Hanson) (1999-01-11)
Re: C symbol table structures (Gene Ressler) (1999-01-17)
| List of all articles for this month |

From: (Kenn Heinrich)
Newsgroups: comp.compilers
Date: 6 Jan 1999 04:07:21 -0500
Organization: none
References: 99-01-005 99-01-019
Keywords: C, design

Alex Colvin wrote:
>> The parser keeps a stack of the currently active scopes. To find an
>> identifier's current definition we search its definition list for the
>> first (innermost) active definition.
>> Structures have their own scopes, which are searched to find member
>> definitions.
>> In this scheme, scope entry and exit just increment the scope number
>> and adjust the scope number stack. Instead of having a table for each
>> scope, we have a list for each identifier. This is sort of like shallow
>> binding.
>> The idea, as I recall, is that C has lots of scopes, but most
>> identifiers are defined in only one, typically the global level.
>> The drawback is that heavily overloaded identifiers have slow lookups,
>> potentially scanning past lots of out-of-scope definitions. Even this
>> can probably be fixed by discarding them upon redefinition.

The technique I used in a Pascal compiler was very similar, but did
not slow down overloaded identifiers. The "hash table" and "symbol
scoping" concepts were interwoven, giving a hash table that always had
the current binding (local scope) immediately accesible.

Imagine your standard hash table; an array of buckets, each bucket
containing a linked list of (identifiers, scope, attribute) pairs.
Now replace each pair with a stack (or equivalently, linked list) of
pairs. Every time you open a scope, increment the scope counter. If
an identifier is redefined, you put it on the *top* of the stack,
hiding the old binding, because the hash traversal (lookup) only looks
at the top of the stack (or head of the list). The hash lookup is not
slowed becase only the current binding is directly in the search list
from the bucket.

Furthermore, each and every entry in the symbol table is threaded onto
the head of a singly-linked list. Since identifiers are entered in the
order they are parsed, the innermost bindings are at the head of this
list while parsing the innermost scope. To close a scope, just
traverse this list from the head, deleting the symbols, until you
reach the end or an identifier that has a lower scope than the one
you're closing.

Insertion and lookup of a symbol is linear time w.r.t. the length of
the hash bucket's search list, but independent of scope. Closing a
scope is linear time w.r.t. the number of symbols in the
scope. Opening a scope is free, (it only requires incrementing the
scope pointer).

I implemented this using linked lists,

typedef struct Sym { /* in hash table */
    char *name; /* ascii name */
    int scope;
    int recnum; /* 1 = normal identifier, 2 and up for records */
    SymRef next; /* doubly linked list fwd pointer */
    SymRef last; /* doubly linked list back pointer */
    SymRef link; /* every single entry isthreaded onto a linked list */
    SymRef *root; /* address of slot in hash table array, for d-link list */
    SymRef list; /* linked list to old (hidden) definitions */
    struct Attributes *Attributes;
} Sym;

Kenn Heinrich

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.