Re: Are Associative Arrays unique to Perl?

mjd@plover.com (Mark-Jason Dominus)
20 Oct 1996 16:51:15 -0400

          From comp.compilers

Related articles
Re: Are Associative Arrays unique to Perl? mjd@plover.com (1996-10-20)
Re: Are Associative Arrays unique to Perl? andy@research.canon.com.au (1996-10-24)
Re: Are Associative Arrays unique to Perl? mzenier@netcom.com (1996-10-24)
Re: Are Associative Arrays unique to Perl? leichter@smarts.com (Jerry Leichter) (1996-10-24)
Re: Are Associative Arrays unique to Perl? ok@cs.rmit.edu.au (1996-10-24)
Re: Are Associative Arrays unique to Perl? hbaker@netcom.com (1996-10-25)
Re: Are Associative Arrays unique to Perl? phr@netcom.com (1996-10-30)
[6 later articles]
| List of all articles for this month |

From: mjd@plover.com (Mark-Jason Dominus)
Newsgroups: comp.lang.perl.misc,comp.lang.misc,comp.compilers
Date: 20 Oct 1996 16:51:15 -0400
Organization: Plover Systems, Philadelphia, PA.
References: <5437ev$30u@shell1.aimnet.com> <545mqn$qul@picasso.op.net>
Keywords: design, history

Mark-Jason Dominus <mjd@plover.com> wrote:
>No; they first appeared in SNOBOL4 about twenty-five years ago.


Sorry to keep blabbing about this; it's off-topic for most of these
groups. But it's so interesting I can't shut up. The implementation
notes for SNOBOL4, version 3, say that associative arrays are *not*
implemented as hash tables, as you might expect they would be. The
true story is quite surprising and possibly of interest to compiler
writers and programmers generally, and I have the privilege of
explaining it now.


Each string in a SNOBOL4 program is unique. If you write


X = 'FOO'
Y = 'F' 'O' 'O'


then X and Y both end up containing pointers to the same place. At
run time, when SNOBOL needs to assign the string FOO to variable Y, it
concatenates the 'F', 'O', and 'O', and then discovers that the
result, `FOO', is already held by X. It then makes Y point to the
same FOO that X was pointing to.


The strings, such as FOO, *are* held in a large hash table. The
elements of this hash table are structures which include the string, a
flag that says whether there is a variable with the corresponding
name, a pointer to the object contained in that variable, and other
information. Let's call these structures `string nodes,' for lack of
a better term. The central data structure in a running SNOBOL
program, then, is a hash table whose elements are string nodes. The
two-line program above creates three string nodes: One for X, one for
Y, and one for FOO. X and Y contain pointers to FOO.


The fact that X and Y end up pointing to the same place never causes a
problem because strings are not mutable in SNOBOL.


One advantage of this scheme is that the statement


$Y = 'BAR'


which assigns the string `BAR' to the variable whose name is held in
the variable Y (in this case is is equivalent to (FOO = 'BAR'),) is
very efficient. Similarly, computed GOTO, which is SNOBOL's only
interesting control flow primitive, is also efficient. I imagine that
this structure also has positive implications for SNOBOL's garbage
collection scheme, but I don't know wnough about it to say for sure.


Now, because the string `FOO' is only ever stored once, it is
representable inside of SNOBOL by a unique pointer that points to its
string node. If you are given a string s, which is actually
represented by a pointer to a string node, you can check to see if s
contains the string `FOO' by doing a single pointer comparison. If s
and FOO point to the same place, s contains FOO, otherwise not. This
is very fast.


The way an associative array is implemented in SNOBOL is as an array
of these:


pointer to FOO's string node pointer to value assoicated with FOO
pointer to KEY2's string ndoe pointer to value assoicated with KEY2
... ...
pointer to BAR's string node pointer to value assoicated with BAR


Suppose you want the value associated with BAR. SNOBOL has already
compiled the string BAR, installed a string node for it in the global
hash table if it wasn't already there, and refers to BAR as a pointer
to this string node. It then does *linear search* on the left column
of the table; however, it does *not* have to compare each key with the
string `BAR' to check for a match; all it has to do is compare
pointers. If it finds a pointer that is equal to the pointer to BAR's
string node, that's BAR, and it returns the right-hand entry of that
table row. The search is linear, but each comparison takes only a
couple of machine instructions, because it's just a pointer equality
test.


One advantage of this method is that conversion from an associative
array to an n-by-2 regular array is trivial, because the table is
already in that format. SNOBOL4 offers this operation as a language
primitive. Another advantage: The pointers in the left-hand column
needn't be pointers to strings; they can be pointers to arrays,
user-defined data structures, or whatever.


There are no SNOBOL groups; I hope nobody minds my large crossposting.
I've directed followups to comp.lang.misc. If you don't want to post
there, please edit the Newsgroups: line thoughtfully.
--


[The idea of storing all the strings as unique pointers comes from
Lisp's atomic symbols, but I hadn't realized that Snobol associative
arrays were just arrays of pairs. Snobol4 was one of the most lopsided
languages I ever used. It had a fabulous backtracking pattern matching
system which was powerful enough to write a single pattern that could
syntax check a Fortran program. But all it would tell you was yes or no --
the flow of control was otherwise very primitive. The followon language
Icon, mentioned a few messages ago, fixed that problem but good. -John]


--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.