From: | BGB <cr88192@hotmail.com> |
Newsgroups: | comp.compilers |
Date: | Wed, 29 Jun 2011 15:51:48 -0700 |
Organization: | albasani.net |
References: | 11-06-037 11-06-042 |
Keywords: | storage, symbols |
Posted-Date: | 01 Jul 2011 09:46:45 EDT |
On 6/23/2011 9:43 AM, noitalmost wrote:
> On Monday, June 20, 2011 03:43:24 pm John wrote:
>>> Is this a reasonable way to approach the problem?
>> [Pretty much. The standard way to handle references to an enclosing
>> scope is with a display, the calling procedure passes the enclosing
>> stack frame addresses as hidden parameters. If your language doesn't
>> allow variable sized declarations or recursion, P.x will be at a fixed
>> distance below the current stack frame so you can address it directly,
>> but in a more general case, you need the display. See any 1970s
>> compiler text for details. -John]
>
> My language does allow recursion, so my approach won't work as
> written. So the AST node for an Ident should have a field for
> enclosing scope levels, not stack levels?
so, you are not using bytecode or native code or similar?...
IME, normally things like resolving identifiers is often handled when
producing bytecode, rather than at the level of the ASTs (which often
only hold a reference to the identifier by name, and so perform lookups
by name).
to try to answer:
yes, normally one links by scope-frame, and not by stack frame.
the reason is that with traditional (lexical) scoping, the bindings may
only be contained in the parent scope, but one may receive calls from
places other than the enclosing function (say, if the inner function is
passed to another function as a callback, or if it is returned or stored
in an object).
in the above case, linking by stack frame will not work (and generally
requires actually placing bindings off in the heap or similar).
in my case, in the bytecode (for one of my VMs), I actually just use a
single number to refer both to variables in the current frame and to
parent frames, but this is partly because... errm... I currently
implement my environment frames as linked-lists. since new bindings are
always linked to the bottom of the list, local bindings tend to have the
lowest numbers, followed by arguments, and followed by bindings in
enclosing scopes.
in non-closure cases, an optimization is to use a "binding stack" in
place of the linked list (special logic is used to determine if/when
functions will close-over any variables, allowing optimization of
non-closure cases). the binding stack is a little cheaper, and can be
coerced into a fixed-form glob of memory by a JIT.
granted, also common, and potentially better (efficiency-wise), is to
have linked frames as collections of bindings. then one accesses the
variables using a pair of numbers: how many frames to look "upwards" and
how many variables to look "rightwards". some of my other VMs (both past
and planned) have used this strategy.
a partial advantage here (of always using linked frames) is that one
doesn't need to make so many special cases between closures and
non-closures (besides whether or not to preserve/release the frame).
my C compiler internally did something along the lines of internally
using heap-allocated structs for captured bindings (non-captured
bindings were stored directly into the stack frames).
note: because of the way lexical scoping works, an interesting side
effect is that it doesn't really make much difference that some of my
different compilers/interpreters use different representations for the
binding frames (the lexical scope is, by practical definition,
mono-language, and in my VMs, typically local to the functions themselves).
although sadly my implementation of packages and imports creates
function-external lexical bindings, in turn creating more need for
closures to be created (as every package is itself a lexical binding
frame, and every "import" statement, a lexical binding...).
the same can't be said of control flow though, which sadly makes a bit
of a mess of exception handling. this is because control flow can easily
wind around here-and-there all over the place, and cross language
borders a good number of times.
this is also the present reason for the general lack of a good "call/cc"
mechanism as well, as one would need a call/cc which can also work with
plain-C call-frames (big nasty issue...).
> I looked at Aho's description of displays. Currently, my interpreter
> is using a more abstract stack. It's a stack of pointers, so the first
> declared variable in a scope goes at frame offset 0, the second at
> offset 1, etc. I was thinking of dedicating offset 0 to be a pointer
> to all the bookkeeping info, such as enclosing scope pointers. Will
> this work as my language matures, or are there some glaring gotchas?
> [You more or less need a static place to pick up a pointer to the
> stack frame for the current invocation of each lexically enclosing
> routine. I don't see any reason that pointer to the display in a fixed
> place wouldn't work. It's covered in detail in any compiler text
> written when people still worried about compiling Pascal. -John]
yes.
I figure a stack-like or (per-binding) linked list style representation
can probably be made to work (one of my major VMs does this), it is just
optimizing it is likely to be a little more hairy (for example, naively
using a linked-list in ones' machine-code output is, questionable...).
Return to the
comp.compilers page.
Search the
comp.compilers archives again.