Re: Fixed stack / dynamic stack for each thread
Mon, 05 Jan 2009 12:34:38 -0600

          From comp.compilers

Related articles
Re: Fixed stack / dynamic stack for each thread (Ira Baxter) (2008-12-23)
Re: Fixed stack / dynamic stack for each thread (2009-01-05)
Re: Fixed stack / dynamic stack for each thread (Ira Baxter) (2009-01-10)
Re: Fixed stack / dynamic stack for each thread (2009-01-25)
| List of all articles for this month |

Newsgroups: comp.compilers
Date: Mon, 05 Jan 2009 12:34:38 -0600
Organization: Compilers Central
References: 08-12-105
Keywords: parallel, design
Posted-Date: 09 Jan 2009 07:32:16 EST (Ira Baxter) wrote:

> > It's quite possible that people theorising about dynamic stacks, and
> > maybe even experimenting with them, have solved all these problems.
> > But I'd want to see some evidence for it.
> Our PARLANSE parallell programming language

This is interesting; having a language that has this concept thoroughly
built into it at the code generation level is clearly the best way to do
it. Trying to retrofit it into an existing language, especially one like
C/C++ that has pointers, is going to be very hard.

I hadn't thought of going this far with dynamic stacks, but this
stack-per-call concept is clearly more viable than only switching
stacks occasionally.

> is based on the philosophy that there are an unbounded number of small
> computational grains.

Makes sense. The stuff I work on won't readily break down into that
kind of form, sadly.

> It operates by using allocation activation records on a per-function
> call basis, where the activation record size is determined by the
> needs just of the function. Each function call thus does a stack
> switch.

Makes sense. I presume that this language does not allow functions to
be passed as parameters? I mean this in the sense of the function
itself, where C would pass a function pointer, rather than the result
of the function. If that is allowed, how is a function passed?

> Argument handling occurs by passing arguments in the registers, or
> passing a pointer to a block of parameters in the caller's stack.

Always call-by-value? No way to pass addresses? Much safer that way
for parallelism, of course.

> Uplevel addressing to lexically scoped, addressable values in
> surrounding function definitions is handled by constructing a
> "display", an old Algol idea which uses a small array per function,
> indexed by scope level number, containing a pointer to each parent
> activation record.

So it's a display of activation records, rather than of variables within
hem? I'm used to displays in work's custom language.

> We construct a "sparse" display, containing pointers to just the
> parent activation records that might be accessed. Typical displays
> are only a few entries long; in practice, nobody lexically nests
> dozens of functions deep. Seems our brains just can't handle this :-}

Absolutely. I find working without lexical scope entirely acceptable,
but tastes vary.

> The recieving function copies the part of the scope array it (or its
> subfunctions) may use.
> The overhead for all this is cheaper than you'd expect. it only takes
> a few machine instructions per function call, and this matters only if
> you are computing fibonacci or something else equally rediculously
> simple.

Indeed, the Windows thread activation overhead is going to be much
bigger than this. That's handling your stacks for you, naturally enough.
What stack sizes do you set for the Windows threads? Smaller than the
default 1MB, I assume?

> In practice, function activation records can be large, due to
> inlining other (simple) functions, or due to paralllelism primitives,
> which for PARLANSE statically allocate grain-tracking data structures
> in the activation record for common static-parallelism primitives,
> such as (parallel/concurrent a b c.. z) for fixed numbers of grains a
> b ..., for partial order parallelism (always has a fixed number of
> grains), etc. The overhead seems to be maybe 5%; this is well worth
> it because we can usefully go parallel with reasonably large numbers
> of processors (threads).

How many real processor cores have you taken this up to? it seems to me
that simple memory contention would start to bite you at the 10-ish kind
of level.

> In production, our code doesn't trap

Can you absolutely prevent this? You can go a long way towards this by
having no floating point trapping and limited dynamic memory allocation
(and thus no pointer arithmetic for programmers to screw up), but what
happens when a Windows API call messes up? Debugging would seem hard.

> What's more remarkable is that PARLANSE propagates exceptions across
> not only function boundaries, but cleanly across parallelism
> boundaries. That requires a lot of complicated machinery

Yup. I've done trap propagation out of Windows worked thread back to the
thread that created them. Not a lot of code, but a lot of care needed.


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.