|sequential binary parallelization at run-time email@example.com (Lauren) (2007-10-25)|
|Re: sequential binary parallelization at run-time firstname.lastname@example.org (George Neuner) (2007-10-27)|
|Unpopular parameter passing mechanisms and parallelisation (was Re: se email@example.com (Anton Lokhmotov) (2007-10-28)|
|Re: Unpopular parameter passing mechanisms and parallelisation firstname.lastname@example.org (2007-10-29)|
|Re: Unpopular parameter passing mechanisms and parallelisation email@example.com (2007-10-29)|
|Re: Unpopular parameter passing mechanisms and parallelisation (was Re firstname.lastname@example.org (George Neuner) (2007-10-30)|
|Re: Unpopular parameter passing mechanisms and parallelisation (was Re email@example.com (glen herrmannsfeldt) (2007-10-30)|
|Re: Unpopular parameter passing mechanisms and parallelisation firstname.lastname@example.org (Anton Lokhmotov) (2007-10-31)|
|[2 later articles]|
|From:||George Neuner <email@example.com>|
|Date:||Sat, 27 Oct 2007 02:11:47 -0400|
|Posted-Date:||27 Oct 2007 02:14:28 EDT|
On Thu, 25 Oct 2007 11:38:38 -0000, Lauren <firstname.lastname@example.org>
> I find the dynamic binary optimization being very popular with the
>virtual machine techniques. Most of the papers I have found are
>applying conventional compilation optimizations to the sequential
>binary to get some speedup. But in the multicore or manycore era, how
>shall we exploit various granularity parallelism at run-time? How
>extracting parallelism from the serial binary? I can't find papers
>about paralleling sequential binary at run-time.
> Thanks very much!
>[If anyone has the answer to this question, I know a whole lot of people
>who'd like to talk to you. -John]
No real answer, but a few comments.
JIT compilers can do simple things to sequential code like unrolling
loops and so forth, but adaptively exploiting parallel hardware at
runtime (striping and migrating computations, etc.) requires
restructuring the code to expose parallelism and make communication
and synchronization explicit. It currently isn't possible to do this
at runtime as it requires quite a bit of complex analysis.
I'm not aware of any good papers on the subject, but a place to start
would be to look at the implementation of languages like Caml and
pHaskell and at adaptive parallel runtimes like the one used by Cilk
(but ignore the Cilk language itself). I don't think any of these are
the answer, but they have some good ideas.
I been thinking about adaptive parallelization for a while and my own
opinion, FWIW, is that the right way to accomplish it on close-coupled
shared memory systems is to aggressively compile call-by-need, make
each call-by-need function an independently schedulable entity (ie. a
microthread), and structure the code as a state machine graph of
microthreads ordered by data dependencies. The parallel runtime would
operate in a similar fashion to an NFA, advancing to new states as
dependencies are satisfied and scheduling runnable microthreads across
I'm slowly working on a proof of concept (and no, I haven't written
anything about it). It will be a while yet before I know if I'm on to
something or just fooling myself. However, I'm not aware of any other
effort currently going in the same direction. Few languages currently
implement call-by-need and those that do try to avoid it as much as
possible because aggressively creating functions is inefficient for a
single CPU. But I don't think that matters much - in a few years most
CPUs will be multicores and adaptive scalability will become more
important than single core speed. I'm betting most people are willing
to take a hit on a single core if their programs will automatically
scale fairly well to multiple cores.
Return to the
Search the comp.compilers archives again.