Re: Is This a Dumb Idea? parallelizing byte codes

Alain Ketterlin <alain@universite-de-strasbourg.fr>
Sat, 22 Oct 2022 23:50:49 +0200

          From comp.compilers


Newsgroups: comp.compilers
Organization: Université de Strasbourg
References: 22-10-046
Keywords: optimize, parallel, interpreter
Posted-Date: 22 Oct 2022 22:51:19 EDT

Jon Forrest <nobozo@gmail.com> writes:


> Modern CPUs employ all kinds of clever techniques to improve
> instruction level parallelism (ILP). I was wondering if it
> makes sense to try to employ similar techniques in the
> virtual machines used to execute byte code produced by language
> compilers.


I think it does not, mostly because such fine-grain parallelism cannot
be implemented efficiently enough. Even waking up some sort of worker
threads, along with the necessary synchronization, would probably cost
more than the gain achieved by executing a handful of byte-codes in
parallel.


I think there is no way to efficiently implement fine-grain, ILP-like,
parallelism in software (except for vectorization, but that's a
completely different topic).


> By that I mean what if virtual machines were to examine byte code
> streams to detect when it would be safe to execute multiple
> byte codes concurrently? Then, based on its findings, the virtual
> machine would execute as many byte codes concurrently as is safe.


This implies that some static analysis can be performed on the
byte-codes. It may be possible in some cases (the JVM comes to mind),
and nearly impossible in others (mostly dynamic languages, of which
Python is the epitome), where what an instruction actually reads and
writes is often known only at run time.
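To make "detect when it would be safe to execute multiple byte codes concurrently" concrete, here is a minimal sketch of the static part only. It assumes a hypothetical register (three-address) bytecode with explicit source and destination registers, straight-line code, and no memory accesses or side effects; `Instr` and `schedule_levels` are illustrative names, not taken from any real VM. It groups instructions into levels whose members are mutually independent, respecting read-after-write, write-after-write and write-after-read dependencies. A stack bytecode such as the JVM's would first need its operand-stack slots made explicit.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """One instruction of a hypothetical register bytecode: dst <- op(srcs)."""
    op: str
    dst: str
    srcs: tuple[str, ...]


def schedule_levels(code: list[Instr]) -> list[list[int]]:
    """Group instruction indices into levels of mutually independent
    instructions; every instruction in a level could, in principle, run
    concurrently. Assumes straight-line code touching registers only."""
    level_of: list[int] = []
    last_write: dict[str, int] = {}  # register -> level of its latest writer
    last_read: dict[str, int] = {}   # register -> highest level that read it
    for ins in code:
        lvl = 0
        for r in ins.srcs:           # read-after-write: wait for the producer
            if r in last_write:
                lvl = max(lvl, last_write[r] + 1)
        if ins.dst in last_write:    # write-after-write: keep writes in order
            lvl = max(lvl, last_write[ins.dst] + 1)
        if ins.dst in last_read:     # write-after-read: earlier readers go first
            lvl = max(lvl, last_read[ins.dst] + 1)
        for r in ins.srcs:
            last_read[r] = max(last_read.get(r, -1), lvl)
        last_write[ins.dst] = lvl
        level_of.append(lvl)
    levels: list[list[int]] = [[] for _ in range(max(level_of, default=-1) + 1)]
    for i, lvl in enumerate(level_of):
        levels[lvl].append(i)
    return levels


code = [
    Instr("add", "a", ("x", "y")),  # 0
    Instr("mul", "b", ("x", "z")),  # 1: independent of 0
    Instr("sub", "c", ("a", "b")),  # 2: needs the results of 0 and 1
    Instr("add", "x", ("y", "z")),  # 3: overwrites x, so must follow 0 and 1
]
print(schedule_levels(code))  # [[0, 1], [2, 3]]
```

Even in this favorable case each level holds only a couple of instructions, which is exactly the granularity at which waking up and synchronizing worker threads would cost more than it saves.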


> I have no idea if the overhead of the byte code examination would
> exceed any advantage of the concurrent execution, although it's
> important to point out that this examination would only have to
> be done once, and the results could somehow be stored along with
> the byte code. Of course, if the byte code changes the examination
> would have to be done again.


You're right to rely on static, "compile-time" analysis whenever possible.
Dynamic analysis of byte-code streams, plus run-time fine-grain
parallelization, is probably a lost battle in terms of efficiency.
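The "examine once, store the results, redo it if the byte code changes" part of the question can be sketched as a cache keyed by a digest of the bytecode. This assumes the analysis is a pure function of the bytecode bytes; `AnalysisCache` and the stand-in analysis (`len`) are hypothetical, and a real VM might instead write the results into the compiled file next to the bytecode.

```python
import hashlib
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class AnalysisCache(Generic[T]):
    """Memoizes a static analysis of a bytecode blob.

    Results are keyed by a SHA-256 digest of the bytecode, so the analysis
    runs once per distinct bytecode and is redone whenever the bytes change.
    """

    def __init__(self, analyze: Callable[[bytes], T]) -> None:
        self._analyze = analyze
        self._results: dict[str, T] = {}
        self.runs = 0  # how many times the analysis actually executed

    def get(self, code: bytes) -> T:
        key = hashlib.sha256(code).hexdigest()
        if key not in self._results:
            self.runs += 1
            self._results[key] = self._analyze(code)
        return self._results[key]


# Stand-in "analysis" (hypothetical): just count the bytes.
cache = AnalysisCache(len)
cache.get(b"\x01\x02\x03")
cache.get(b"\x01\x02\x03")  # same bytecode: served from the cache
cache.get(b"\x01\x02\x04")  # bytecode changed: analysis runs again
print(cache.runs)  # 2
```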


> I'm also worried that internal virtual machine locking requirements
> might make this idea infeasible. For example, in a virtual machine with
> a global interpreter lock, would it be possible for there to be any
> concurrent execution?


That's only part of the problem. Note also that not all virtual machines
have a "global interpreter lock".


> This idea, if it works, would be a great way to take advantage of
> multiple cores without having to rewrite any user code. The big
> question is whether it would work.


Why not just let super-scalar processors take care of that? Modern
processors can handle tens of instructions at a time, do all the dirty
work (essentially, handling dependencies), and they'll also do all kinds
of crazy stuff you probably won't even think of implementing in software
(like branch prediction, data prefetching, and more). You'll probably get
much more gain by making the virtual machine loop efficient enough to
leverage the power of the hardware.
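For reference, the virtual machine loop in question is, at its core, a fetch/dispatch/execute cycle. The sketch below shows that structure for a hypothetical toy stack machine (the opcodes and `run` are invented for illustration); in Python it shows only the shape, not the performance. In a C interpreter the same dispatch is often a `switch` compiled into an indirect jump, and how well the processor predicts that jump is a large part of the dispatch cost.

```python
# Hypothetical opcodes for a toy stack machine (not any real VM's encoding).
PUSH, ADD, MUL, HALT = range(4)


def run(code: list[int]) -> int:
    """Fetch an opcode, dispatch on it, execute the handler, repeat."""
    stack: list[int] = []
    pc = 0
    while True:
        op = code[pc]  # fetch
        pc += 1
        if op == PUSH:  # dispatch: a C interpreter would use a switch here
            stack.append(code[pc])  # the operand follows the opcode inline
            pc += 1
        elif op == ADD:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op} at offset {pc - 1}")


# (2 + 3) * 4
print(run([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]))  # 20
```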


I've heard/read several times that byte-code micro-optimizations are not
worth the trouble. Here is a paper from 2015 on a related subject
("Branch prediction and the performance of interpreters -- Don't trust
folklore"):


https://ieeexplore.ieee.org/document/7054191


(you may find the corresponding research report if you can't access the
full text from that site). It shows how far processors have come in
handling what was once left to the program designer.


-- Alain.

