Re: Avoiding Branch Misprediction in Virtual Machine (VM) Portably

anton@mips.complang.tuwien.ac.at (Anton Ertl)
Sun, 13 May 2007 08:12:07 GMT

          From comp.compilers

Related articles
Avoiding Branch Misprediction in Virtual Machine (VM) Portably moron451-compiler1@yahoo.com (2007-05-12)
Re: Avoiding Branch Misprediction in Virtual Machine (VM) Portably anton@mips.complang.tuwien.ac.at (2007-05-13)
Re: Avoiding Branch Misprediction in Virtual Machine (VM) Portably moron451-compiler1@yahoo.com (2007-05-13)
Re: Avoiding Branch Misprediction in Virtual Machine (VM) Portably eliotm@pacbell.net (Eliot Miranda) (2007-05-14)
| List of all articles for this month |

From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Date: Sun, 13 May 2007 08:12:07 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
References: 07-05-044
Keywords: VM, design
Posted-Date: 13 May 2007 15:27:15 EDT



moron451-compiler1@yahoo.com writes:
>Hello,
>
>I've read several web sites and papers about coding virtual machines
>using a threaded code technique (Anton Ertl's papers for example). He
>recommends the use of GNU C's labels-as-values as a somewhat portable
>way to do threaded coding. However, I don't want to rely on a non-
>portable method like that. I want my VM to be 100% ANSI C.


The classical approach would be to use macros and define them
appropriately for switch dispatch (for ANSI C) or for threaded code.


> That seems to only leave me with the "Giant Switch" method.
>However... I recently thought about a different method that still
>uses the switch statement, but instead of one giant switch, it uses
>one for each instruction (see below). I was wondering if anyone can
>tell me if this is a good idea or not.


Looks like it is a good idea. I implemented a variant of my
threaded-code microbenchmark using your technique
<http://www.complang.tuwien.ac.at/forth/threading/repl-switch.c>, and
here are the results on a 2.2 GHz Athlon 64 X2 4400+ (times in
seconds):


  direct single replic.
threaded switch switch
    0.38 1.11 0.51 i386 gcc-2.95
    0.73 1.11 0.50 i386 gcc-3.3 -fno-crossjumping
    0.75 1.10 0.54 AMD64 gcc-3.3 -fno-crossjumping
    0.73 1.10 0.55 AMD64 gcc-4.1


As you can see, thanks to performance regressions in recent gccs your
replicated switch technique even beats direct threaded code on these
compilers, although it is still quite a bit slower than well-compiled
(i.e., gcc-2.95) direct threaded code. I had the impression that the
performance regressions were mostly fixed in gcc-4.x, but apparently
that is not generally the case.


The disadvantage is that you have n^2 jumps, which make compilation
slower, and lower the number of VM instructions you can use (although
probably not dramatically). Also, you have n replicas of the jump
tables, which may increase the data cache miss rate.


This is ironical: gcc's data flow analysis modeled an indirect goto
just as n jumps to all n possible labels (so with n labels and n gotos
it modeled n^2 jumps). They then changed the implementation of
indirect gotos to have one shared indirect jump, and every indirect
goto would perform a jump to that shared indirect jump; this defeats
the branch prediction by the BTB and causes the performance regression
seen above. Now you introduce another technique that makes all these
n^2 control flow edges explicit, and this might become the method of
choice for implementing virtual machines. So we are back where we
started, only in a slightly worse place: compilation is just as slow
(probably a little slower) as before, and execution is a somewhat
slower than before. Well, maybe the gcc guys will work hard to undo
the replication and make the replicated switch technique just as slow
as the ordinary switch dispatch.


>I'm a bit worried that I haven't
>found anything advocating this in "the literature" however, so maybe I
>am missing an important problem with it. Any comments?


AFAIK you are the first to invent this technique. Whom should I
attribute it to? moron451-compiler1@yahoo.com does not sound so
great:-)


Our esteemed moderator writes:
>[Ignoring the detail that there's no "label" keyword in C, I don't see
>the point. A reasonable compiler will recognize collapse the common code
>sequences and jumps to jumps so you'll end up with the same object code
>you would have from the giant switch.


As the numbers above show, this is not the case for various gcc
versions, and hopefully that will stay that way. The capabilities of
compilers are frequently overestimated.


- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.