Opcode handler dispatch in an interpreter: Implementing switch on OpCode.


From: Robert Jacobson <rljacobson@gmail.com>
Newsgroups: comp.compilers
Date: Thu, 5 Oct 2017 09:29:45 -0700 (PDT)
Organization: Compilers Central
Keywords: architecture, design
Posted-Date: 05 Oct 2017 15:05:14 EDT

I am trying to wrap my mind around dynamic dispatch in the context of switching
on the opcode in a bytecode interpreter. I came across Julian Squires' recent,
widely cited blog post (via r/programming), "Are Jump Tables Always Fastest?"
(http://www.cipht.net/2017/10/03/are-jump-tables-always-fastest.html). There
appears to be a sizable literature on the subject, and optimal designs
apparently require careful consideration of the processor's branch prediction
capability. I have some questions.


Switching on opcode number is a special case of a general switch.
Specifically:
1. opcodes are dense and contiguous;
2. opcodes typically take values over a relatively small range.
For example, CPython's opcodes range from 1 to 161 (excepting EXCEPT_HANDLER).
The opcode value thus acts as a perfect hash into the jump table: we need only
store the addresses of the opcode handler functions in a smallish table and use
the opcode value as an index into it. Given (1) and (2), it's not clear to me
that, say, a binary search with log_2(161) ~ 8 comparisons is better than the
lookup table. Is that really what the recent literature is saying? Are concerns
about cache misses really valid for a table of 161 addresses? Is a load
instruction that much more expensive than comparison instructions with branch
prediction?
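
To make the comparison concrete, here is a minimal sketch of the table-indexed
dispatch I have in mind. The opcodes, handlers, and VM struct are all made up
for illustration; this is not CPython's scheme, just the general shape of
"store handler addresses densely and index by opcode value":

#include <stdio.h>

enum { OP_HALT = 0, OP_PUSH, OP_ADD, OP_PRINT, NUM_OPCODES };

typedef struct { int sp; long stack[64]; } VM;

/* One handler per opcode; each returns the next instruction pointer,
   or NULL to stop. */
static const unsigned char *op_halt(VM *vm, const unsigned char *ip)
{ (void)vm; (void)ip; return NULL; }
static const unsigned char *op_push(VM *vm, const unsigned char *ip)
{ vm->stack[vm->sp++] = *ip++; return ip; }
static const unsigned char *op_add(VM *vm, const unsigned char *ip)
{ vm->sp--; vm->stack[vm->sp - 1] += vm->stack[vm->sp]; return ip; }
static const unsigned char *op_print(VM *vm, const unsigned char *ip)
{ printf("%ld\n", vm->stack[vm->sp - 1]); return ip; }

typedef const unsigned char *(*Handler)(VM *, const unsigned char *);

/* The "jump table": handler addresses stored densely, indexed directly
   by the opcode value. For ~161 opcodes this is roughly 1.3 KB of
   pointers on a 64-bit machine. */
static const Handler handlers[NUM_OPCODES] = { op_halt, op_push, op_add, op_print };

static void run(VM *vm, const unsigned char *ip)
{
    while (ip != NULL) {
        unsigned char op = *ip++;      /* fetch opcode */
        ip = handlers[op](vm, ip);     /* one load + one indirect call */
    }
}

int main(void)
{
    const unsigned char code[] =
        { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    VM vm = { 0, {0} };
    run(&vm, code);                    /* prints 5 */
    return 0;
}

The per-opcode cost here is the table load plus an indirect branch; the
question is whether that is really worse than a tree of conditional branches
backed by the predictor.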


I also have a question about optimization. Testing for opcode values in order
of frequency of occurrence optimizes for the 1-gram entropy of the opcode
stream. While we expect the conditional entropy of a given opcode to be high, I
would still think a bigram optimization would be a benefit. One wouldn't have
to compute the full conditional probability distribution for every opcode to do
this: just do it for the statistically most common bigrams, checking the next
opcode against its most probable successor inside the current opcode's handler
code. Is this ever done in practice? Why or why not?
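
Here is a sketch of that bigram idea, reusing the same made-up opcodes and
assuming, purely for illustration, that profiling showed PUSH is most often
followed by ADD. The PUSH handler peeks at the next opcode, executes the likely
successor inline, and otherwise falls back to the ordinary table dispatch:

#include <stdio.h>

enum { OP_HALT = 0, OP_PUSH, OP_ADD, OP_PRINT, NUM_OPCODES };

typedef struct { int sp; long stack[64]; } VM;
typedef const unsigned char *(*Handler)(VM *, const unsigned char *);

static const unsigned char *op_halt(VM *vm, const unsigned char *ip)
{ (void)vm; (void)ip; return NULL; }
static const unsigned char *op_add(VM *vm, const unsigned char *ip)
{ vm->sp--; vm->stack[vm->sp - 1] += vm->stack[vm->sp]; return ip; }
static const unsigned char *op_print(VM *vm, const unsigned char *ip)
{ printf("%ld\n", vm->stack[vm->sp - 1]); return ip; }

static const unsigned char *op_push(VM *vm, const unsigned char *ip)
{
    vm->stack[vm->sp++] = *ip++;       /* ordinary PUSH work */

    /* Bigram check: if the (assumed) most probable successor follows,
       execute it here and skip one trip through the dispatch table.
       The one-byte peek is safe here because the stream ends in HALT. */
    if (*ip == OP_ADD) {
        ip++;
        vm->sp--;
        vm->stack[vm->sp - 1] += vm->stack[vm->sp];
    }
    return ip;                         /* anything else: back to the table */
}

static const Handler handlers[NUM_OPCODES] = { op_halt, op_push, op_add, op_print };

static void run(VM *vm, const unsigned char *ip)
{
    while (ip != NULL) {
        unsigned char op = *ip++;
        ip = handlers[op](vm, ip);
    }
}

int main(void)
{
    const unsigned char code[] =
        { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    VM vm = { 0, {0} };
    run(&vm, code);                    /* PUSH 3 / ADD takes the fused path; prints 5 */
    return 0;
}

Whether this pays off presumably depends on how skewed the successor
distribution really is and on how well the indirect branch already predicts.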


Best,


Robert

