|From:||Hans-Peter Diettrich <DrDiettrich1@netscape.net>|
|Date:||Fri, 6 Oct 2017 02:56:06 +0200|
|Injection-Info:||gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="29200"; mail-complaints-to="firstname.lastname@example.org"|
|Posted-Date:||06 Oct 2017 11:26:52 EDT|
Am 05.10.2017 um 18:29 schrieb Robert Jacobson:
> I am trying to wrap my mind around the issue of dynamic dispatch in the
> context of switching on opcode in a bytecode interpreter
> Switching on opcode number is a special case of a general switch.
> 1. opcodes are dense and contiguous;
This is not a requirement, can be optimized (see below).
> 2. opcodes typically take values over a relatively small range.
> For example, CPython's opcodes range from 1 to 161 (excepting EXCEPT_HANDLER).
> The jump table is thus a perfect hash: We need only store the addresses of the
> opcode handler functions in a smallish table and use the opcode value as an
> index into the table. Given (1) and (2), it's not clear to me that, say, a
> binary search with Log_2(161) ~ 8 comparisons is better than the lookup table.
> Is that really what the recent literature is saying? Are concerns about cache
> misses really valid for a table of 161 addresses? Is a load instruction that
> much more expensive than comparison instructions with branch prediction?
I second your doubts, as soon as more values have to be considered than
used in the examples.
I don't see how branch prediction will have an impact on runtime, when
the prediction only applies to conditional branches, and the next opcode
is one-of-many. If the cache can hold the jump table, nothing can be
improved with this model. Consider how many *possible* comparisons and
branches must be encoded, so that they will lead to *all* possible (161)
Many years ago I saw an 68k emulator, where the opcode was multiplied by
16 and used as an offset into the emulator code, eliminating the branch
table. The 16 byte blocks cover the emulator code for most instructions,
this factor can be scaled to any sufficiently large block size if
required. If some bytecodes require more than one block of executable
code, the codes can be made discontiguous, with gaps wherever the
code-to-execute is longer than the chosen blocking factor.
Of course such an approach lacks any cache support, but a large cache
still may hold the most frequently used code sequences automatically. If
the cache is too small to hold the code for multiple opcodes, it's
useless at all.
Return to the
Search the comp.compilers archives again.