benchmark programs email@example.com (1996-07-03)
Re: benchmark programs firstname.lastname@example.org (1996-07-05)
Re: benchmark programs email@example.com (1996-07-07)
Re: benchmark programs firstname.lastname@example.org (1996-07-09)
From: email@example.com (Jim McDonald)
Date: 7 Jul 1996 14:23:30 -0400
Organization: Kestrel Institute, Palo Alto, CA
Peach Microsystems (firstname.lastname@example.org) wrote:
: email@example.com (Graham Jones) wrote:
: >I am doing some simulations into latency hiding techniques and I would like
: >to find some benchmarks which either
: > 1) prove difficult for caches
: Gabriel's "Performance and Evaluation of Lisp Systems", early 80's, had Lisp
: source for a number of benchmarks, including multiple variants of a viciously
: recursive but otherwise trivial function called tak. At least one variant was
: deliberately designed to defeat caches.
: // omitting details I don't recall offhand
: tak(unsigned x, unsigned y, unsigned z)
: {
:     if ( ... )              // base case (details forgotten)
:         return z;
:     else                    // recurse, calling tak 4 times
:         return tak(
:             tak( ... ),     // arguments here are permutations
:             tak( ... ),     // of x,y,z
:             tak( ... )      // with some x-1's mixed in
:         );
: }
: The cache-defeating variant had some large number of copies of this function,
: like tak000 to tak999, and made the recursion call different copies in a
: random order.
Benchmarking cache effects can be tricky.
I remember adding an optimization to Lucid's compiler that would
notice if the code portion of a function replicated that of another
function, in which case it would share the old code object rather than
build a new one. The motivation was to exploit the fact that when
recompiling a file you usually have pre-existing code objects in
memory for most or all of the functions.
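A minimal sketch of that kind of code-object sharing, assuming code objects are just byte vectors and a linear intern table (Lucid's actual representation was surely richer; the names here are invented):

```c
#include <stddef.h>
#include <string.h>

/* A compiled "code object", reduced here to a raw byte vector. */
typedef struct { const unsigned char *bytes; size_t len; } CodeObj;

/* Hypothetical table of code objects already in memory,
   e.g. left over from a previous compile of the same file. */
static CodeObj *interned[1024];
static size_t n_interned = 0;

/* If a function's freshly compiled code is byte-for-byte identical
   to an existing code object, share the old object instead of
   installing a new one. */
CodeObj *intern_code(CodeObj *fresh)
{
    for (size_t i = 0; i < n_interned; i++)
        if (interned[i]->len == fresh->len &&
            memcmp(interned[i]->bytes, fresh->bytes, fresh->len) == 0)
            return interned[i];        /* share the old code vector */
    interned[n_interned++] = fresh;    /* genuinely new code */
    return fresh;
}
```

Applied to the cache-busting TAK, every takNNN copy compiles to identical bytes, so all of them intern to a single code vector.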
Having done that, I was a bit embarrassed to discover that this trick
caused all (100?) variants of the cache-busting version of TAK to
share one code vector, hence avoiding the cache problem. The effect
was entirely accidental, but it looked as if I had added a
special-case trick just to win on that benchmark.
Then there's the PUZZLE benchmark that ran 30% slower when the
compiler got smarter and removed a few dozen lines of redundant code.
The redundant code was rarely executed, and its mere presence in
memory had happened to keep two interleaved and heavily run sections
of code in distinct cache lines. Removing it caused fewer
instructions to be executed, but purely by accident the cache
contention leaped upward.