binary search debugging of compilers

Russ Cox <>
Fri, 12 May 2023 13:59:22 -0400

          From comp.compilers

Related articles
binary search debugging of compilers (Russ Cox) (2023-05-12)
Re: binary search debugging of compilers (Kaz Kylheku) (2023-05-13)
Re: binary search debugging of compilers (Fernando) (2023-05-13)
Re: binary search debugging of compilers (Kaz Kylheku) (2023-05-14)
Re: binary search debugging of compilers (Thomas Koenig) (2023-05-14)
Re: binary search debugging of compilers (gah4) (2023-05-14)
Re: binary search debugging of compilers (Cameron McInally) (2023-05-14)
[18 later articles]
| List of all articles for this month |

From: Russ Cox <>
Newsgroups: comp.compilers
Date: Fri, 12 May 2023 13:59:22 -0400
Organization: Compilers Central
Injection-Info:; posting-host=""; logging-data="21438"; mail-complaints-to=""
Keywords: tools, debug
Posted-Date: 12 May 2023 21:34:34 EDT

Hi all,

In the Go compiler, we have been using a technique for the past eight
years that I assume has been done in other compilers before, but I'm
unaware of any. I describe it below, along with refinements we've made
over the years. I'd be very interested to hear about earlier uses of
approaches like this, or of any other ideas for use cases.

In 2015, Keith Randall was working on a new SSA backend for the Go
compiler. The old and new backends coexisted, and we could use either
for any given function in the compilation. Keith introduced an
environment variable GOSSAHASH that specifies the last few binary
digits of the hash of function names that should use the new backend.
So GOSSAHASH=0110 means compile only those functions whose names hash
to a value with last four bits 0110. When a test is failing with the
new backend, you try GOSSAHASH=0 and GOSSAHASH=1, and then use binary
search to progressively refine the pattern, narrowing the failure down
until only a single function is being compiled with the new backend.
This was invaluable for approaching failures in large real-world tests
(tests for libraries or production code, not for the compiler) that we
had not written and did not understand.

In 2016, David Chase was debugging a new optimizer rewrite rule that
should have been correct but was causing mysterious test failures.
He reused the same technique at finer granularity: the bit pattern now
matched against a hash of the current file name and line number,
so that the binary search pinpoints the exact line of code that is
miscompiled (meaning causes a test failure) when the suspect rewrite
rule is used.

Since then we have continued to find uses for this approach.
For example, when we added automatic FMA insertion on certain
architectures, some tests started failing. By making the rewrite rule
controlled by a file:line hash pattern, we can pinpoint the exact line
or lines where FMA insertion causes the failure.

As another example, we are considering a small change to the semantics
of variables declared in for loops, along the lines of what C# 5 did
and what JavaScript does in for-let loops. By using a hash pattern to
toggle whether the new semantics applies at a given file:line, we can
identify the specific loops where the semantic change provokes a
test failure.

The binary search does not have to stop at a single failure. With a
richer pattern syntax adding an AND-NOT operator, we can disable the
instances we've found, and if the test still fails, run a new binary
search to find another culprit. (Technically, AND-NOT is not necessary
for doing a complete binary traversal of the search space guided by
single failures, but it is easy, and it is necessary for the next step.)

The binary search can also be adapted to finding pairs or larger
groups of changes, all of which are necessary to provoke a failure.
If suffix B provokes the failure but neither 0B nor 1B do, then
(assuming the test is not flaky!) you start using the pattern “0B OR 1B”,
refine the left half of the expression with binary search
(for example the first step checks “00B OR 1B” versus “10B OR 1B”),
and then having refined the left half, move on to refining the right half.
After refining both, you have an OR of patterns that each match one
change, and all the changes are necessary to provoke the failure.
The splitting naturally recurses as needed to find a locally minimal set
of changes, in the sense that by construction, removing any single
change would not provoke the failure.

Stepping slightly away from compilers temporarily, we have recently
realized that this technique can be adapted to identifying specific
bug-inducing contexts for library functions as well. For example,
suppose we want to introduce a new qsort implementation, but that
causes tests to fail in a large program (maybe even a compiler) that
makes use of qsort in many contexts. Binary search selecting the old
or new algorithm according to a hash of the call stack will quickly
identify the exact call stack affected by the new qsort.

As I mentioned at the top, I am interested to hear about earlier uses
of approaches like this, or any other ideas for situations where the
approach might be applicable. Clearly the general problem has overlaps
with group testing [1], and in particular Hwang's binary search-based
approach (1972) [2]. I've also found one paper describing this
algorithm in the context of identifying which of a large set of
changes between GDB versions caused a crash (1999) [3].
(That paper's version of the algorithm for finding pairs or larger
groups is buggy unless there is only one such group.)

It seems like the idea of binary search to narrow down a buggy
software change is so natural that there must have been many earlier
uses for debugging before 1999, especially in compilers.

Thanks for any references, interesting history, or ideas.

If anyone is interested in code, we've published the binary search tool [4]
and the code that interprets the patterns [5].


[1] (a remarkably good article)

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.