Re: binary search debugging of compilers

Russ Cox <rsc@swtch.com>
Thu, 18 May 2023 15:28:56 -0400

          From comp.compilers

Related articles
[10 earlier articles]
Re: binary search debugging of compilers rsc@swtch.com (Russ Cox) (2023-05-17)
Re: binary search debugging of compilers cameron.mcinally@nyu.edu (Cameron McInally) (2023-05-17)
Re: binary search debugging of compilers 864-117-4973@kylheku.com (Kaz Kylheku) (2023-05-17)
Re: binary search debugging of compilers 864-117-4973@kylheku.com (Kaz Kylheku) (2023-05-17)
Re: binary search debugging of compilers gah4@u.washington.edu (gah4) (2023-05-17)
Re: binary search debugging of compilers spibou@gmail.com (Spiros Bousbouras) (2023-05-18)
Re: binary search debugging of compilers rsc@swtch.com (Russ Cox) (2023-05-18)
Re: binary search debugging of compilers 864-117-4973@kylheku.com (Kaz Kylheku) (2023-05-19)
Re: binary search debugging of compilers 864-117-4973@kylheku.com (Kaz Kylheku) (2023-05-19)
binary search debugging of compilers cclick0@gmail.com (Cliff Click) (2023-05-19)
binary search debugging of compilers tekk.nolagi@gmail.com (Max B) (2023-05-19)
Re: binary search debugging of compilers tkoenig@netcologne.de (Thomas Koenig) (2023-05-19)
Re: binary search debugging of compilers mrs@kithrup.com (2023-05-20)
[1 later articles]
| List of all articles for this month |

From: Russ Cox <rsc@swtch.com>
Newsgroups: comp.compilers
Date: Thu, 18 May 2023 15:28:56 -0400
Organization: Compilers Central
References: <CADSkJJVN7RGqhgVDZaz_K5be6uEDaMofMu5OF3RR4Y5-fDu00Q@mail.gmail.com>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="2547"; mail-complaints-to="abuse@iecc.com"
Keywords: debug, tools
Posted-Date: 18 May 2023 17:02:08 EDT
In-Reply-To: <CADSkJJVN7RGqhgVDZaz_K5be6uEDaMofMu5OF3RR4Y5-fDu00Q@mail.gmail.com>

> Any partial ordering can be extended to a total ordering. Why can't you
> just do that instead of using 'the longest "straight" line' ?


A precondition for binary search is that you are searching a function
f over some domain [0, n) with the property that f(i) == true implies
f(j) == true for all j >= i. That is, once the condition "flips", it
stays flipped. Otherwise binary search is not a correct algorithm.


The assumption in tools like git bisect is that some commit introduces the
bug, and then the bug is detectable in all future commits as well,
establishing the precondition. The test f(i), testing at commit i, can be
viewed as asking whether the bad commit is in the range [0, i] (the full
history of that commit). With that definition, clearly f(i) implies f(j)
for all j >= i, because the commit history [0, j] is a superset of the
commit history [0, i].


If instead you squash unrelated commit lines together in a total order, the
precondition is no longer true: if i is from one line but j > i is from a
different unmerged line, then testing j does not include some history that
testing i does. The bug can disappear as you cross from one line to
another. Focusing on the longest straight line in the commit history is one
way to establish the necessary precondition.


Another possibility, which is what git bisect does, is to maintain a set of
commits not yet decided and then find some "good" midpoint along the
longest line of commits remaining. You may in general end up with multiple
disconnected lines in your set of commits not yet decided, and you have to
handle each one separately until you find the bug.


What git bisect does, then, is a more complex algorithm that uses
binary search over longest lines as a subroutine. Restricted to a
single line of commits the precondition holds and binary search works.
(I am simplifying a bit. The actual implementation is perhaps not this
principled, but this is the essence of what it is doing and why it
works.)


The binary search I described to start this thread is also not exactly
the same binary search algorithm, because the underlying input problem
is not a single-input function with domain [0, n). Instead the input
problem (in the simplest form) is a function taking a set of possible
changes [lo, hi) that reports whether the bug is in that set. We
recognize it as binary search because it's still the same code and
idea, and every binary search does maintain this progressively
narrower range, but normal binary searches don't tell f the range.


Best,
Russ


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.