Re: binary search debugging of compilers

Kaz Kylheku <864-117-4973@kylheku.com>
Fri, 19 May 2023 03:44:04 -0000 (UTC)

          From comp.compilers

Related articles
[12 earlier articles]
Re: binary search debugging of compilers 864-117-4973@kylheku.com (Kaz Kylheku) (2023-05-17)
Re: binary search debugging of compilers 864-117-4973@kylheku.com (Kaz Kylheku) (2023-05-17)
Re: binary search debugging of compilers gah4@u.washington.edu (gah4) (2023-05-17)
Re: binary search debugging of compilers spibou@gmail.com (Spiros Bousbouras) (2023-05-18)
Re: binary search debugging of compilers rsc@swtch.com (Russ Cox) (2023-05-18)
Re: binary search debugging of compilers 864-117-4973@kylheku.com (Kaz Kylheku) (2023-05-19)
Re: binary search debugging of compilers 864-117-4973@kylheku.com (Kaz Kylheku) (2023-05-19)
binary search debugging of compilers cclick0@gmail.com (Cliff Click) (2023-05-19)
binary search debugging of compilers tekk.nolagi@gmail.com (Max B) (2023-05-19)
Re: binary search debugging of compilers tkoenig@netcologne.de (Thomas Koenig) (2023-05-19)
Re: binary search debugging of compilers mrs@kithrup.com (2023-05-20)
Re: binary search debugging of compilers gah4@u.washington.edu (gah4) (2023-05-20)
| List of all articles for this month |
From: Kaz Kylheku <864-117-4973@kylheku.com>
Newsgroups: comp.compilers
Date: Fri, 19 May 2023 03:44:04 -0000 (UTC)
Organization: A noiseless patient Spider
References: <CADSkJJVN7RGqhgVDZaz_K5be6uEDaMofMu5OF3RR4Y5-fDu00Q@mail.gmail.com> 23-05-013 23-05-018
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="18481"; mail-complaints-to="abuse@iecc.com"
Keywords: debug, tools
Posted-Date: 19 May 2023 15:51:14 EDT

On 2023-05-18, Spiros Bousbouras <spibou@gmail.com> wrote:
> On Wed, 17 May 2023 10:55:27 -0400
> Russ Cox <rsc@swtch.com> wrote:
>> When BitKeeper came along, we added this:
>>
>> --longest Restrict the deltas to those on the longest line
>> between the two range endpoints. Unlike a range, the
>> lower bound is included in the output.
>>
>> because BitKeeper history is a lattice, not a straight line. So your
>> test points became
>>
>> bk changes -nd:REV: > REVS
>>
>> and binary search over those. That gets you the longest "straight" line
>> in the graph.
>
> Any partial ordering can be extended to a total ordering. Why can't you
> just do that instead of using 'the longest "straight" line' ?


I think, not without introducing tie breaker rules that don't
necessarily make any sense.


But, maybe that doens't matter as something else. Say we have:


                        D-------E
                      / \
    A---B---C I----J
                      \ /
                        F---G---H


A is before B and so on. Are F-G-H before D-E or after?


Suppose the bug was introduced by E. We would like to know that.


The longest path between the A-J bisection points goes through F-G-H,
ignoring D-E. So it may wrongly look like I is the culprit, if E is
never tested. I has the bug, H doesn't.


Say we pick a precedence and flatten everything, in these
two ways:


    A---B---C---D---E---F---G---H---I----J


or


    A---B---C---F---G---H---D---E---I----J


problem is, under the first order, the bug is bouncing into
and out of existence:


    A---B---C---D---E---F---G---H---I----J
                                    X X X




E has the bug, and so do I and J. But F-G-H do not. So the bisect
algorithm has a problem; it is predicated on the idea that the bug
exists in the upper endpoint, is absent in the lower and has appeared
once in between.


Now, if we actually cherry pick all those commits into that order
(solving whatever conflicts are required), then we can do a meaningful
bisect. Then the rebased F G H commits do have the bug:


    A---B---C---D---E---F'--G'--H'--I'---J
                                    X X X X X X


It's unthinkable to be bisecting some lineage whipped up on-the-fly
that doesn't actually exist in the repo.


The best thing is to just keep the actual history that way.


Or else, do that longest-path things when bisecting. Then
if a merge point is implicated as the bad commit, then go
into the branch. When I is implicated, we note that it
has two parents, E and H. Only the H parentage was tested.
C was good. Thus we bisect C to E.


A smarter bisect algorithm could do this.


To make this sort of topical for comp.compilers, we note that the
revision history is a graph which resembles the control flow
graph of a program (one with no backwards branches).


The graph can be divided into our favorite chunks: basic blocks.


Basic blocks are sequence of commits in which no commit has multiple
parents (other than the first one), or is the parent of multiple children
(other than the last one).


The bisect algorithm should divide the graph into basic blocks,
and recurse into then when necessary.


So in:




                        D-------E
                      / \
    A---B---C I----J
                      \ /
                        F---G---H




we have these basic blocks:


                      /
    A---B---C
                      \




    D---E
              \




                      /
    F---G---H


    I---J




First, the bisect could process the longest-path basic blocks: those
headed by A, F and I. The goal is to determine which basic block
introduces the bug.


When a buggy basic block is identified, then bisect actually
does commit-granularity bisection in that block.


If we discover that the bad commit is the head of a block,
and that block has multiple parents, we then recurse over
those parents somehow.


We identify paths (through the basic block graph) we have not taken
and similarly analyze them.


--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.