hash code binary search at work

Kaz Kylheku <864-117-4973@kylheku.com>
Mon, 17 Jul 2023 17:35:31 -0000

          From comp.compilers

Related articles
hash code binary search at work 864-117-4973@kylheku.com (Kaz Kylheku) (2023-07-17)
| List of all articles for this month |

From: Kaz Kylheku <864-117-4973@kylheku.com>
Newsgroups: comp.compilers
Date: Mon, 17 Jul 2023 17:35:31 -0000
Organization: Compilers Central
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="4274"; mail-complaints-to="abuse@iecc.com"
Keywords: debug, question
Posted-Date: 17 Jul 2023 14:53:45 EDT

I'm hunting an issue using the binary search algorithm described here.
The issue shows up as a unit test failure and is introduced by a single
added line:

static val rt_defun(val name, val function)
    sethash(top_fb, name, cons(name, function));
    uw_purge_deferred_warning(cons(fun_s, name));
    uw_purge_deferred_warning(cons(sym_s, name));
    vm_invalidate_binding(name); /* <----ADDED LINE */

    return name;

This vm_invalidate_binding has to do with caching of symbolic references
in compiled code. Compiled code caches the bindings to global functions
and variables for faster access (not having to go through the symbolic
lookup). But in order for redefinitions to work, a cached binding has to
be invalidated. The call missing in rt_defun means that defun is
neglecting to do this.

But, when that line is added, the pattern matching module
stdlib/match.tl is miscompiled. A certain function in it looks
substantially different.

So for what values of name does vm_invalidate_binding(name)
reproduce this problem?

I narrowed it to eight bits of hash (more than enough), and then stuck
in a print statement to print all symbols for which we call the

static val rt_defun(val name, val function)
    sethash(top_fb, name, cons(name, function));
    uw_purge_deferred_warning(cons(fun_s, name));
    uw_purge_deferred_warning(cons(sym_s, name));

    if ((c_u(hash_equal(symbol_name(name), zero)) & 0x7F) == 0x6E) // 1111_1111 0110_0110
        format(t, lit("potentially bad sym: ~s\n"), name, nao);

    return name;

Output, when stdlib/match.tl is recompiled:

0:[0717:072659]:sun-go:~/txr$ make
TXR stdlib/match.tl -> stdlib/match.tlo
potentially bad sym: non-triv-pat-p
potentially bad sym: non-triv-pat-p
potentially bad sym: non-triv-pat-p
potentially bad sym: non-triv-pat-p

non-triv-pat-p happens to be a function that is defined twice, in direct

(defun non-triv-pat-p (syntax)
    (ignore syntax)

(defun non-triv-pat-p (syntax)
    (match-case syntax
        ((@(eq 'sys:expr) (@(bindable) . @nil)) t)
        ((@(eq 'sys:var) @(or @(bindable) nil) . @nil) t)
        ((@(eq 'sys:quasi) . @(some @(consp))) t)
        ((@(eq 'sys:qquote) @nil) t)
        ((@pat . @rest) (or (non-triv-pat-p pat)
                                                (non-triv-pat-p rest)))
        (#R(@from @to) (or (non-triv-pat-p from)
                                              (non-triv-pat-p to)))
        (@(some @(non-triv-pat-p)) t)))

This is necessary because it's a very fundamental function in pattern
matching, and because it's defined using pattern matching.
In order to expand the match-case form, we must have a definition
of non-triv-pat-p. This is the first such instance in the file;
no code before these two definitions requires pattern matching.

This boostrapping style is possible because pattern matching is
expanded. We need pattern matching to work sufficiently well that we can
expand the match-case in non-triv-pat-p, and for that, we need
an initial definition of non-triv-pat-p that is "correct enough"
for the purpose.

I.e. non-triv-pat-p is carefully written such that the expansion of its
own pattern matching code is not affected by the immediately preceding
low-fidelity definition of non-triv-pat-p which just returns true: all
patterns are nontrivial. That proposition is in fact true: all the
patterns in that match-case function are either nontrivial or else, like
in the case of the nil in the second case, are recognized as trivial
without non-triv-pat-p having to be consulted.

The function that gets miscompiled is transform-qquote, later in
the file. There is no difference anywhere else.

I'm suspecting that the second definition of non-triv-pat-p is always
being mistranslated due to some other issue elsewhere, but the
bug is hidden due to code latching on to an outdated version that
works (well enough so that transform-qquote isn't mis-expanded).

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.