Re: Paper: dividing by seven

anton@mips.complang.tuwien.ac.at
Mon, 13 Apr 2026 05:48:46 +0000

          From comp.compilers

Related articles
Paper: dividing by seven johnl@taugh.com (John R Levine) (2026-04-10)
Re: Paper: dividing by seven anton@mips.complang.tuwien.ac.at (2026-04-13)
Re: Paper: dividing by seven robin51@dodo.com.au (2026-04-19)
Re: Paper: dividing by seven anton@mips.complang.tuwien.ac.at (2026-04-19)
| List of all articles for this month |
From: anton@mips.complang.tuwien.ac.at
Newsgroups: comp.compilers
Date: Mon, 13 Apr 2026 05:48:46 +0000
Organization: Compilers Central
References: 26-04-002
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="89953"; mail-complaints-to="abuse@iecc.com"
Keywords: code, optimize, paper
Posted-Date: 13 Apr 2026 20:08:38 EDT

John R Levine <johnl@taugh.com> writes:
>Optimization of 32-bit Unsigned Division by Constants on 64-bit Targets
>
>Shigeo Mitsunari, Takashi Hoshino
>
>Granlund and Montgomery proposed an optimization method for unsigned
>integer division by constants [3]. Their method (called the GM method in
>this paper) was further improved in part by works such as [1] and [7], and
>is now adopted by major compilers including GCC, Clang, Microsoft
>Compiler, and Apple Clang. However, for example, for x/7, the generated
>code is designed for 32-bit CPUs and therefore does not fully exploit
>64-bit capabilities. This paper proposes an optimization method for 32-bit
>unsigned division by constants targeting 64-bit CPUs. We implemented
>patches for LLVM/GCC and achieved speedups of 1.67x on Intel Xeon w9-3495X
>(Sapphire Rapids) and 1.98x on Apple M4 (Apple M-series SoC) in the
>microbenchmark described later. The LLVM patch has already been merged
>into llvm:main [6], demonstrating the practical applicability of the
>proposed method.
>
>https://arxiv.org/abs/2604.07902


In a paper [ertl19kps] I looked at full-width division by multiplying
with the double-width (128 bits on 64-bit targets) reciprocal , and
that gave a small reduction in latency (over the conventional approach
of using a single-width multiplication with corrections) in some
unsigned cases (e.g., n/7). I did not look into half-width division,
because the programming language I am interested in does not have
half-width arithmetics, but of course one can apply the same idea in
this context by multiplying with the full-width reciprocal. It's
interesting that the benefits are much higher in this context.


@InProceedings{ertl19kps,
    author = {M. Anton Ertl},
    title = {Integer Division by Multiplying with the
                                    Double-Width Reciprocal},
    crossref = {kps19},
    pages = {75--84},
    url = {https://www.complang.tuwien.ac.at/papers/ertl19kps.pdf},
    url-slides = {https://www.complang.tuwien.ac.at/papers/ertl19kps-slides.pdf},
    abstract = {Earlier work on integer division by multiplying with
                                    the reciprocal has focused on multiplying with a
                                    single-width reciprocal, combined with a correction
                                    and followed by a shift. The present work explores
                                    using a double-width reciprocal to allow getting rid
                                    of the correction and shift.}
}


@Proceedings{kps19,
    title = {20. Kolloquium Programmiersprachen und Grundlagen
                                    der Programmierung (KPS)},
    booktitle = {20. Kolloquium Programmiersprachen und Grundlagen
                                    der Programmierung (KPS)},
    year = {2019},
    key = {kps19},
    editor = {Martin Pl\"umicke and Fayez Abu Alia},
    url = {https://www.hb.dhbw-stuttgart.de/kps2019/kps2019_Tagungsband.pdf}
}


- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.