Re: Paper: PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C to Safer Rust

Christopher F Clark <christopher.f.clark@compiler-resources.com>
Fri, 16 May 2025 02:26:57 +0300

          From comp.compilers

Newsgroups: comp.compilers
Organization: Compilers Central
Keywords: Rust
Posted-Date: 15 May 2025 20:54:22 EDT

There has been some debate here about how Rust is "safer". Having written a
little bit of Rust, I can explain that somewhat.


The main point of Rust's safety guarantees is "heap allocated" memory, not array
bounds checking, although I believe that references to arrays are bounds checked
and that it is more difficult to turn off array bounds checking in Rust than in
Pascal. It is not a compiler option. It has to be done by marking a block or
function as "unsafe", and then it is obvious that that particular code is
responsible for its own checking. (I still don't know whether that applies to
array bounds checking or not, since I have written production code in Rust and
have never written any unsafe code, as it was unnecessary to do so.) The safe
code is generally sufficiently expressive and performant that one doesn't need
(in many cases) to write "unsafe" code.
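
For what it is worth, here is a minimal sketch of how bounds checking looks with
the standard slice API. The helper name `lookup` is made up for illustration;
`get` and `get_unchecked` are the standard library methods. The unchecked form
only compiles inside an `unsafe` block, which is what makes the responsibility
visible in the source.

```rust
// `lookup` is a hypothetical helper, not part of any library.
// It returns the element at `i`, or None if `i` is out of range.
fn lookup(data: &[i32], i: usize) -> Option<i32> {
    data.get(i).copied()
}

fn main() {
    let data = [10, 20, 30];

    // Checked access: an out-of-range index becomes None.
    println!("{:?}", lookup(&data, 1));
    println!("{:?}", lookup(&data, 7));

    // Plain indexing is also checked; an out-of-range `data[i]` would panic
    // at run time instead of reading past the end of the array.
    let i = data.len() - 1;
    println!("{}", data[i]);

    // Skipping the check requires an explicit unsafe block, and the code
    // inside it is then responsible for `i` being in range.
    let last = unsafe { *data.get_unchecked(i) };
    println!("{}", last);
}
```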


So, assuming that one is writing safe Rust, one gets checking, and does so with
negligible performance impact. It did not impact the SQL engine we wrote in
Rust, and we benchmarked it to be certain.


But, now returning to the main point. Rust has a "different" model of dealing
with "heap allocated" memory. It is vaguely akin to Java's garbage collection
model, in that memory continues to exist as long as there are potential
references to that memory. It is the job of the "borrow checker" to ensure, at
compile time, that this can be proven to be true. And, for me, the easiest way
to think about it is that Rust treats "heap memory" as if it were a stack, but
one with coroutines, so an object's lifetime can be extended beyond a simple
stack.


In any case, as with C ownership conventions, all objects in safe Rust have an
owner and exist as long as that owner says they do. And, you cannot get a
pointer to such an object, except by "borrowing it" from the owner. The borrow
checker enforces that rule, and while you have a "borrowed copy" of the object,
the owner cannot get rid of it. Moreover, the borrow checker makes sure that the
code "borrowing" the object stops borrowing it before the owner wants to get rid
of it. You get a compile-time error if the borrow checker cannot prove that is
true. And, in the simplest cases, the creation of an object (and its deletion)
is done via scopes, thus making it all very stack-like.
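
A small sketch of the scope-based case described above. The type `Buffer` and
the function `total` are invented for the example; the point is only that the
owner is a local variable, the borrow lasts for one call, and the heap memory is
released when the owner's scope ends.

```rust
// `Buffer` and `total` are hypothetical names used only for illustration.
struct Buffer {
    bytes: Vec<u8>, // the Vec's storage lives on the heap
}

// Borrows the buffer; the caller remains the owner.
fn total(buf: &Buffer) -> u32 {
    buf.bytes.iter().map(|&b| u32::from(b)).sum()
}

fn main() {
    let sum;
    {
        let owner = Buffer { bytes: vec![1, 2, 3] };
        // The borrow begins and ends within this call.
        sum = total(&owner);
        // Storing `&owner` in a variable declared outside these braces and
        // using it after them would be rejected at compile time, because
        // `owner` does not live long enough.
    } // `owner` goes out of scope here and its heap memory is freed
    println!("{}", sum);
}
```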


Moreover, beyond simple cases, you need to decorate your object with
"lifetimes". That's one of the ways you can express nontrivial uses of an
object. Fortunately, lots of simple cases are covered and don't explicitly need
lifetimes, e.g. you use an object in a stack-like fashion where you borrow it
(and don't take a pointer to it that can be leaked--pointers that cannot be
leaked are generally ok). If you do take a pointer that can be leaked, you will
likely need lifetime annotations. And how does the borrow checker ensure that
pointers cannot be leaked (or at least how did it in the Rust compiler I used)?
By requiring ownership to be hierarchical, such that the owned object is a child
of the owning object (i.e. ownership is a tree, not a DAG). Thus, you cannot
create Rust objects that are general graphs and still make the borrow checker
happy. You can make stacks and queues and trees, but not general graphs, not
even DAGs, using the base mechanism.
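
To make the two ideas above a bit more concrete, here is a sketch with invented
names (`longer`, `Node`, `sum`). The first function returns a borrowed pointer,
so it needs a lifetime annotation tying the result to its inputs. The struct
shows tree-shaped ownership, where each node owns its children outright.

```rust
// The annotation 'a says the returned reference is valid only as long as
// both inputs are. `longer` is a hypothetical example function.
fn longer<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

// Tree ownership: every child has exactly one owner, its parent.
struct Node {
    value: i32,
    children: Vec<Node>,
}

// Walks the tree through shared borrows; no lifetime annotations are needed
// because nothing borrowed is returned.
fn sum(node: &Node) -> i32 {
    node.value + node.children.iter().map(sum).sum::<i32>()
}

fn main() {
    println!("{}", longer("tree", "graph"));

    let leaf = Node { value: 4, children: vec![] };
    let left = Node { value: 2, children: vec![leaf] };
    let right = Node { value: 3, children: vec![] };
    // `left` and `right` are moved into the root; they cannot also be placed
    // under a second parent, which is what rules out DAGs here.
    let root = Node { value: 1, children: vec![left, right] };
    println!("{}", sum(&root));
}
```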


Of course, that's a pretty strict mechanism, so safe Rust has a solution to it.
It has reference counted pointers (i.e. ones that one can garbage collect).
Those let you make DAGs. When you "borrow" one of those, the count is
incremented, and when you stop borrowing it, the count is decremented. Upon the
count becoming zero, the object is freed. Not my favorite garbage collecting
scheme, but it is "safe".

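As a sketch of the reference counted case, using the standard library's `Rc`:
the names `Parent` and `shared_counts` are invented for the example. Note that
with `Rc` it is making another handle (`Rc::clone`) that bumps the count, and
dropping a handle that lowers it.

```rust
use std::rc::Rc;

// `Parent` is a hypothetical type; two parents share one child, which makes
// the ownership structure a DAG rather than a tree.
struct Parent {
    child: Rc<String>,
}

// Returns the strong count before sharing, while shared, and after one of the
// parents has been dropped.
fn shared_counts() -> (usize, usize, usize) {
    let leaf = Rc::new(String::from("shared"));
    let before = Rc::strong_count(&leaf);

    let left = Parent { child: Rc::clone(&leaf) };
    let right = Parent { child: Rc::clone(&leaf) };
    let during = Rc::strong_count(&leaf);

    drop(left); // one handle goes away, so the count is decremented
    let after = Rc::strong_count(&leaf);

    // The string is still alive because `leaf` and `right` hold handles.
    assert_eq!(right.child.as_str(), "shared");
    (before, during, after)
}

fn main() {
    println!("{:?}", shared_counts());
}
```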

And, if you want truly circular links, there are "weak references" in addition.
You cannot directly access an object through a weak reference. You need to write
code that promotes it to a strong reference to access it, and that code performs
the checking to be sure the object exists.
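
A sketch of the promotion step with the standard `Weak` type. The function names
`read` and `demo` are invented; `Rc::downgrade` and `Weak::upgrade` are the
standard calls. `upgrade` hands back an `Option`, so the "does the object still
exist" check cannot be skipped.

```rust
use std::rc::{Rc, Weak};

// `read` is a hypothetical helper: it promotes the weak reference and, only
// if the object still exists, reads its length.
fn read(weak: &Weak<String>) -> Option<usize> {
    weak.upgrade().map(|strong| strong.len())
}

// Reads through the weak reference once while the owner is alive and once
// after the owner has been dropped.
fn demo() -> (Option<usize>, Option<usize>) {
    let owner = Rc::new(String::from("node"));
    let weak = Rc::downgrade(&owner);

    let while_alive = read(&weak);
    drop(owner); // last strong reference gone, the string is freed
    let after_drop = read(&weak);

    (while_alive, after_drop)
}

fn main() {
    println!("{:?}", demo());
}
```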


This is not all of Rust's safety guarantees. Objects in Rust are also immutable
by default. You cannot just borrow an object and mutate it. You must explicitly
borrow a mutable copy, from an owner (or borrower) who themselves has a mutable
copy. Moreover, while your code has a mutable copy, it has an exclusive copy of
the object; no one else can get a copy from that owner. You can pass down to
your children immutable copies or your mutable copy. But, if I recall correctly,
you cannot mutate the object while they have "borrowed" it.
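
A minimal sketch of those rules, with invented function names (`bump`, `peek`,
`demo`). The variable must be declared `mut`, the mutable borrow is exclusive
while it is in use, and any number of immutable borrows can coexist once it has
ended.

```rust
// Hypothetical helpers: one needs a mutable borrow, the other a shared one.
fn bump(counter: &mut i32) {
    *counter += 1;
}

fn peek(counter: &i32) -> i32 {
    *counter
}

fn demo() -> i32 {
    let mut count = 0; // without `mut`, `&mut count` below is rejected
    {
        let exclusive = &mut count;
        bump(exclusive); // the mutable borrow is passed down to a callee
        // Writing `let shared = &count;` at this point is a compile-time
        // error, because `exclusive` is still used on the next line.
        bump(exclusive);
    }
    // The mutable borrow has ended, so several shared borrows are fine.
    let a = &count;
    let b = &count;
    peek(a) + peek(b)
}

fn main() {
    println!("{}", demo());
}
```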


All of this means that Rust code is written in a more "functional programming
style". You don't generally make an array and mutate it. You make a new copy of
the array with your changes. And while that may seem inefficient, there are many
algorithms that work well in that regime. Moreover, if the Rust compiler can
determine that your code is safe, it can eliminate making copies and do in-place
modification.
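
For comparison, a sketch of the two styles side by side, with invented function
names. The first builds a new array from the old one; the second mutates in
place through an exclusive borrow. Whether the compiler turns the first into the
second in any given case is not something this example demonstrates.

```rust
// Functional style: the input is only read, and a new vector is returned.
fn doubled(input: &[i32]) -> Vec<i32> {
    input.iter().map(|x| x * 2).collect()
}

// In-place style: requires a mutable (exclusive) borrow of the data.
fn double_in_place(data: &mut [i32]) {
    for x in data.iter_mut() {
        *x *= 2;
    }
}

fn main() {
    let original = vec![1, 2, 3];
    let copy = doubled(&original);
    println!("{:?} {:?}", original, copy); // the original is untouched

    let mut scratch = original.clone();
    double_in_place(&mut scratch);
    println!("{:?}", scratch);
}
```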


In my opinion, this makes Rust code more challenging to write, but it does live
up to its goal of making the code "safer". You simply cannot easily write
"unsafe" code. The compiler simply refuses to compile it. And, my guess is
that's why only a small percentage of C code can be turned into *safe* Rust. So
many C idioms don't enforce the safe Rust rules. They allow mutating objects in
place. They allow passing pointers to places that don't enforce the lifetime
rules. They don't require programmers to check that pointers to objects point to
valid objects. You cannot compile any of those things in safe Rust code.
It's not just bounds checking. It's limiting programmers to code that the
compiler can prove is safe and not compiling anything the compiler cannot prove
is safe.


--
******************************************************************************
Chris Clark email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris
------------------------------------------------------------------------------

