Paper: Improving Assembly Code Performance with Large Language Models via Reinforcement Learning

From: John R Levine <johnl@taugh.com>
Newsgroups: comp.compilers
Date: Mon, 19 May 2025 12:54:22 -0400
Organization: Compilers Central
Keywords: optimize
Posted-Date: 19 May 2025 12:55:29 EDT

They prompted some LLMs with C programs and the corresponding GCC -O3
assembly, giving feedback when the generated code was faster and still
correct. It seems to me like asking for trouble, but they claim a 47%
average speedup with 96% of the generated code still correct. The paper
ends with a contrived example in which the LLM figured out that a C
routine could be collapsed into a single POPCNT instruction.
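
The post doesn't reproduce that example, but a routine of that flavor
looks something like the following bit-counting loop (an illustrative
guess at the kind of code meant, not taken from the paper), which
computes exactly what one x86-64 POPCNT instruction computes:

    #include <stdint.h>

    /* Kernighan-style bit count: each iteration clears the lowest set
       bit, so the loop runs once per set bit in x. */
    int count_bits(uint64_t x) {
        int n = 0;
        while (x) {
            x &= x - 1;   /* clear the lowest set bit */
            n++;
        }
        return n;
    }

    /* Hand-written equivalent on a POPCNT-capable x86-64 CPU
       (System V calling convention: argument in rdi, result in eax):
           popcnt  rax, rdi
           ret
    */

GCC also exposes the same instruction through __builtin_popcountll(),
so the point of such an example is that the model recognizes the
equivalence from the loop itself rather than from the builtin.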




Anjiang Wei, Tarun Suresh, Huanmi Tan, Yinglun Xu, Gagandeep Singh, Ke
Wang, Alex Aiken


Abstract


Large language models (LLMs) have demonstrated strong performance across a
wide range of programming tasks, yet their potential for code optimization
remains underexplored. This work investigates whether LLMs can optimize
the performance of assembly code, where fine-grained control over
execution enables improvements that are difficult to express in high-level
languages. We present a reinforcement learning framework that trains LLMs
using Proximal Policy Optimization (PPO), guided by a reward function that
considers both functional correctness, validated through test cases, and
execution performance relative to the industry-standard compiler gcc -O3.
To support this study, we introduce a benchmark of 8,072 real-world
programs. Our model, Qwen2.5-Coder-7B-PPO, achieves a 96.0% test pass rate
and an average speedup of 1.47x over the gcc -O3 baseline, outperforming
all 20 other models evaluated, including Claude-3.7-sonnet. These results
indicate that reinforcement learning can unlock the potential of LLMs to
serve as effective optimizers for assembly code performance.
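
The abstract doesn't give the exact reward formula, but the shape it
describes (correctness gated by test cases, performance measured against
gcc -O3) might be sketched as below; the zero-for-incorrect gating and
the ratio-of-runtimes term are assumptions for illustration, not the
paper's definition:

    /* Sketch of a correctness-gated speedup reward: code that fails any
       test earns nothing; code that passes earns its speedup over the
       gcc -O3 baseline (1.47 would mean 47% faster). */
    double reward(int tests_passed, int tests_total,
                  double baseline_seconds, double candidate_seconds)
    {
        if (tests_passed < tests_total)
            return 0.0;                           /* fails validation */
        return baseline_seconds / candidate_seconds;
    }

PPO would then be trained to maximize the expected value of this signal
over the assembly programs the model generates.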


https://arxiv.org/abs/2505.11480


Regards,
John Levine, johnl@taugh.com, Taughannock Networks, Trumansburg NY
Please consider the environment before reading this e-mail. https://jl.ly

