KernelEvolve: Agentic kernel coding for heterogeneous AI accelerators (Meta)

5 months ago

We’re sharing KernelEvolve, an agentic system we built at Meta to automatically generate and evolve high-performance kernels across heterogeneous AI accelerators.

The core motivation is that modern AI stacks increasingly depend on hand-optimized kernels (GEMM, attention, reductions, fused ops), but writing and tuning them for each hardware target (NVIDIA GPUs, AMD GPUs, custom accelerators like MTIA) does not scale.

KernelEvolve treats kernel programming as a search + evolution problem:

• An LLM generates candidate kernels (e.g., Triton-like code)
• Kernels are compiled, benchmarked, and validated on real hardware
• Performance feedback is used to evolve better variants over many iterations
• The system scales evaluation across large fleets and multiple accelerator types
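To make the loop above concrete, here is a toy sketch of the generate / validate / benchmark / evolve cycle. It is not KernelEvolve's code: `Candidate`, `propose`, `is_valid`, `measure_latency`, and `evolve` are hypothetical names, the "LLM" is replaced by a random mutation of two made-up tuning knobs, and the "benchmark" is a synthetic cost function rather than a run on real hardware.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    block_size: int  # hypothetical tuning knob
    unroll: int      # hypothetical tuning knob


def propose(parent, rng):
    """Stand-in for the LLM: mutate one knob of the parent candidate."""
    if rng.random() < 0.5:
        factor = rng.choice([0.5, 2])
        new_block = max(16, min(1024, int(parent.block_size * factor)))
        return Candidate(new_block, parent.unroll)
    new_unroll = max(1, min(8, parent.unroll + rng.choice([-1, 1])))
    return Candidate(parent.block_size, new_unroll)


def is_valid(c):
    """Stand-in for compilation and correctness checks (assumed constraints)."""
    return c.block_size % 16 == 0 and c.block_size * c.unroll <= 2048


def measure_latency(c):
    """Stand-in for a hardware benchmark: synthetic cost, lowest at (256, 4)."""
    return 1.0 + abs(math.log2(c.block_size) - 8) + 0.25 * abs(c.unroll - 4)


def evolve(seed, generations=50, children_per_gen=8, rng_seed=0):
    """Greedy evolution: keep a child only if it is valid and strictly faster."""
    rng = random.Random(rng_seed)
    best, best_latency = seed, measure_latency(seed)
    history = [best_latency]
    for _ in range(generations):
        for _ in range(children_per_gen):
            child = propose(best, rng)
            if not is_valid(child):
                continue  # rejected before benchmarking
            latency = measure_latency(child)
            if latency < best_latency:
                best, best_latency = child, latency
        history.append(best_latency)
    return best, best_latency, history


if __name__ == "__main__":
    best, latency, history = evolve(Candidate(block_size=16, unroll=1))
    print(best, latency)
```

A real system would replace `propose` with model-generated kernel source and `measure_latency` with timed runs on the target accelerator; the greedy keep-the-best policy here is only the simplest possible selection rule.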

Unlike one-shot code generation, KernelEvolve continuously improves kernels using closed-loop, hardware-in-the-loop feedback, and can discover non-obvious optimizations that rival or exceed expert-written code.
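As a rough illustration of what closed-loop feedback means for a single candidate, the sketch below checks a candidate against a reference implementation first and only times it if it is correct. Everything here is an assumption made for the example: the function names, the tolerance, the result dictionary, and the use of plain Python callables timed on the CPU in place of compiled kernels on an accelerator.

```python
import math
import random
import time


def reference_sum_of_squares(xs):
    """Trusted baseline that candidates are compared against."""
    return sum(x * x for x in xs)


def evaluate(candidate, reference, n_inputs=5, size=1000, tol=1e-9,
             repeats=20, seed=0):
    """Validate a candidate on random inputs, then report its median latency."""
    rng = random.Random(seed)
    inputs = [[rng.uniform(-1, 1) for _ in range(size)] for _ in range(n_inputs)]

    # Correctness gate: an incorrect candidate is never benchmarked.
    for xs in inputs:
        expected = reference(xs)
        try:
            got = candidate(xs)
        except Exception as exc:
            return {"ok": False, "reason": f"raised {type(exc).__name__}",
                    "latency_s": None}
        if abs(got - expected) > tol * max(1.0, abs(expected)):
            return {"ok": False, "reason": "numerical mismatch",
                    "latency_s": None}

    # Benchmark: time several passes over all inputs and take the median.
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for xs in inputs:
            candidate(xs)
        timings.append(time.perf_counter() - start)
    timings.sort()
    return {"ok": True, "reason": "", "latency_s": timings[len(timings) // 2]}


if __name__ == "__main__":
    good = lambda xs: math.fsum(x * x for x in xs)
    bad = lambda xs: sum(xs)  # wrong on purpose
    print(evaluate(good, reference_sum_of_squares))
    print(evaluate(bad, reference_sum_of_squares))
```

The result dictionary is the feedback signal: a failure reason can be fed back to the generator as an error to fix, and a latency can be used to rank correct candidates.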

In the paper we describe:

• The agent architecture and search space design
• How we scale kernel evaluation efficiently across heterogeneous accelerators
• Case studies showing performance gains over hand-tuned baselines
• Practical lessons from deploying this system in production ML workloads

Paper (arXiv): https://arxiv.org/abs/2512.23236 (66 pages)

LinkedIn: https://www.linkedin.com/posts/gangliao_excited-to-share-our-recent-work-on-kernelevolve-activity-7411781675740897280-AQth?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAzsrfsBRed-BvPAGqq9FgvVZ-v6F-sG4SM

We’d love feedback from folks working on compilers, kernels, ML systems, or agentic approaches to code generation.