Optimizing a Lock-Free SPSC Queue

Mon, 30 Mar 2026 00:00:01 +0530

Yet another SPSC (single-producer, single-consumer) queue blog! The goal isn’t just to explain a final solution, such as the lock-free, index-cached implementation some of you might already know. Instead, I want to walk through the iterative process: testing different approaches, analyzing the results, and working out where more performance can be squeezed out.

I hope that walking through the analysis provides a template for tackling similar performance challenges in the future.

The optimization discussion focuses on the x86 architecture. All measurements were taken on an AMD Ryzen 7 6800HS using gcc 13.3.0. This CPU has a single CCX (Core Complex), and the benchmarks run on cores 0 and 2, which are two distinct physical cores rather than hyperthreaded siblings.
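For readers who want to reproduce a similar setup, here is a minimal sketch of how a thread could be pinned to a specific core. It assumes Linux with glibc, and the helper name `pin_current_thread_to` is hypothetical; the benchmarks in this post may well have used a different mechanism (for example an external tool such as `taskset`).

```cpp
// Linux-only sketch (assumes glibc; g++ defines _GNU_SOURCE by default).
#include <cassert>
#include <sched.h>

// Hypothetical helper: restrict the calling thread to a single logical CPU.
// Returns true on success, false if the kernel rejected the request
// (for example because the CPU does not exist or is not allowed).
bool pin_current_thread_to(int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);  // CPU_SET is only valid in this range

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    // With pid 0, sched_setaffinity applies to the calling thread only.
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}
```

With a helper like this, the producer thread would call `pin_current_thread_to(0)` and the consumer thread `pin_current_thread_to(2)` before the timed loop starts, so both stay on the two physical cores described above. Checking the return value matters, because an unpinned thread can migrate between cores and blur the measurements.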

C21 Blog
