io_uring replaces epoll for new high-throughput services. The patterns that earn their place, the gotchas in older kernels, and where we'd still pick epoll.

Linux io_uring — Async I/O Patterns We Use

epoll was the right async I/O answer on Linux for almost two decades. For socket-heavy network services it's fast, well-understood, and battle-tested. But once your service mixes network I/O, file I/O, and other syscalls, epoll covers only the readiness part: every operation still costs its own syscall, and file I/O needs a separate mechanism. io_uring changed that: submit operations to a ring buffer, the kernel processes them asynchronously, and you collect completions. Same interface for everything.

We've run io_uring in production for ~18 months across a handful of services. This post covers where it earned its keep, the kernel-version gotchas, and where we still reach for epoll.

The mental model #

epoll asks "which file descriptors are ready?" io_uring says "do these things and tell me when each one is done."

Two ring buffers shared between userspace and the kernel:

Submission Queue (SQ). Userspace writes operations here: read this, write that, accept on this socket.
Completion Queue (CQ). Kernel writes results here. Userspace polls/waits for completions.

The kernel processes the SQ asynchronously. Operations that complete fast (cached reads, small writes) can complete inline; slow ones return later. From userspace, you don't care — you submit, you eventually get a completion.

This is closer to Windows IOCP than to Linux's traditional epoll model. Same kernel, very different programming style.
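
To make the two-ring picture concrete, here is a minimal userspace-only sketch of the head/tail index scheme that both queues follow: a power-of-two number of slots, free-running counters masked down to a slot index, and a publish step that makes the entry visible to the other side. It is a toy model, not the kernel ABI. The names `toy_ring`, `toy_push`, and `toy_pop` are made up for illustration, and the real rings live in memory mapped from the io_uring file descriptor.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_ENTRIES 8u                  /* power of two, as io_uring rings are */
#define TOY_MASK    (TOY_ENTRIES - 1u)

static_assert((TOY_ENTRIES & TOY_MASK) == 0, "ring size must be a power of two");

/* Hypothetical single-producer, single-consumer ring. */
struct toy_ring {
    _Atomic uint32_t head;              /* advanced by the consumer */
    _Atomic uint32_t tail;              /* advanced by the producer */
    uint64_t slots[TOY_ENTRIES];
};

/* Producer: fill the slot first, then publish it by bumping the tail. */
static bool toy_push(struct toy_ring *r, uint64_t v)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == TOY_ENTRIES)
        return false;                   /* ring is full */
    r->slots[tail & TOY_MASK] = v;
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer: read the slot, then release it by bumping the head. */
static bool toy_pop(struct toy_ring *r, uint64_t *out)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return false;                   /* ring is empty */
    *out = r->slots[head & TOY_MASK];
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

In this model userspace plays the producer on the SQ and the consumer on the CQ, with the kernel on the opposite end of each. In practice liburing hides this index handling behind calls such as io_uring_get_sqe and io_uring_wait_cqe.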

Where io_uring beats epoll #

Mixed workloads. A service doing socket reads AND disk reads AND openat() syscalls used to need epoll + a thread pool for the file work + careful coordination. io_uring handles all three through the same ring. Big simplification.

Batching. Submit 1000 operations in one io_uring_enter syscall. With epoll, each epoll_ctl is one syscall per FD, and every read or write that follows readiness is another; the ring buffer amortizes that cost.

Storage-heavy services. For NVMe-backed databases or storage gateways, io_uring eliminates the read/write syscall on the hot path entirely (via SQPOLL — see below).

Per-operation overhead. Cold-path operations like readv show meaningful improvements in CPU per byte once you eliminate the syscall costs.

In our load tests on a network proxy that does TLS termination + disk writes (audit logs):

epoll version: ~1.2M req/s on a 16-core box
io_uring version: ~1.7M req/s, lower CPU per request

Real wins, not marginal.

Where epoll still wins #

Pure-network services with small sockets. For a simple HTTP API where each request is a few KB and processing is async-DB-heavy, the kernel-side I/O isn't the bottleneck. epoll is mature, well-debugged, supported by every framework.

Older kernels. io_uring features keep landing in new kernels (5.15+, 6.x). Production environments on 5.4 LTS have basic io_uring but miss key features (multishot accept, ring messaging, etc.). epoll works everywhere.

Libraries you don't control. If your dependencies use epoll under the hood (Node's libuv, most Python async libs), the io_uring win is partial — only the parts you wrote can use it.

For us, io_uring is in our newer high-throughput Rust services. Existing Node services keep epoll.

Kernel version requirements #

io_uring landed in 5.1. Each subsequent release added features. Practical floor for production use:

5.10+: basic operations, registered file descriptors, op chaining.
5.15+: fast poll, sufficient for most server workloads.
6.0+: multishot accept (landed in 5.19), ring messaging (landed in 5.18), mature defaults.

We require 5.15 minimum for our io_uring services. Amazon Linux 2023, Ubuntu 22.04, current Bottlerocket all qualify.
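
A startup check for that floor might look like the sketch below. It assumes the release string from uname() starts with "major.minor", which holds for mainline and the distros listed above; `kernel_at_least` and `running_kernel_at_least` are hypothetical helper names. Distro kernels sometimes backport features, so probing for a specific opcode (liburing offers io_uring_get_probe and io_uring_opcode_supported) is likely the more reliable test when one feature matters.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Returns 1 if a release string such as "5.15.0-91-generic" is at least
 * want_major.want_minor, 0 otherwise (including unparsable strings). */
static int kernel_at_least(const char *release, int want_major, int want_minor)
{
    int major = 0, minor = 0;

    assert(release);
    if (sscanf(release, "%d.%d", &major, &minor) != 2)
        return 0;
    return major > want_major ||
           (major == want_major && minor >= want_minor);
}

/* Same check against the kernel this process is running on. */
static int running_kernel_at_least(int want_major, int want_minor)
{
    struct utsname u;

    if (uname(&u) != 0)
        return 0;
    return kernel_at_least(u.release, want_major, want_minor);
}
```

A service could call running_kernel_at_least(5, 15) before creating its ring and fall back to an epoll path, or refuse to start, when the check fails.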

The patterns we use #

A few that earn their place day-to-day.

1. Accept + read + write chain #

The classic server pattern: accept connection, read request, write response.

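Naive: submit the accept and wait, submit the recv and wait, submit the send and wait. Three submit/wait round trips through the dispatcher.

With chaining (IOSQE_IO_LINK): the kernel links operations so the next one starts as soon as the previous one completes. One submit, three operations. Each operation still posts its own completion, and if one fails, the rest of the chain completes with -ECANCELED.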

c.c

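// Sketch (liburing, kernel 5.15+). The accepted socket's fd is not known at
// submit time, so the chain uses a direct descriptor: accept installs the new
// socket into slot CONN_SLOT of a fixed-file table registered earlier (for
// example via io_uring_register_files with that slot set to -1), and recv/send
// address the slot. CONN_SLOT is an illustrative name, not a liburing constant.
sqe = io_uring_get_sqe(ring);
io_uring_prep_accept_direct(sqe, listen_fd, addr, addrlen, 0, CONN_SLOT);
sqe->flags |= IOSQE_IO_LINK;

sqe = io_uring_get_sqe(ring);
io_uring_prep_recv(sqe, CONN_SLOT, buf, sizeof(buf), 0);
sqe->flags |= IOSQE_FIXED_FILE | IOSQE_IO_LINK;

// The response must already be built when the chain is submitted.
sqe = io_uring_get_sqe(ring);
io_uring_prep_send(sqe, CONN_SLOT, response, response_len, 0);
sqe->flags |= IOSQE_FIXED_FILE;

io_uring_submit(ring);  // one syscall; each operation still posts its own CQE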

Three operations, one syscall. Massive throughput improvement on connection-heavy workloads.

2. Multishot accept #

Instead of submitting a new accept after each completion, submit one "multishot" accept that completes once per incoming connection until cancelled. Eliminates resubmit overhead on the hot path.

Requires 5.19+. For a server expecting thousands of connections per second, the savings are real.

3. Registered file descriptors #

For long-lived FDs (your listen socket, your DB connections), pre-register them with the ring. Subsequent operations reference them by index into the registered table instead of by FD, which saves the per-operation file lookup and reference counting in the kernel.

The win is small per operation, large at scale.

4. SQPOLL — kernel-side polling thread #

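The most aggressive optimization. Set IORING_SETUP_SQPOLL and the kernel spawns a thread that polls the submission queue. You stop calling io_uring_enter for submissions; you just write to the SQ and the kernel picks up the work without a syscall. The one exception: after an idle period (sq_thread_idle) the thread goes to sleep and needs an io_uring_enter call to wake it, which liburing's io_uring_submit handles for you.

The trade-off: that kernel thread spins while it is awake, and that costs CPU even when little is being submitted. Worth it for the highest-throughput services; overkill otherwise.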

We use SQPOLL on one service handling >500k requests/sec; nothing else.

Gotchas we hit #

Buffer lifetime. Submitted buffers must stay valid until completion. If you submit a read and then free or realloc the buffer, the kernel writes into freed memory. Subtle to debug.

Inflight operations on shutdown. Closing a ring with inflight operations can leak resources or crash on older kernels. We call io_uring_queue_exit only after draining every outstanding completion.

Completion ordering. Completions can arrive in any order, even for linked operations after the chain breaks. Code that assumes "this finishes before that" needs explicit serialization.

Resource limits. Each ring uses kernel memory (the rings themselves, registered buffers). Default limits are generous but you can hit them at very high scale. Tune via RLIMIT_MEMLOCK if needed.

glibc syscall numbers. Some older Linux distros have glibc that doesn't know io_uring syscalls. Use liburing (the official wrapper) or wrap the syscalls yourself.

Language support #

C / C++: liburing is the official wrapper. Well-maintained.
Rust: tokio-uring, glommio. tokio-uring integrates with tokio for hybrid epoll+io_uring; glommio is a separate runtime that's all io_uring.
Go: Limited. Standard runtime uses epoll. Third-party libraries exist but aren't widely adopted.
Python: Some experimental wrappers. Most Python async code uses asyncio (epoll under the hood).
JVM: Netty has experimental io_uring backends. Helidon supports it.

We use Rust + glommio for the services where io_uring matters. Everything else stays in its native ecosystem.

When io_uring is the wrong tool #

A few cases:

Network service with 100 req/s. I/O overhead isn't your bottleneck.
Stateful logic dominates CPU. If your service is compute-heavy, faster I/O doesn't move the needle.
You don't measure. Without before/after benchmarks, you can't justify the complexity.
Old kernels. Production on 5.4? Stick with epoll until you upgrade.

What to monitor #

For io_uring services:

Operations in flight. Indicates load and potential queue overflow.
Submit/completion latency distribution. Sudden tails point at kernel issues.
CPU breakdown (user vs kernel). io_uring shifts work to kernel threads; CPU costs move.
Per-operation cost in benchmarks. Compare against epoll baseline regularly.
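
The first two items can be tracked in-process with very little code. The sketch below is one possible shape, not our actual instrumentation: `ring_stats`, `stats_on_submit`, and `stats_on_complete` are made-up names, and it assumes the caller records a monotonic timestamp (for example from clock_gettime with CLOCK_MONOTONIC) at submit time and carries it alongside the operation's user data.

```c
#include <assert.h>
#include <stdint.h>

#define LAT_BUCKETS 32

struct ring_stats {
    uint64_t inflight;                  /* submitted but not yet completed */
    uint64_t max_inflight;              /* high-water mark */
    /* Bucket i counts latencies with floor(log2(ns)) == i; 0 ns and 1 ns
     * share bucket 0, and the last bucket absorbs everything larger. */
    uint64_t lat_hist[LAT_BUCKETS];
};

/* Call when an SQE is queued. */
static void stats_on_submit(struct ring_stats *s)
{
    assert(s);
    if (++s->inflight > s->max_inflight)
        s->max_inflight = s->inflight;
}

/* Call when the matching CQE is reaped. */
static void stats_on_complete(struct ring_stats *s, uint64_t submit_ns,
                              uint64_t complete_ns)
{
    assert(s && s->inflight > 0 && complete_ns >= submit_ns);
    uint64_t d = complete_ns - submit_ns;
    unsigned b = 0;

    while (b + 1 < LAT_BUCKETS && (d >> (b + 1)) != 0)
        b++;
    s->lat_hist[b]++;
    s->inflight--;
}
```

Exporting max_inflight next to the ring size shows how close the service runs to a full queue, and a power-of-two histogram is cheap enough to update on every completion while still exposing tail shifts.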

What to read next #

Linux performance tuning for production servers — broader system-level perf knobs
eBPF tools for everyday ops — bpftrace patterns — observing io_uring kernel-side
Container resource limits — what they actually do — kernel resource accounting under containers
File system optimization — improving disk performance — adjacent storage-side perf

io_uring is one of those kernel features that doesn't matter until your service is at the throughput where syscall costs matter. For 95% of services, epoll is fine and io_uring is over-engineering. For the 5% where every microsecond of CPU per request shows up in the bill, io_uring is the difference between scaling linearly and not.

About Kiril Urbonas