io_uring replaces epoll for new high-throughput services. The patterns that earn their place, the gotchas in older kernels, and where we'd still pick epoll.
epoll was the right async I/O answer on Linux for almost two decades. For socket-heavy network services it's fast, well-understood, and battle-tested. But once your service is doing a mix of network I/O, file I/O, and other syscalls, epoll becomes one of several mechanisms you coordinate per operation rather than the single one. io_uring changed that: submit operations to a ring buffer, the kernel processes them asynchronously, you collect completions. Same interface for everything.
We've run io_uring in production for ~18 months across a handful of services. This post covers where it earned its keep, the kernel-version gotchas, and where we still reach for epoll.
epoll asks "which file descriptors are ready?" io_uring says "do these things and tell me when each one is done."
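Two ring buffers shared between userspace and the kernel: the submission queue (SQ), where userspace places requests, and the completion queue (CQ), where the kernel posts results.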
The kernel processes the SQ asynchronously. Operations that complete fast (cached reads, small writes) can complete inline; slow ones return later. From userspace, you don't care — you submit, you eventually get a completion.
This is closer to Windows IOCP than to Linux's traditional epoll model. Same kernel, very different programming style.
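To make the submit-then-collect flow concrete, here is a toy model of the two-ring handshake in plain C. It is a sketch, not the kernel ABI: the `toy_*` names are invented for illustration, the "kernel" is an ordinary function that doubles a value, and the memory barriers the real shared rings need are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_ENTRIES 8u                 /* must be a power of two */
#define RING_MASK (RING_ENTRIES - 1u)

/* Toy request and completion records; the real SQE/CQE layouts are richer. */
struct toy_sqe { uint64_t user_data; int value; };
struct toy_cqe { uint64_t user_data; int res; };

struct toy_ring {
    struct toy_sqe sq[RING_ENTRIES];
    struct toy_cqe cq[RING_ENTRIES];
    uint32_t sq_head, sq_tail;          /* consumer advances head, producer advances tail */
    uint32_t cq_head, cq_tail;
};

/* "Userspace" side: queue a request. Returns false when the SQ is full. */
bool toy_submit(struct toy_ring *r, uint64_t user_data, int value)
{
    if (r->sq_tail - r->sq_head == RING_ENTRIES)
        return false;
    r->sq[r->sq_tail & RING_MASK] = (struct toy_sqe){ user_data, value };
    r->sq_tail++;
    return true;
}

/* "Kernel" side: drain the SQ and post one completion per request. */
unsigned toy_kernel_process(struct toy_ring *r)
{
    unsigned done = 0;
    while (r->sq_head != r->sq_tail &&
           r->cq_tail - r->cq_head < RING_ENTRIES) {
        struct toy_sqe s = r->sq[r->sq_head & RING_MASK];
        r->sq_head++;
        /* The "work" here is doubling the value. */
        r->cq[r->cq_tail & RING_MASK] = (struct toy_cqe){ s.user_data, s.value * 2 };
        r->cq_tail++;
        done++;
    }
    return done;
}

/* "Userspace" side: collect one completion. Returns false when the CQ is empty. */
bool toy_reap(struct toy_ring *r, struct toy_cqe *out)
{
    if (r->cq_head == r->cq_tail)
        return false;
    *out = r->cq[r->cq_head & RING_MASK];
    r->cq_head++;
    return true;
}
```

The part that carries over to the real thing is `user_data`: it is the only link between a request and its completion, so whatever you put there at submit time is how you find your state again later.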
Mixed workloads. A service doing socket reads AND disk reads AND openat() syscalls used to need epoll + a thread pool for the file work + careful coordination. io_uring handles all three through the same ring. Big simplification.
Batching. Submit 1000 operations in one io_uring_enter syscall. epoll's epoll_ctl is one syscall per FD; the ring buffer amortizes the cost.
Storage-heavy services. For NVMe-backed databases or storage gateways, io_uring eliminates the read/write syscall on the hot path entirely (via SQPOLL — see below).
Per-operation overhead. Cold-path operations like readv show meaningful improvements in CPU per byte once you eliminate the syscall costs.
In our load tests on a network proxy that does TLS termination + disk writes (audit logs):
Real wins, not marginal.
Pure-network services with small payloads. For a simple HTTP API where each request is a few KB and processing is async-DB-heavy, the kernel-side I/O isn't the bottleneck. epoll is mature, well-debugged, supported by every framework.
Older kernels. io_uring features keep landing in new kernels (5.15+, 6.x). Production environments on 5.4 LTS have basic io_uring but miss key features (multishot accept, ring messaging, etc.). epoll works everywhere.
Libraries you don't control. If your dependencies use epoll under the hood (Node's libuv, most Python async libs), the io_uring win is partial — only the parts you wrote can use it.
For us, io_uring is in our newer high-throughput Rust services. Existing Node services keep epoll.
io_uring landed in 5.1. Each subsequent release added features. Practical floor for production use:
We require 5.15 minimum for our io_uring services. Amazon Linux 2023, Ubuntu 22.04, current Bottlerocket all qualify.
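A version floor like this can be enforced at startup. The sketch below parses the kernel release string from `uname`; the function names are ours, not from any library. Treat it as a coarse gate only, since distro kernels backport features and a version string does not prove a given io_uring feature is present.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Returns 1 if `release` (e.g. "5.15.0-91-generic") is at least
 * want_major.want_minor, 0 if it is older, -1 if it cannot be parsed. */
int release_at_least(const char *release, int want_major, int want_minor)
{
    int major = 0, minor = 0;
    if (sscanf(release, "%d.%d", &major, &minor) != 2)
        return -1;
    if (major != want_major)
        return major > want_major;
    return minor >= want_minor;
}

/* Same contract, applied to the running kernel. */
int running_kernel_at_least(int want_major, int want_minor)
{
    struct utsname u;
    if (uname(&u) != 0)
        return -1;
    return release_at_least(u.release, want_major, want_minor);
}
```

A service would call `running_kernel_at_least(5, 15)` before creating any ring and fall back or refuse to start when it returns 0.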
A few that earn their place day-to-day.
The classic server pattern: accept connection, read request, write response.
Naive: three separate submissions, three completions, three loops through the dispatcher.
With chaining (IOSQE_IO_LINK): the kernel links operations so the next one starts as soon as the previous completes. One submit, one wait, three operations.
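// Pseudocode. Assumes a fixed-file table is already registered with the ring
// and `slot` (a name chosen here for illustration) is a free index in it.
// The recv and send are prepared before the accept completes, so they cannot
// name the accepted fd. accept_direct installs the new socket into `slot`, and
// the linked operations refer to that slot via IOSQE_FIXED_FILE.
sqe = io_uring_get_sqe(ring);
io_uring_prep_accept_direct(sqe, listen_fd, addr, addrlen, 0, slot);
sqe->flags |= IOSQE_IO_LINK;
sqe = io_uring_get_sqe(ring);
io_uring_prep_recv(sqe, slot, buf, sizeof(buf), 0);
sqe->flags |= IOSQE_FIXED_FILE | IOSQE_IO_LINK;
sqe = io_uring_get_sqe(ring);
io_uring_prep_send(sqe, slot, response, response_len, 0);
sqe->flags |= IOSQE_FIXED_FILE;
io_uring_submit(ring);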
Three operations, one syscall. Massive throughput improvement on connection-heavy workloads.
Instead of submitting a new accept after each completion, submit one "multishot" accept that completes once per incoming connection until cancelled. Eliminates resubmit overhead on the hot path.
Requires 5.19+. For a server expecting thousands of connections per second, the savings are real.
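With liburing, the submission side is a single `io_uring_prep_multishot_accept` call. The completion side is where it is easy to slip: each completion carries a flag saying whether the request is still armed. A minimal sketch of that check follows; the enum and function names are ours, and only the CQE layout and `IORING_CQE_F_MORE` come from the kernel header.

```c
#include <linux/io_uring.h>

#ifndef IORING_CQE_F_MORE
#define IORING_CQE_F_MORE (1U << 1)   /* fallback for older kernel headers */
#endif

enum accept_action {
    ACCEPT_KEEP_WAITING,   /* multishot request is still armed */
    ACCEPT_RESUBMIT        /* request ended; queue a new multishot accept */
};

/* Interpret one accept completion. On success *conn_fd receives the new
 * socket, otherwise -1. The return value says whether to re-arm. */
enum accept_action handle_accept_cqe(const struct io_uring_cqe *cqe, int *conn_fd)
{
    *conn_fd = cqe->res >= 0 ? cqe->res : -1;
    return (cqe->flags & IORING_CQE_F_MORE) ? ACCEPT_KEEP_WAITING
                                            : ACCEPT_RESUBMIT;
}
```

The flag, not the result code, decides whether to resubmit: a completion can deliver a valid connection and still be the last one the request will produce.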
For long-lived FDs (your listen socket, your DB connections), pre-register them with the ring. Subsequent operations reference them by index instead of FD, so the kernel skips the per-operation file-table lookup and reference counting.
The win is small per operation, large at scale.
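The most aggressive optimization. Set IORING_SETUP_SQPOLL and the kernel spawns a thread that polls the submission queue. You stop calling io_uring_enter for submissions; just write to the SQ. The kernel picks up the work without a syscall, as long as the polling thread is awake. After an idle period it sleeps, and the next submission needs one io_uring_enter call to wake it (liburing's io_uring_submit handles this for you).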
Trade: that kernel thread spins. CPU cost. Worth it for the highest-throughput services; overkill otherwise.
We use SQPOLL on one service handling >500k requests/sec; nothing else.
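For reference, this is roughly what asking for SQPOLL looks like at the syscall level. It is a sketch under assumptions: `open_sqpoll_ring` is a name invented here, it calls `io_uring_setup` directly instead of going through liburing, and it leaves out the ring mmap steps a real program needs before it can submit anything.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/io_uring.h>

#ifndef __NR_io_uring_setup
#define __NR_io_uring_setup 425   /* x86-64 and arm64 value, for old libc headers */
#endif

/* Create a ring whose submission queue is polled by a kernel thread.
 * idle_ms: how long the poller spins without work before it sleeps.
 * Returns the ring fd on success or a negative errno value. */
int open_sqpoll_ring(unsigned entries, unsigned idle_ms)
{
    struct io_uring_params p;
    memset(&p, 0, sizeof(p));
    p.flags = IORING_SETUP_SQPOLL;
    p.sq_thread_idle = idle_ms;   /* milliseconds */

    long fd = syscall(__NR_io_uring_setup, entries, &p);
    return fd < 0 ? -errno : (int)fd;
}
```

`sq_thread_idle` is the knob for the CPU trade: a long timeout keeps the poller hot through quiet spells, a short one gives the core back sooner. Older kernels restrict SQPOLL to privileged processes, so handle a negative return rather than assuming the flag is accepted.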
Buffer lifetime. Submitted buffers must stay valid until completion. If you submit a read and then realloc the buffer, kernel writes go to freed memory. Subtle to debug.
Inflight operations on shutdown. Tearing down a ring with operations still in flight can leak resources or trigger crashes on older kernels. We call io_uring_queue_exit only after draining.
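One way to keep buffers valid is to give each in-flight operation a heap object that owns its buffer and is freed only in the completion handler. The sketch below shows that ownership pattern in plain C; the `io_req` type and its functions are hypothetical names. With liburing, `io_uring_sqe_set_data` and `io_uring_cqe_get_data` do the same pointer round-trip through `user_data`.

```c
#include <stdint.h>
#include <stdlib.h>

/* One heap object per in-flight operation. It owns the buffer the kernel
 * reads from or writes into, so nothing else may free or resize it. */
struct io_req {
    void  *buf;
    size_t len;
};

/* Allocate the request and its buffer together, before preparing the SQE. */
struct io_req *io_req_new(size_t len)
{
    struct io_req *req = malloc(sizeof(*req));
    if (!req)
        return NULL;
    req->buf = malloc(len);
    if (!req->buf) {
        free(req);
        return NULL;
    }
    req->len = len;
    return req;
}

/* Value to store in the SQE's 64-bit user_data field. */
uint64_t io_req_to_user_data(struct io_req *req)
{
    return (uint64_t)(uintptr_t)req;
}

/* Recover the request from a CQE's user_data field. */
struct io_req *io_req_from_user_data(uint64_t user_data)
{
    return (struct io_req *)(uintptr_t)user_data;
}

/* Call only from the completion handler: the kernel is done with the buffer. */
void io_req_complete(struct io_req *req)
{
    free(req->buf);
    free(req);
}
```

The rule this encodes is that the submit path never frees: only code that has seen the matching completion may release or reuse the buffer.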
Completion ordering. Completions can arrive in any order, even for linked operations after the chain breaks. Code that assumes "this finishes before that" needs explicit serialization.
Resource limits. Each ring uses kernel memory (the rings themselves, registered buffers). Default limits are generous but you can hit them at very high scale. Tune via RLIMIT_MEMLOCK if needed.
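If the locked-memory limit is what you hit, the soft limit can be raised from inside the process up to the hard limit. A minimal sketch, with a function name of our choosing:

```c
#include <sys/resource.h>

/* Raise the soft RLIMIT_MEMLOCK to the hard limit. Needs no privileges,
 * because only the soft limit changes. Returns 0 on success, -1 on error. */
int raise_memlock_to_hard_limit(void)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_MEMLOCK, &lim) != 0)
        return -1;
    lim.rlim_cur = lim.rlim_max;
    return setrlimit(RLIMIT_MEMLOCK, &lim);
}
```

Going past the hard limit is a deployment change rather than a code change: it takes CAP_SYS_RESOURCE or a setting in the service manager, such as systemd's `LimitMEMLOCK=`.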
glibc syscall numbers. Some older Linux distros have glibc that doesn't know io_uring syscalls. Use liburing (the official wrapper) or wrap the syscalls yourself.
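If you do wrap the syscalls yourself, the wrappers are thin. The sketch below is an assumption-laden illustration, not liburing's code: the `sys_*` and `probe_io_uring_features` names are ours, the fallback numbers are the x86-64 and arm64 values, and it still relies on the kernel's `linux/io_uring.h` header for the structures.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/io_uring.h>

/* Fallbacks for libc headers that predate io_uring (x86-64 and arm64 values). */
#ifndef __NR_io_uring_setup
#define __NR_io_uring_setup 425
#endif
#ifndef __NR_io_uring_enter
#define __NR_io_uring_enter 426
#endif
#ifndef __NR_io_uring_register
#define __NR_io_uring_register 427
#endif

/* Each wrapper returns -1 and sets errno on failure, like any raw syscall. */
int sys_io_uring_setup(unsigned entries, struct io_uring_params *p)
{
    return (int)syscall(__NR_io_uring_setup, entries, p);
}

int sys_io_uring_enter(int fd, unsigned to_submit, unsigned min_complete,
                       unsigned flags)
{
    /* No signal mask is passed: the last two arguments are NULL and 0. */
    return (int)syscall(__NR_io_uring_enter, fd, to_submit, min_complete,
                        flags, NULL, (size_t)0);
}

int sys_io_uring_register(int fd, unsigned opcode, const void *arg,
                          unsigned nr_args)
{
    return (int)syscall(__NR_io_uring_register, fd, opcode, arg, nr_args);
}

/* Create a throwaway ring to read the kernel's feature bits.
 * Returns 0 and fills *features, or a negative errno value. */
int probe_io_uring_features(unsigned *features)
{
    struct io_uring_params p;
    memset(&p, 0, sizeof(p));
    int fd = sys_io_uring_setup(4, &p);
    if (fd < 0)
        return -errno;
    *features = p.features;
    close(fd);
    return 0;
}
```

The feature word is also a more reliable capability check than the kernel version: testing `features & IORING_FEAT_FAST_POLL`, for example, answers the question directly. The wrappers only get you the ring fd, though. Mapping the rings and managing head and tail indices is the part liburing saves you from.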
liburing is the official wrapper. Well-maintained.
tokio-uring, glommio. tokio-uring integrates with tokio for hybrid epoll+io_uring; glommio is a separate runtime that's all io_uring.
We use Rust + glommio for the services where io_uring matters. Everything else stays in its native ecosystem.
A few cases:
For io_uring services:
io_uring is one of those kernel features that doesn't matter until your service is at the throughput where syscall costs matter. For 95% of services, epoll is fine and io_uring is over-engineering. For the 5% where every microsecond of CPU per request shows up in the bill, io_uring is the difference between scaling linearly and not.