We replaced three kernel-level monitoring tools with a small set of eBPF programs. What it bought us, what it cost, and where we still use the old stuff.
We've been running eBPF-based observability in production for about 18 months. It replaced parts of three other monitoring stacks: a kernel-module-based packet inspector, a strace-based debugger, and a syscall auditing daemon. The story is mostly positive but the rough edges are real. This post is what we've learned.
The short version: eBPF lets you run small programs in the kernel, attached to specific events (syscalls, network packets, function entries/exits, perf events). The programs are compiled to a constrained bytecode and verified by the kernel before they run. The verifier is designed to guarantee that a program can't crash the kernel or loop forever, although, as we describe later, verifier bugs do occasionally break that promise.
The longer version is that eBPF has become the foundation for a whole new generation of Linux observability tools (and load balancers, and security policies). When we say "we run eBPF," we mean we run programs like:
We had a custom kernel module that inspected packets for a specific protocol mismatch we cared about. It worked, but every kernel upgrade was a small project (recompile, retest, deploy). After it broke twice during kernel security patches, each time delaying the patch by a week, we replaced it.
The eBPF replacement attaches to tc (traffic control) hooks, runs the same inspection logic, and survives kernel upgrades within the same major version. The development time was about a week; ongoing maintenance is essentially zero.
When a process was misbehaving in production, we used to run strace -p <pid> to see what syscalls it was making. Two problems: (1) strace relies on ptrace, which adds significant overhead and slows the process by 5-10x; (2) we had to SSH to the host, find the right pid, and attach by hand.
We replaced this with bpftrace one-liners we keep in a runbook:
# Top syscalls for a specific process, keyed by syscall number (12345 is a placeholder PID)
bpftrace -e 'tracepoint:raw_syscalls:sys_enter /pid == 12345/ {
  @[args->id] = count();
}'
# All file opens by any process named "myservice"
bpftrace -e 'tracepoint:syscalls:sys_enter_openat /comm == "myservice"/ {
  printf("%s opened %s\n", comm, str(args->filename));
}'
These run with negligible overhead (the kernel does the filtering before any data crosses to userspace). We can leave them running for hours. strace would have crippled the process in minutes.
We used auditd to record specific syscalls (file accesses to sensitive paths, exec of suspicious binaries). It worked, but the throughput was capped — under high load, audit messages got dropped silently.
We replaced it with a custom eBPF program that records the same events and writes them to a per-CPU ring buffer, which a userspace daemon consumes. No drops at our load levels (10x our previous audit volume).
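The userspace half of that pipeline is mostly decoding fixed-size records. The following is a minimal sketch rather than our production daemon, and it assumes a hypothetical event layout (timestamp, pid, syscall number, 16-byte command name). The real layout is whatever struct the eBPF program writes, and the names `EVENT`, `AuditEvent`, `decode_event`, and `decode_batch` are illustrative.

```python
import struct
from typing import NamedTuple

# Hypothetical record layout, mirroring a C struct such as:
#   struct event { u64 ts_ns; u32 pid; u32 syscall_nr; char comm[16]; };
# Little-endian; the fields are naturally aligned, so there is no padding.
EVENT = struct.Struct("<QII16s")


class AuditEvent(NamedTuple):
    ts_ns: int
    pid: int
    syscall_nr: int
    comm: str


def decode_event(raw: bytes) -> AuditEvent:
    """Decode one fixed-size record read from the buffer."""
    ts_ns, pid, syscall_nr, comm = EVENT.unpack(raw)
    # comm is NUL-padded, like the kernel's task command name.
    name = comm.split(b"\x00", 1)[0].decode("utf-8", errors="replace")
    return AuditEvent(ts_ns, pid, syscall_nr, name)


def decode_batch(buf: bytes) -> list:
    """Decode back-to-back records; a trailing partial record is ignored."""
    count = len(buf) // EVENT.size
    return [
        decode_event(buf[i * EVENT.size:(i + 1) * EVENT.size])
        for i in range(count)
    ]
```

Keeping records fixed-size makes the consumer cheap, which matters when the point of the exercise is to keep up with the kernel side under load.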
Some things eBPF doesn't replace well:
Long-term metrics. Prometheus metrics from app instrumentation are still the right answer for long-term, dashboard-able performance data. eBPF is great for ad-hoc and high-cardinality, less good for durable time-series at scale.
Distributed tracing. OpenTelemetry-style traces with span context propagated through HTTP headers can't be reconstructed from kernel events alone. You need application-level instrumentation. eBPF can capture syscall-level spans (Pixie does this) but they don't replace app-level traces.
Application logs. The app knows what it's doing better than any kernel observation. Logs go to stdout, get aggregated, end of story.
Tail latency debugging. A service had p99 latency spikes that didn't correlate with any of our metrics. We attached an eBPF program to the syscalls the service made, recording per-syscall latency. The spikes correlated with fsync calls — turned out the disk we were on had occasional sub-second stalls. Switched the data path to async + replicated, problem gone. Couldn't have found this without per-syscall tracing.
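Once the per-syscall latencies are in userspace, the analysis behind the tail latency finding is simple: group by syscall and look at the tail of each group. A minimal sketch, assuming the samples arrive as hypothetical `(syscall_name, latency_us)` pairs and using a nearest-rank p99 (our actual tooling differs in detail):

```python
from collections import defaultdict


def p99_by_syscall(samples):
    """samples: iterable of (syscall_name, latency_us).

    Returns {syscall_name: p99 latency} using the nearest-rank method.
    """
    by_name = defaultdict(list)
    for name, latency_us in samples:
        by_name[name].append(latency_us)

    result = {}
    for name, values in by_name.items():
        values.sort()
        # Nearest rank: ceil(0.99 * n), computed in integers to avoid
        # floating-point rounding surprises.
        rank = (99 * len(values) + 99) // 100
        result[name] = values[rank - 1]
    return result
```

A syscall whose median looks healthy but whose p99 is orders of magnitude higher, as fsync was for us, stands out immediately in this view.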
Memory leak in a third-party library. A C++ library was leaking memory. We attached an eBPF program to track every malloc/free call in the process and the call stack. After 30 minutes we had a flame graph showing exactly where the leaks originated. Filed a bug upstream with a clear repro.
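The bookkeeping behind that kind of leak hunt is worth spelling out: record every allocation with its call stack, drop it when the matching free arrives, and whatever is left is a leak candidate. A minimal sketch of the idea, assuming a hypothetical event stream of `("malloc", addr, size, stack)` and `("free", addr)` tuples, not the tool we actually ran:

```python
from collections import Counter


def outstanding_by_stack(events):
    """Return a Counter mapping call stack -> bytes allocated and never freed.

    events: iterable of ("malloc", addr, size, stack) or ("free", addr),
    where stack is any hashable value, e.g. a tuple of function names.
    """
    live = {}  # addr -> (size, stack)
    for event in events:
        if event[0] == "malloc":
            _, addr, size, stack = event
            live[addr] = (size, stack)
        elif event[0] == "free":
            # Frees of addresses we never saw allocated are ignored.
            live.pop(event[1], None)

    leaked = Counter()
    for size, stack in live.values():
        leaked[stack] += size
    return leaked
```

Sorting the result by bytes outstanding gives the input for a flame graph: the stacks that hold the most unfreed memory are the ones to look at first.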
Network policy enforcement. Cilium (eBPF-based) replaced our iptables-based network policies. The performance is better (no linear walk through a rule chain for each packet; eBPF hash map lookups are O(1)), and the policies are easier to write. We have ~80 network policies; Cilium runs them with no measurable overhead.
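The performance difference comes down to lookup shape. The toy model below is only an illustration of a linear rule walk versus a hash lookup; it is not how iptables or Cilium represent policy internally, and the `(source, port, verdict)` rule format is an assumption made for the example.

```python
def linear_match(rules, src, dport):
    """Walk the rule list until one matches; cost grows with the rule count."""
    for rule_src, rule_dport, verdict in rules:
        if src == rule_src and dport == rule_dport:
            return verdict
    return "DROP"  # default deny


def build_policy_map(rules):
    """Precompute a hash map; the first rule for a key wins, as in the walk."""
    policy = {}
    for rule_src, rule_dport, verdict in rules:
        policy.setdefault((rule_src, rule_dport), verdict)
    return policy


def map_match(policy, src, dport):
    """One hash lookup per packet, regardless of how many rules exist."""
    return policy.get((src, dport), "DROP")
```

Both functions return the same verdicts; only the cost per packet differs, and that gap widens as the rule count grows.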
Container ID resolution. Standard Linux tools show pids; in a containerized world, we want container IDs. An eBPF program runs at every syscall entry, looks up the cgroup and translates to a Kubernetes pod name, and emits a labeled event. No more "which pod was that pid?" detective work.
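The cgroup-to-pod step can be sketched in userspace. The example below assumes the two common kubelet cgroup path shapes (systemd driver and cgroupfs driver) and extracts the pod UID and container ID; the regexes and the function name `parse_cgroup_path` are illustrative, other runtimes may lay paths out differently, and mapping the UID to a pod name still needs a lookup against the Kubernetes API.

```python
import re

# Pod UIDs appear with dashes (cgroupfs driver) or underscores (systemd driver).
_POD_UID = re.compile(r"pod([0-9a-f]{8}(?:[-_][0-9a-f]{4}){3}[-_][0-9a-f]{12})")
# Container IDs are 64 hex characters.
_CONTAINER_ID = re.compile(r"([0-9a-f]{64})")


def parse_cgroup_path(path):
    """Return (pod_uid, container_id); either is None if not found."""
    pod = _POD_UID.search(path)
    container = _CONTAINER_ID.search(path)
    pod_uid = pod.group(1).replace("_", "-") if pod else None
    container_id = container.group(1) if container else None
    return pod_uid, container_id
```

Processes outside Kubernetes (sshd, the kubelet itself) come back as `(None, None)`, which is a convenient signal to label the event as host-level.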
Verifier limits. The eBPF verifier rejects programs it can't prove safe. Loops are restricted; stack space is small (512 bytes); recursion is forbidden. Writing nontrivial eBPF requires working around the verifier in ways that feel artificial. The modern verifier (Linux 5.15+) is much more permissive, but the limits are still there.
Kernel version dependencies. eBPF features are added per kernel version. A program that uses a 5.10 feature won't load on 5.4. CO-RE (Compile Once - Run Everywhere) helps but isn't universal.
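A cheap guard against the kernel version problem is to check the running kernel before attempting to load a program. A minimal sketch, assuming release strings in the usual `uname -r` shape; the helper names are hypothetical. Treat it as a heuristic only: distribution kernels backport features, so probing for the feature itself (for example with `bpftool feature probe`) is more reliable than comparing version numbers.

```python
import re


def parse_kernel_release(release):
    """Parse a `uname -r` style string: '5.15.0-91-generic' -> (5, 15, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # A missing patch level (e.g. '6.1') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def kernel_at_least(release, minimum):
    """True if the release is at or above the minimum (major, minor, patch)."""
    return parse_kernel_release(release) >= minimum
```

In a loader, this would gate program selection, for example `kernel_at_least(platform.release(), (5, 10, 0))` before picking the variant that needs a 5.10 feature.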
Debugging eBPF programs. When an eBPF program doesn't work, debugging is painful. The error messages from the verifier are cryptic ("reg type does not match expected"). We rely heavily on bpftool prog tracelog and incremental development.
Tooling fragmentation. bcc, libbpf, bpftrace, frameworks like Tracee, Falco, Pixie, Cilium — each has its own way of doing things. Picking the right tool for a problem requires knowing the landscape.
Our eBPF surface area:
Total nodes: 40. CPU overhead from all the eBPF programs combined: under 2% per node. Memory overhead: under 100MB per node.
Versioning and rollout. We treat custom eBPF programs like any other code: they live in a repo, have tests, go through CI, get deployed via DaemonSets. Rollouts are gradual (5% of nodes first).
Failure mode: an eBPF program that gets unloaded fails open. If our security-monitoring program is unloaded or its loader crashes, we lose visibility but the system keeps running. We monitor for unexpected program unloads as a separate alert.
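An unload check can be as simple as comparing what is loaded against what should be. The sketch below is one way to do it, not our actual alerting code: it assumes the input is the JSON from `bpftool -j prog show`, and the expected program names are hypothetical. Note that the kernel truncates program names to 15 characters, so the expected names must be given in their truncated form.

```python
import json


def missing_programs(bpftool_json, expected_names):
    """Return the expected program names that are not currently loaded.

    bpftool_json: text output of `bpftool -j prog show` (a JSON array of
    objects; programs loaded without a name have no "name" key).
    """
    loaded = {prog.get("name") for prog in json.loads(bpftool_json)}
    return sorted(set(expected_names) - loaded)
```

A non-empty result is what should page someone, since the system itself will keep running and nothing else will complain.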
Resource attribution. When eBPF programs crash a node (rare but happens — usually a verifier bug or out-of-memory in the kernel), it's hard to diagnose because the crash is in kernel space. We've lost a few nodes this way over 18 months. Mitigation: gradual rollouts catch the problem before it spreads.
Multi-tenant clusters. If multiple teams want to run their own eBPF programs, conflicts emerge. Two programs hooked to the same probe with different filters can interact in surprising ways. We have a rule: only the platform team writes eBPF programs that go on production nodes; product teams use the platform team's tooling.
The trends we see:
We're not betting the farm on eBPF for everything. But the share of "stuff we used to do with kernel modules or strace" that we now do with eBPF is high (probably 80%) and growing.
Start with bpftrace. Before writing any custom eBPF, learn bpftrace. It's the easiest entry point — the syntax is small, the use cases are immediately useful, and many production debugging tasks need nothing more.
Use existing eBPF-based tools before writing your own. Cilium, Pixie, Falco, Tracee. Most of what teams need is already built. Custom eBPF should be the exception, not the default.
Don't try to use eBPF where it doesn't fit. Long-term metrics, distributed tracing, application logs — eBPF is not the right tool. Use the right tool for each job.
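Test on the actual kernel version you'll deploy on. eBPF is more portable than kernel modules but not perfectly portable. The verifier is part of the kernel, so load your programs on a machine running the production kernel version rather than trusting a pass on your development kernel.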
Treat eBPF like any other code. Repo, tests, CI, gradual rollouts. The fact that it runs in the kernel doesn't change software engineering basics.
eBPF is one of those rare technologies that delivers on its promise. The pain points are real but the upside is large. Eighteen months in, we wouldn't go back.