bpftrace one-liners replace strace, perf top, and a half-dozen ad-hoc debugging scripts. The patterns that actually earn their place when you're troubleshooting at 2 AM.
The biggest behavior change since we started running bpftrace in production isn't "we now use eBPF." It's that we stopped reaching for strace for half the things we used to use it for. bpftrace one-liners do the same job — usually faster, often with less impact on the target process, and you can leave them running for hours instead of minutes. This post is the bpftrace patterns we actually run, with the production reasons each one earns its keep.
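strace works by attaching to the process with ptrace and intercepting every syscall. That gives you complete visibility, but at a high cost: the traced process is stopped at every syscall entry and exit while strace inspects it, so a syscall-heavy workload can slow down dramatically for as long as the trace runs.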
bpftrace uses kernel tracepoints and kprobes via eBPF. The kernel filters events before they hit userspace, so you can watch every syscall on every process and only the matching events get streamed out. Overhead is typically <1% even on busy systems.
The trade: bpftrace can do less than strace in some specific cases (it doesn't pretty-print every syscall's arguments out of the box; you write the formatting yourself for non-trivial syscalls). But for the 90% of "what is this process actually doing" debugging, bpftrace is the right tool.
bpftrace -e 'tracepoint:raw_syscalls:sys_enter /pid == 12345/ {
@[args->id] = count();
}'
Press Ctrl-C and bpftrace prints a count per syscall, keyed by syscall number (translate numbers to names with ausyscall, if it is installed, or with your architecture's syscall table). Roughly equivalent to strace -c -p 12345 but with negligible overhead. Useful when an app is doing something unexpected: you can immediately see "oh, it's doing 50k read syscalls per second."
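The raw_syscalls tracepoint exposes only the syscall number. If you would rather see names, one option is to attach to the per-syscall tracepoints and key the map on the built-in probe variable. The sketch below assumes those tracepoints (tracepoint:syscalls:sys_enter_*) are present on the host; the function name syscalls_by_name and its arguments are made up for this example.

```shell
# Hypothetical helper: count syscalls by name for one PID, then exit on its own.
# Usage: syscalls_by_name PID [SECONDS]   (run as root)
syscalls_by_name() {
  pid="$1"
  secs="${2:-10}"
  # Both values are interpolated into the program, so accept digits only.
  for n in "$pid" "$secs"; do
    case "$n" in
      ''|*[!0-9]*) echo "usage: syscalls_by_name PID [SECONDS]" >&2; return 2 ;;
    esac
  done
  # @[probe] keys the map on the full tracepoint name, e.g.
  # tracepoint:syscalls:sys_enter_read. exit() prints the map.
  bpftrace -e "
    tracepoint:syscalls:sys_enter_* /pid == $pid/ { @[probe] = count(); }
    interval:s:$secs { exit(); }"
}
```

Something like syscalls_by_name 12345 30 then gives a 30-second sample. Attaching to several hundred tracepoints takes a few seconds at startup, so the raw_syscalls version is still the quicker one to reach for.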
bpftrace -e 'tracepoint:syscalls:sys_enter_openat /comm == "myservice"/ {
printf("%s: %s\n", comm, str(args->filename));
}'
Filter by process name (comm). Useful for "what config files does this thing actually read at startup?" or "why is it reading from /tmp constantly?"
bpftrace -e 'tracepoint:tcp:tcp_retransmit_skb {
@[ntop(args->saddr), ntop(args->daddr)] = count();
}'
Counts retransmits per (source, destination) pair. Catches network issues that don't show up in latency metrics until much later, and points at which connections are affected. We caught a cross-AZ network blip with this within minutes of a customer reporting odd timeouts.
bpftrace -e 'kprobe:vfs_read {
@start[tid] = nsecs;
}
kretprobe:vfs_read /@start[tid]/ {
@latency = hist((nsecs - @start[tid]) / 1000);
delete(@start[tid]);
}'
Histogram of vfs_read latencies in microseconds. Tells you the distribution of how long reads take. The script accumulates events; press Ctrl-C to dump the histogram. If you see a bimodal distribution (most reads <50µs but a tail of 50ms+), that suggests a disk with occasional stalls. Add a pid or comm filter before drawing conclusions, because unfiltered this sees every read on the host.
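vfs_read sits above the page cache and also handles reads from pipes and sockets, so a slow tail there does not by itself prove the disk is slow. One way to check is to time requests at the block layer instead. The sketch below assumes the block:block_rq_issue and block:block_rq_complete tracepoints expose dev and sector fields, and it pairs issue with completion on that key, which is an approximation (merged or zero-sector requests can be mismatched). The function name bio_latency is hypothetical.

```shell
# Hypothetical helper: histogram of block-device request latency in microseconds.
# Usage: bio_latency [SECONDS]   (run as root)
bio_latency() {
  secs="${1:-10}"
  case "$secs" in
    *[!0-9]*) echo "usage: bio_latency [SECONDS]" >&2; return 2 ;;
  esac
  bpftrace -e "
    tracepoint:block:block_rq_issue {
      @start[args->dev, args->sector] = nsecs;
    }
    tracepoint:block:block_rq_complete /@start[args->dev, args->sector]/ {
      @bio_us = hist((nsecs - @start[args->dev, args->sector]) / 1000);
      delete(@start[args->dev, args->sector]);
    }
    interval:s:$secs { exit(); }
    END { clear(@start); }"
}
```

If the vfs_read tail is long but this histogram stays tight, the stalls are probably not coming from the device.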
bpftrace -e 'tracepoint:syscalls:sys_enter_execve {
printf("%-8d %s\n", pid, str(args->filename));
}'
Equivalent to execsnoop from the bcc toolkit, in one line. Useful for "what's actually being run on this host?" debugging — catches surprise cron jobs, init scripts, debug processes someone left running.
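The one-liner above prints only the executable path. When the arguments matter (which script did that cron job actually run?), bpftrace's join() builtin can print the argv array. A minimal sketch, with exec_watch as a made-up function name:

```shell
# Hypothetical helper: print PID, calling process name, and full command line
# for every execve on the host.   (run as root)
exec_watch() {
  # At sys_enter_execve the process image has not been replaced yet, so comm
  # is still the name of the caller (typically the shell or supervisor).
  # join() prints the argv strings separated by spaces, followed by a newline.
  bpftrace -e 'tracepoint:syscalls:sys_enter_execve {
    printf("%-8d %-16s ", pid, comm);
    join(args->argv);
  }'
}
```

Very long command lines may come out truncated, since join() only reads a bounded number of arguments and bytes per argument.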
bpftrace -e 'kprobe:tcp_connect {
@[ntop(((struct sock *)arg0)->__sk_common.skc_daddr)] = count();
}'
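Counts outbound IPv4 TCP connects by destination IP. Useful for "this service is making requests to weird places" debugging. Roughly equivalent to tcpconnect from bcc; the one-liner version skips the bcc dependency. The struct sock cast relies on kernel type information (BTF) being available; without it you need to include the kernel header that defines the struct.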
For a Go service:
bpftrace -e 'uprobe:/path/to/binary:main.handleRequest {
@start[tid] = nsecs;
}
uretprobe:/path/to/binary:main.handleRequest /@start[tid]/ {
@duration_us = hist((nsecs - @start[tid]) / 1000);
delete(@start[tid]);
}'
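Histogram of function-level latency. Works for any binary whose symbol table still contains the function (not stripped, and the function not inlined). We've used this to confirm whether a slow endpoint is slow inside the request handler or somewhere downstream (the histogram showed the handler itself was fast; the slowness was in a DB call deeper in the stack). One caution for Go specifically: uretprobes work by rewriting the return address on the stack, which can crash a Go program when the runtime grows or moves a goroutine's stack, and goroutines can migrate between threads, so keying on tid is not fully reliable. Try it against a test instance before pointing it at production.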
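bpftrace -e 'tracepoint:sched:sched_switch {
// Thread being switched out: remember when it left the CPU (skip idle, pid 0).
if (args->prev_pid) { @start[args->prev_pid] = nsecs; }
// Thread being switched in: record how long it was off-CPU.
if (@start[args->next_pid]) {
@off_cpu_us = hist((nsecs - @start[args->next_pid]) / 1000);
delete(@start[args->next_pid]);
}
}'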
For a process you suspect is blocked on something (lock, IO, sleep). The histogram shows the distribution of how long threads sit off-CPU between context switches. The classic "100% CPU" investigation has a counterpart: "low CPU but nothing's happening" — off-CPU profiling is the tool.
Some things for which eBPF is overkill or simply the wrong tool:
Long-term metrics. Prometheus + standard exporters for metrics that need histograms over hours. bpftrace is for ad-hoc investigation, not continuous collection.
Distributed tracing. OpenTelemetry/Jaeger for traces that span services. eBPF can do some of this (Pixie does) but the in-app instrumentation gives you trace context propagation that eBPF can't easily reconstruct.
App logging. Logs are logs. Don't try to trace them with bpftrace.
Production scripts you'd commit. bpftrace one-liners are ad-hoc investigation tools. If you find yourself running the same one daily, formalize it (Prometheus alert, dedicated tool).
A few things to know:
Kernel version matters. Older kernels (pre-5.0) have a smaller bpftrace surface. Most things in this post require 5.x+. Modern distros (Ubuntu 22.04+, Amazon Linux 2023) are fine.
You need root or CAP_BPF. Most of these run as root. On hardened systems, you may need to grant CAP_BPF/CAP_PERFMON explicitly.
Containers. bpftrace runs against the host kernel. Inside a container, you can attach to processes on that host (with the right capabilities). For Kubernetes, you'd typically run bpftrace on the node, not inside a workload pod.
Output volume. A trace that fires on every syscall can be very chatty. Filter aggressively (by pid, comm, or specific args). Otherwise you'll DoS your own terminal.
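One way to keep output volume under control is to aggregate in the kernel and print a short summary on a timer, instead of calling printf on every event. The sketch below does that for the openat trace from earlier; opens_summary and its arguments are hypothetical names used for illustration.

```shell
# Hypothetical helper: every N seconds, print the 10 most-opened paths
# for one process name, then reset the counters.
# Usage: opens_summary COMM [SECONDS]   (run as root)
opens_summary() {
  name="$1"
  every="${2:-5}"
  # Both values are interpolated into the program, so keep them strict.
  case "$name" in
    ''|*[!A-Za-z0-9._-]*) echo "usage: opens_summary COMM [SECONDS]" >&2; return 2 ;;
  esac
  case "$every" in
    *[!0-9]*) echo "usage: opens_summary COMM [SECONDS]" >&2; return 2 ;;
  esac
  bpftrace -e "
    tracepoint:syscalls:sys_enter_openat /comm == \"$name\"/ {
      @opens[str(args->filename)] = count();
    }
    interval:s:$every { print(@opens, 10); clear(@opens); }"
}
```

Note that comm holds at most 15 characters, so a longer process name has to be matched by its truncated form. Run it as, for example, opens_summary myservice 5 and stop it with Ctrl-C.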
bpftrace doesn't replace your full observability stack. It replaces the half-dozen tools you reach for when you're staring at a process and asking "what are you actually doing right now." The patterns above are the ones we keep using; everything else is variations on them.