bpftrace one-liners replace strace, perf top, and a half-dozen ad-hoc debugging scripts. These are the patterns that actually earn their place when you're troubleshooting at 2 AM.

eBPF Tools for Everyday Ops: bpftrace Patterns We Use

The biggest behavior change since we started running bpftrace in production isn't "we now use eBPF." It's that we stopped reaching for strace for half the things we used to use it for. bpftrace one-liners do the same job: usually faster, often with less impact on the target process, and you can leave them running for hours instead of minutes. This post covers the bpftrace patterns we actually run, with the production reasons each one earns its keep.

Why bpftrace beats strace for everyday cases

strace works by attaching to the process with ptrace and stopping it at every syscall. That gives you complete visibility but at high cost:

The process can slow down 5–20× while strace is attached, and syscall-heavy workloads suffer most.
You can't leave it running on a production process for long.
It follows one process tree at a time; you can't easily watch a whole host, let alone a fleet.

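bpftrace uses kernel tracepoints and kprobes via eBPF. The kernel filters and aggregates events before they hit userspace, so you can watch every syscall on every process and only the matching events get streamed out. Overhead is typically under 1%, though it scales with how often the probe fires, so a probe on every syscall of a very busy host costs more than a probe on TCP retransmits.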

The trade: bpftrace can do less than strace in some specific cases (it doesn't pretty-print every syscall's arguments out of the box; you write the formatting yourself for non-trivial syscalls). But for the 90% of "what is this process actually doing" debugging, bpftrace is the right tool.
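To make the in-kernel aggregation described above concrete, here is a minimal sketch, assuming root and a bpftrace build with tracepoint support. The function name `syscalls_by_comm` and the 10-second default are illustrative choices, not part of our tooling. It counts syscalls for every process on the host, but the counting happens in a kernel map, so only the final per-process totals reach userspace.

```shell
# Hypothetical wrapper: count syscalls per process name, host-wide.
# Usage (as root): syscalls_by_comm [SECONDS]
syscalls_by_comm() {
  secs="${1:-10}"
  # @[comm] is a kernel-side map; bpftrace prints it once when the program exits.
  bpftrace -e "tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }
interval:s:${secs} { exit(); }"
}
```

The `interval` probe ends the trace after the given number of seconds, which saves you from forgetting a tracer on a production host.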

The patterns we keep coming back to

1. Which syscalls is a process making, and how often?

bpftrace -e 'tracepoint:raw_syscalls:sys_enter /pid == 12345/ {
  @[args->id] = count();   // key by syscall number; ksym() only resolves kernel addresses
}'

Press Ctrl-C; bpftrace prints a count per syscall number (look the names up with a tool such as ausyscall or your architecture's syscall table). Roughly equivalent to strace -c -p 12345 but with far lower overhead. Useful when an app is doing something unexpected: you can immediately see "oh, it's doing 50k read syscalls per second."
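If you would rather see syscall names directly in the output, one option is to attach to the per-syscall tracepoints and key the map by the `probe` builtin. This is a sketch under assumptions: the wrapper name `syscalls_by_name` and its arguments are hypothetical, and attaching to several hundred tracepoints can take a few seconds at startup.

```shell
# Hypothetical wrapper: count syscalls by name for one PID.
# Usage (as root): syscalls_by_name PID [SECONDS]
syscalls_by_name() {
  pid="${1:-}"
  secs="${2:-10}"
  # Reject anything that is not a plain decimal PID before it reaches the program text.
  case "$pid" in
    ''|*[!0-9]*) echo "usage: syscalls_by_name PID [SECONDS]" >&2; return 2 ;;
  esac
  # The probe builtin holds the full probe name, e.g. tracepoint:syscalls:sys_enter_read.
  bpftrace -e "tracepoint:syscalls:sys_enter_* /pid == ${pid}/ { @[probe] = count(); }
interval:s:${secs} { exit(); }"
}
```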

2. Show every file a process opens

bpftrace -e 'tracepoint:syscalls:sys_enter_openat /comm == "myservice"/ {
  printf("%s: %s\n", comm, str(args->filename));
}'

Filter by process name (comm). The kernel truncates comm to 15 characters, so match on the truncated name for longer binaries. Useful for "what config files does this thing actually read at startup?" or "why is it reading from /tmp constantly?"
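A common follow-up question is which opens fail. The sketch below pairs the enter and exit tracepoints, the same approach the bundled opensnoop tool takes, and prints only the failures. The wrapper name `failed_opens` is hypothetical, and the process name is passed as a bpftrace positional parameter (`str($1)`), which assumes a bpftrace version that supports positional parameters.

```shell
# Hypothetical wrapper: print only the openat() calls that fail for one process name.
# Usage (as root): failed_opens myservice
failed_opens() {
  name="${1:-}"
  [ -n "$name" ] || { echo "usage: failed_opens COMM" >&2; return 2; }
  bpftrace -e '
tracepoint:syscalls:sys_enter_openat /comm == str($1)/ {
  @fname[tid] = args->filename;   // keep the path pointer until the syscall returns
}
tracepoint:syscalls:sys_exit_openat /@fname[tid]/ {
  if (args->ret < 0) {            // a negative return value is -errno
    printf("%s: %s failed (%d)\n", comm, str(@fname[tid]), args->ret);
  }
  delete(@fname[tid]);
}' "$name"
}
```

A steady stream of -2 (ENOENT) results usually means the process is probing a search path; -13 (EACCES) points at permissions.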

3. Trace TCP retransmits in real time

bpftrace -e 'tracepoint:tcp:tcp_retransmit_skb {
  @[ntop(args->saddr), ntop(args->daddr)] = count();
}'

Builds a count per (source, destination) pair that is retransmitting; args->saddr and args->daddr are the IPv4 addresses. This catches network issues that don't show up in latency metrics until much later, and points at which connections are affected. We caught a cross-AZ network blip with this within minutes of a customer reporting odd timeouts.

4. Slow disk operations

bpftrace -e 'kprobe:vfs_read {
  @start[tid] = nsecs;
}
kretprobe:vfs_read /@start[tid]/ {
  @latency = hist((nsecs - @start[tid]) / 1000);
  delete(@start[tid]);
}'

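Histogram of vfs_read latencies in microseconds, across every process on the host. It tells you the distribution of how long file reads take. The script accumulates events; press Ctrl-C to dump the histogram. Keep in mind that vfs_read covers every read that goes through the VFS, including page-cache hits, pipes, and sockets, so the fast mode is mostly reads that never touched a disk. If you see a bimodal distribution (most reads <50µs but a tail of 50ms+), the tail is a strong hint of a hot disk with occasional stalls.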
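To separate real device latency from page-cache hits, one option is to time requests at the block layer instead. This is a sketch, assuming the block:block_rq_issue and block:block_rq_complete tracepoints are available on your kernel; the wrapper name `block_latency` is hypothetical. Matching on (dev, sector) is approximate, because requests that are merged or split between issue and completion will not pair up.

```shell
# Hypothetical wrapper: block-layer I/O latency histogram in microseconds.
# Usage (as root): block_latency [SECONDS]
block_latency() {
  secs="${1:-10}"
  bpftrace -e "
tracepoint:block:block_rq_issue {
  @start[args->dev, args->sector] = nsecs;
}
tracepoint:block:block_rq_complete /@start[args->dev, args->sector]/ {
  @usecs = hist((nsecs - @start[args->dev, args->sector]) / 1000);
  delete(@start[args->dev, args->sector]);
}
interval:s:${secs} { exit(); }
END { clear(@start); }"
}
```

If the vfs_read histogram has a slow tail but this one does not, the stalls are above the block layer (locks, filesystem, network filesystems) and not in the device.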

5. New processes being spawned

bpftrace -e 'tracepoint:syscalls:sys_enter_execve {
  printf("%-8d %s\n", pid, str(args->filename));
}'

Roughly equivalent to execsnoop from the bcc toolkit, in one line (it prints the binary path but not the arguments). Useful for "what's actually being run on this host?" debugging: it catches surprise cron jobs, init scripts, and debug processes someone left running.
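When the binary path alone is not enough (every cron job looks like /bin/sh), bpftrace's `join()` builtin can print the full argument vector. A minimal sketch, with the wrapper name `exec_with_args` being hypothetical:

```shell
# Hypothetical wrapper: print PID and full command line for every execve().
# Usage (as root): exec_with_args    (stop with Ctrl-C)
exec_with_args() {
  bpftrace -e 'tracepoint:syscalls:sys_enter_execve {
  printf("%-8d ", pid);
  join(args->argv);   // prints the argv strings separated by spaces, then a newline
}'
}
```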

6. Show TCP connection establishments by destination

bpftrace -e 'kprobe:tcp_connect {
  @[ntop(((struct sock *)arg0)->__sk_common.skc_daddr)] = count();
}'

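Counts outbound TCP connects by destination IP (IPv4 only, since skc_daddr holds the IPv4 address). The struct sock cast needs kernel type information, which bpftrace gets from BTF or from installed kernel headers. Useful for "this service is making requests to weird places" debugging. Similar to tcpconnect from bcc; the one-liner version skips the bcc dependency.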

7. Histogram of HTTP-like request times via uprobes

For a Go service:

bpftrace -e 'uprobe:/path/to/binary:main.handleRequest {
  @start[tid] = nsecs;
}
uretprobe:/path/to/binary:main.handleRequest /@start[tid]/ {
  @duration_us = hist((nsecs - @start[tid]) / 1000);
  delete(@start[tid]);
}'

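Histogram of function-level latency. Works for any binary that still has its symbol table (that is, not stripped); full debug info is not required. One caveat for Go specifically: a uretprobe rewrites the return address on the stack, and the Go runtime can move or resize goroutine stacks, so a uretprobe on a Go binary can crash the traced process. Goroutines also migrate between threads, so keying on tid can mispair entry and return. Try it on a non-production copy first. We've used this to confirm whether a slow endpoint is slow inside the request handler or somewhere downstream (the histogram showed the handler itself was fast; the slowness was in a DB call deeper in the stack).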
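Before attaching, it helps to confirm that the symbol you want actually exists in the binary, since inlining or stripping can remove it. `bpftrace -l` accepts uprobe patterns for this. A small sketch; the wrapper name `list_uprobes` is hypothetical, and /path/to/binary stands for your own binary as in the example above.

```shell
# Hypothetical wrapper: list the functions in a binary that uprobes can attach to.
# Usage: list_uprobes /path/to/binary '*handleRequest*'
list_uprobes() {
  binary="${1:-}"
  pattern="${2:-*}"   # default: list every symbol
  [ -n "$binary" ] || { echo "usage: list_uprobes BINARY [PATTERN]" >&2; return 2; }
  bpftrace -l "uprobe:${binary}:${pattern}"
}
```

An empty result means the function was inlined, renamed, or stripped, and the one-liner would fail to attach.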

8. Off-CPU time (when is a thread blocked, not running?)

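bpftrace -e 'tracepoint:sched:sched_switch {
  // prev is leaving the CPU: record when it went off (skip the idle task, pid 0)
  if (args->prev_pid != 0) { @start[args->prev_pid] = nsecs; }
  // next is coming back on: measure the time since it was switched out
  $t = @start[args->next_pid];
  if ($t != 0) {
    @off_cpu_us = hist((nsecs - $t) / 1000);
    delete(@start[args->next_pid]);
  }
}'

Use this when you suspect threads are blocked on something (lock, IO, sleep). The histogram shows the distribution of how long threads sit off-CPU between being switched out and being switched back in; as written it covers every thread on the host, and the measured time includes any wait on the run queue. The classic "100% CPU" investigation has a counterpart, "low CPU but nothing's happening", and off-CPU profiling is the tool for it.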
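The histogram says how long threads are off-CPU, not why. One way to get at the why is to count kernel stacks at the moment a process is switched out. This is a sketch: the wrapper name `blocked_stacks` is hypothetical, the process name is passed as a positional parameter (`str($1)`), and it relies on the switch tracepoint firing in the context of the task that is leaving the CPU, so `comm` refers to that task.

```shell
# Hypothetical wrapper: count kernel stacks at context switch for one process name.
# Usage (as root): blocked_stacks myservice    (stop with Ctrl-C)
blocked_stacks() {
  name="${1:-}"
  [ -n "$name" ] || { echo "usage: blocked_stacks COMM" >&2; return 2; }
  bpftrace -e 'tracepoint:sched:sched_switch /comm == str($1)/ {
  @[kstack] = count();   // kernel stack of the task being switched out
}' "$name"
}
```

The most frequent stacks usually name the blocking path (futex wait, block I/O, socket receive). Stacks from involuntary preemption are included too, so read the top entries with that in mind.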

What we DON'T use bpftrace for

For some things eBPF is overkill or simply the wrong tool:

Long-term metrics. Prometheus + standard exporters for metrics that need histograms over hours. bpftrace is for ad-hoc investigation, not continuous collection.

Distributed tracing. OpenTelemetry/Jaeger for traces that span services. eBPF can do some of this (Pixie does) but the in-app instrumentation gives you trace context propagation that eBPF can't easily reconstruct.

App logging. Logs are logs. Don't try to trace them with bpftrace.

Production scripts you'd commit. bpftrace one-liners are ad-hoc investigation tools. If you find yourself running the same one daily, formalize it (Prometheus alert, dedicated tool).

Operational caveats

A few things to know:

Kernel version matters. Older kernels (pre-5.0) have a smaller bpftrace surface. Most things in this post require 5.x+. Modern distros (Ubuntu 22.04+, Amazon Linux 2023) are fine.

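You need root or CAP_BPF. Most of these run as root. CAP_BPF and CAP_PERFMON only exist on Linux 5.8 and later; on hardened systems you may need to grant them explicitly, and it is worth checking that your bpftrace version accepts non-root use at all.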

Containers. bpftrace runs against the host kernel. Inside a container, you can attach to processes on that host (with the right capabilities). For Kubernetes, you'd typically run bpftrace on the node, not inside a workload pod.

Output volume. A trace that fires on every syscall can be very chatty. Filter aggressively (by pid, comm, or specific args). Otherwise you'll DoS your own terminal.
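The first two caveats can be checked in a few seconds before you start. Below is a minimal preflight sketch; the function name `bpftrace_preflight` and the output labels are our own invention for illustration. It only reports what it finds and never attaches a probe.

```shell
# Hypothetical helper: report the basics bpftrace depends on. Read-only.
bpftrace_preflight() {
  echo "kernel: $(uname -r)"
  # BTF gives bpftrace kernel type information without installed headers.
  if [ -r /sys/kernel/btf/vmlinux ]; then
    echo "btf: present"
  else
    echo "btf: missing (struct casts may need kernel headers)"
  fi
  if [ "$(id -u)" -eq 0 ]; then
    echo "user: root"
  else
    echo "user: non-root (expect permission errors)"
  fi
  if command -v bpftrace >/dev/null 2>&1; then
    echo "bpftrace: $(bpftrace --version 2>/dev/null)"
  else
    echo "bpftrace: not installed"
  fi
}
```

On a Kubernetes node, run it on the node itself (or in a privileged debug container that shares the host namespaces), since that is where bpftrace has to run.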

What to read next

eBPF: the future of kernel observability — the broader picture of where eBPF fits in observability
Linux performance tuning for production servers — the host-level tuning these tools help diagnose
Linux process management and monitoring — classic tools (ps, top, lsof) that pair with bpftrace
Network configuration troubleshooting on Linux — the network-side of "what is this process doing"

bpftrace doesn't replace your full observability stack. It replaces the half-dozen tools you reach for when you're staring at a process and asking "what are you actually doing right now." The patterns above are the ones we keep using; everything else is variations on them.

About Kiril Urbonas

You might have missed

GitOps with Argo CD: Best Practices for 2025

Prompt Engineering Best Practices: Maximizing LLM Performance

AI Agents in DevOps: From Copilots to Autonomous Automation in 2025