When everything seems "slow," a baseline gives you something to measure against. The capture-and-compare workflow we use on every Linux host.

Practical Guide: Linux Performance Baseline Methodology

When you hit a performance issue on a Linux host, the most useful thing you can have is a recorded "this is what normal looks like." Without it, every metric is suspect and every diagnosis starts from "is this number high?" We adopted a small, consistent baseline-capture workflow about 18 months ago. It's the first thing we run on any newly provisioned host, and the first thing we compare against during an incident.

What "baseline" means in this context #

A snapshot of how a host behaves under representative load. Not a benchmark (synthetic workloads can lie); not a stress test (peak load tells you a different story). A baseline is "here's what this machine's CPU, memory, I/O, and network look like during a normal hour."

The point isn't to memorise numbers. The point is to have something to subtract from the current state when something feels off.

What we capture #

Six datasets, captured over a one-hour window:

CPU: mpstat 60 60 (one-minute samples for 60 minutes)
Memory: free -m snapshots every 60s, plus /proc/meminfo parsing
Disk I/O: iostat -x 60 60
Network: sar -n DEV 60 60
Per-process: pidstat -u -r 60 60 (top 20 by CPU and memory; plain pidstat reports CPU only)
System: uptime start/end, kernel version, sysctl dump

A small script wraps these into a tarball with a timestamp:

bash.bash

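#!/bin/bash
# Capture a performance baseline: SAMPLES one-minute samples (default 60).
set -euo pipefail
SAMPLES="${SAMPLES:-60}"
HOST=$(hostname)
DATE=$(date +%Y-%m-%dT%H%M%S)
DIR=$(mktemp -d)
# Single quotes defer expansion of $DIR until the trap fires.
trap 'rm -rf "$DIR"' EXIT

# Long-running samplers run in parallel in the background.
mpstat 60 "$SAMPLES" > "$DIR/mpstat.log" &
iostat -x 60 "$SAMPLES" > "$DIR/iostat.log" &
sar -n DEV 60 "$SAMPLES" > "$DIR/sar-net.log" &
# -u reports per-process CPU, -r reports per-process memory.
pidstat -u -r 60 "$SAMPLES" > "$DIR/pidstat.log" &

# Memory snapshots, one per minute.
for i in $(seq 1 "$SAMPLES"); do
  cat /proc/meminfo > "$DIR/meminfo-$i.log"
  free -m > "$DIR/free-$i.log"
  sleep 60
done &

# One-off system facts.
uname -a > "$DIR/uname.txt"
# sysctl -a can exit non-zero on unreadable keys; don't let set -e abort the capture.
sysctl -a > "$DIR/sysctl.txt" 2>/dev/null || true
cat /proc/cpuinfo > "$DIR/cpuinfo.txt"
uptime > "$DIR/uptime-start.txt"

# Block until every background sampler has finished.
wait
uptime > "$DIR/uptime-end.txt"

mkdir -p /var/baselines
tar czf "/var/baselines/${HOST}-${DATE}.tgz" -C "$DIR" .
echo "Baseline saved: /var/baselines/${HOST}-${DATE}.tgz"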

It runs as a cron job once when a host is provisioned, and again any time the host's role changes (e.g., it gets a new workload, a kernel upgrade, a config change). The tarballs are 2-3 MB each.
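The capture stores raw /proc/meminfo snapshots, and the parsing step is not shown in the script. A minimal sketch of one way to do it follows, assuming the meminfo-N.log file naming used by the script. The function names meminfo_kb and avg_meminfo_gb are hypothetical, not part of the original tooling.

```bash
# Print the value (in kB) of one /proc/meminfo key from a snapshot file.
# Lines in /proc/meminfo look like: "MemAvailable:    2097152 kB"
meminfo_kb() {
  local key="$1" file="$2"
  awk -v k="$key:" '$1 == k { print $2; exit }' "$file"
}

# Average one key across all meminfo-*.log snapshots in a capture directory,
# printed in GB with one decimal place.
avg_meminfo_gb() {
  local key="$1" dir="$2" f
  for f in "$dir"/meminfo-*.log; do
    meminfo_kb "$key" "$f"
  done | awk '{ sum += $1; n++ } END { if (n) printf "%.1f\n", sum / n / 1024 / 1024 }'
}
```

For example, `avg_meminfo_gb MemAvailable /path/to/extracted/baseline` would give the kind of "avail" figure a comparison needs.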

When to capture #

Three triggers, in order of importance:

Host provisioning. The first baseline is captured during the host's first hour of normal operation. This is what we compare against later.
Role change. Anything that materially changes what the host does — new workload, scale-up, config tweak — gets a fresh baseline.
Post-upgrade. After kernel or major package upgrades, a fresh baseline lets us spot regressions in OS-level behaviour.

We don't capture continuously. The point isn't telemetry — it's a reference point. Live telemetry (Prometheus) is separate; baseline is for when telemetry isn't enough.
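The post-upgrade trigger can be checked mechanically. As a sketch, assuming the uname.txt file written by the capture script (`uname -a` output, where the third field is the kernel release), a hypothetical helper kernel_changed could report whether the running kernel differs from the one in the latest baseline:

```bash
# Succeed (exit 0) when the running kernel release differs from the one
# recorded in a baseline's uname.txt. The second argument is optional and
# exists mainly so the function can be tested without changing kernels.
kernel_changed() {
  local baseline_uname="$1"
  local current="${2:-$(uname -r)}"
  local recorded
  # Field 3 of `uname -a` on Linux is the kernel release.
  recorded=$(awk '{ print $3; exit }' "$baseline_uname")
  [ "$recorded" != "$current" ]
}
```

A wrapper could then run `kernel_changed /path/to/baseline/uname.txt && ./capture.sh`, where capture.sh stands in for whatever the capture script is called locally.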

How to compare during an incident #

When something looks wrong, we run the same capture script for 5 minutes (SAMPLES=5 env override on the script). A separate compare script then pulls a key set of metrics from the most recent baseline and the current capture:

code

COMPARING: 2026-04-25T14:32:11 vs baseline 2026-03-12T09:00:00

CPU usage:        baseline avg %usr=22  current avg %usr=68  Δ +46
                  baseline avg %iowait=2 current avg %iowait=18 Δ +16

Memory:           baseline avail=14GB  current avail=2GB    DROP -12GB
                  baseline cache=8GB    current cache=1GB    DROP -7GB

Disk I/O (sda):   baseline await=4ms    current await=92ms   Δ +88ms
                  baseline %util=12     current %util=99     Δ +87

Network (eth0):   baseline rxbps=80M    current rxbps=85M    stable
                  baseline txbps=20M    current txbps=22M    stable

Top procs by CPU: baseline: nginx (8%), java (4%)
                  current:  java (62%), gc-thread (18%)

The comparison view tells you immediately: this incident is CPU + I/O bound, the Java process is the source, network is unrelated. Without the baseline, you'd be looking at "Java is using 62% CPU" and asking "is that high?" — with the baseline, you know it's 58 percentage points above normal.
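The compare script itself is not shown here. As a rough sketch of its core idea, the two helpers below read an average from an mpstat log and print a signed delta. They assume an English-locale sysstat that ends its output with an "Average:" row. The names avg_field, delta, and THRESHOLD_PCT are hypothetical, and the 10% "stable" band is a guess, not the threshold behind the sample output.

```bash
# Print one column (e.g. %usr) from the final "Average:" row of an mpstat log.
# The header row starts with a timestamp that may be one or two tokens wide,
# so the column is located by its distance from the end of the line.
avg_field() {
  local field="$1" file="$2"
  awk -v f="$field" '
    !seen { for (i = 1; i <= NF; i++) if ($i == f) { off = NF - i; seen = 1 } }
    seen && $1 == "Average:" { print $(NF - off); exit }
  ' "$file"
}

# Print the signed difference between baseline and current, or "stable" when
# the change is within THRESHOLD_PCT percent of the baseline value.
delta() {
  local base="$1" cur="$2" threshold="${THRESHOLD_PCT:-10}"
  awk -v b="$base" -v c="$cur" -v t="$threshold" 'BEGIN {
    d = c - b
    limit = (b < 0 ? -b : b) * t / 100
    if ((d < 0 ? -d : d) <= limit) print "stable"
    else printf "%+g\n", d
  }'
}
```

Usage would look like `delta "$(avg_field %usr baseline/mpstat.log)" "$(avg_field %usr current/mpstat.log)"`. Locating the column by header name rather than by fixed position keeps the sketch tolerant of sysstat versions that add or reorder columns.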

What the baseline catches that live monitoring doesn't #

Three categories of issue:

Slow drift. A host's behaviour changes gradually over weeks. Metrics dashboards only show recent windows; a week-over-week shift is hard to see in real time. A fresh capture compared to the baseline from 6 weeks ago surfaces drift instantly.

Workload-specific norms. "5% iowait" is fine on most hosts and alarming on a host that's normally at 0.5%. The baseline encodes what "normal" means for THIS host's role, not a generic threshold.

Post-change regressions. After a kernel upgrade, the new baseline can be compared against the pre-upgrade one. If memory pressure went up 20% with no workload change, that's a kernel-level cost we should know about.
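When a pre/post comparison does show a regression, a quick first check is whether any kernel tunables changed along with it. A minimal sketch, assuming both tarballs were produced by the capture script above, so sysctl.txt is stored as ./sysctl.txt. The helper name sysctl_diff is hypothetical.

```bash
# Print sysctl lines that differ between two baseline tarballs.
# Lines starting with "<" come from the old baseline, ">" from the new one.
sysctl_diff() {
  local old_tgz="$1" new_tgz="$2" tmp status=0
  tmp=$(mktemp -d)
  # -O writes the archive member to stdout instead of extracting to disk.
  tar -xzOf "$old_tgz" ./sysctl.txt | sort > "$tmp/old"
  tar -xzOf "$new_tgz" ./sysctl.txt | sort > "$tmp/new"
  # diff exits 1 when the files differ; that is the interesting case, not an error.
  diff "$tmp/old" "$tmp/new" || status=$?
  rm -rf "$tmp"
  [ "$status" -le 1 ]
}
```

An empty result does not rule out a kernel-level cause. It only shows that the tunables visible to `sysctl -a` were the same at both capture times.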

Common analysis patterns #

Some patterns we've seen repeatedly when comparing baselines to current state:

%iowait jump with r/s jump and no w/s change. Indicates increased read load, usually from a process that started reading large files or a warm cache that went cold (e.g., after a service restart). Look at pidstat -d for the responsible process.

avail memory drop with no growth in the main workload's processes. The page cache is being squeezed, often by a separate process that has grown its anonymous (heap) memory. Nothing has failed yet, but performance suffers because reads that used to hit the cache now go to disk.

%steal non-zero. You're on a virtualised host and the hypervisor is taking CPU from you. Not your fault, but knowing it stops you from chasing application bugs that aren't there.

Network bandwidth same but pps doubled. Smaller packets — either smaller HTTP responses, more handshake traffic, or a connection storm. Worth investigating.
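The packet-size pattern is easy to confirm with arithmetic on the sar -n DEV columns: divide the byte rate (rxkB/s or txkB/s) by the packet rate (rxpck/s or txpck/s). A small sketch, assuming sar's kB means 1024 bytes. The helper name avg_packet_bytes is hypothetical.

```bash
# Average packet size in bytes, given a rate in kB/s and a rate in packets/s.
avg_packet_bytes() {
  local kb_per_s="$1" pck_per_s="$2"
  awk -v kb="$kb_per_s" -v p="$pck_per_s" 'BEGIN {
    if (p > 0) printf "%.0f\n", kb * 1024 / p
    else print "n/a"   # idle interface: no packets, so no meaningful average
  }'
}
```

If the baseline gives `avg_packet_bytes 10000 10000` (1024 bytes) and the current capture gives `avg_packet_bytes 10000 20000` (512 bytes), bandwidth is flat but the average packet has halved, which matches the pattern described above.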

What we don't bother capturing #

Full perf record traces. Useful when you have a known hot path; not useful as part of a routine baseline.
strace of every process. Massive volume, mostly noise.
Full TCP packet captures. Same.

The baseline is meant to be lightweight enough that it's run automatically and the tarballs are small enough to be kept indefinitely. Anything more invasive should be on-demand during active investigation.

What we missed initially #

Two things, both fixed:

The first baseline script didn't include cpuinfo or sysctl. When we hit a performance issue on a host that turned out to have transparent huge pages disabled (someone had set transparent_hugepage=never weeks earlier), the baseline didn't tell us that. Adding the system-config dump caught the next case before it bit us.
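One caveat when reproducing this: on current kernels the THP mode is exposed under /sys/kernel/mm/transparent_hugepage/ and, when set at boot, on the kernel command line, rather than in `sysctl -a` output. A sketch of capturing both alongside the sysctl dump follows. The function names and output file names are hypothetical.

```bash
# Extract the active choice from a sysfs "bracketed option" line,
# e.g. "always madvise [never]" -> "never". Reads stdin.
active_option() {
  sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Record the THP mode and the kernel command line in a capture directory.
# Missing files (older kernels, restricted containers) are skipped silently.
capture_thp() {
  local dir="$1" thp=/sys/kernel/mm/transparent_hugepage
  if [ -r "$thp/enabled" ]; then
    active_option < "$thp/enabled" > "$dir/thp-enabled.txt"
  fi
  if [ -r /proc/cmdline ]; then
    cat /proc/cmdline > "$dir/cmdline.txt"
  fi
}
```

Calling `capture_thp "$DIR"` next to the other one-off system facts would make a `transparent_hugepage=never` boot parameter visible in later comparisons.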

The second: we didn't initially correlate the baseline timestamp with deploys. When an incident hit, the question "what changed since the baseline" required cross-referencing manually. We now annotate the baseline with the git commit deployed at capture time. Comparison shows the diff between the baseline's version and the current version automatically.
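The annotation can be as simple as a key=value file inside the tarball. The sketch below assumes the deployed commit hash is passed in, since how it is looked up depends on the deploy tooling. The file name metadata.txt and both function names are hypothetical.

```bash
# Write a small metadata file into the capture directory.
write_metadata() {
  local dir="$1" commit="$2"
  {
    echo "captured_at=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "host=$(uname -n)"
    echo "deployed_commit=$commit"
  } > "$dir/metadata.txt"
}

# Read one key back from a metadata file.
read_metadata() {
  local key="$1" file="$2"
  sed -n "s/^$key=//p" "$file"
}
```

In a checkout of the deployed repository, `git diff --stat "$(read_metadata deployed_commit baseline/metadata.txt)" HEAD` would then list what changed since the baseline was captured.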

When this falls down #

For containerised workloads where the host is shared, baseline at the host level is misleading — your container's behaviour is mixed with neighbours'. We use cgroup-scoped versions of the same script for our K8s nodes (capturing per-cgroup metrics from /sys/fs/cgroup/...). Same idea, different filesystem path.
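As an illustration of the cgroup-scoped idea (not the actual script), the sketch below copies a few cgroup v2 accounting files for one cgroup directory. It assumes cgroup v2 and takes the directory as an argument. The function names are hypothetical.

```bash
# Snapshot a few cgroup v2 accounting files for one cgroup directory.
# Files that do not exist (controller not enabled) are skipped.
snapshot_cgroup() {
  local cg_dir="$1" out_dir="$2" f
  for f in cpu.stat memory.current memory.stat io.stat; do
    if [ -r "$cg_dir/$f" ]; then
      cat "$cg_dir/$f" > "$out_dir/$f"
    fi
  done
}

# Cumulative CPU time consumed by the cgroup, in microseconds (from cpu.stat).
cgroup_cpu_usec() {
  awk '$1 == "usage_usec" { print $2; exit }' "$1/cpu.stat"
}
```

The counters in cpu.stat are cumulative, so a rate needs two snapshots: subtract the earlier usage_usec from the later one and divide by the elapsed time.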

For ephemeral workloads (lambdas, serverless), the baseline concept doesn't apply — there's no persistent host to baseline. We use vendor-provided telemetry instead.

Time investment #

The script runs in the background; it costs ~1% CPU during the capture hour. The tarball storage cost is negligible (~3 MB × 50 hosts × 4 baselines/year = ~600 MB across the fleet annually).

The compare script is the one we run during incidents. It takes about 90 seconds to run end to end. The cognitive value during a 3 AM incident is enormous — having "this is what normal looked like" available within two minutes changes the whole texture of the investigation.

Closing thought #

Baselines are unsexy. Nobody publishes blog posts about "I have a tarball of mpstat from six weeks ago." But the first time you're at 3 AM staring at a server that "feels slow" and you can run a 90-second compare to see exactly what's drifted, baselines stop looking unsexy and start looking essential.

About Kiril Urbonas