Free memory is a lie and load average doesn't see memory stalls. How Pressure Stall Information gives you a direct, early signal of memory contention — and how we wired it into alerts and autoscaling.
The OOM killer is the worst way to find out you're low on memory: it picks a process, kills it, and leaves you reconstructing what happened from a terse kernel log line. By then, the incident has already happened. The frustrating part is that the classic memory metrics don't warn you. "Free memory" looks fine right up until it doesn't, because Linux aggressively uses memory for page cache. Pressure Stall Information (PSI) is the signal that actually tells you the system is struggling for memory, with enough lead time to act.
On a healthy Linux box, free memory is usually near zero, and that's correct. The kernel fills otherwise unused RAM with page cache (recently read files) because idle RAM is wasted RAM. Most of that cache is reclaimable on demand, so "low free memory" is normal and says nothing about pressure.
The number that matters isn't how much memory is free; it's how much time the system spends stalled waiting for memory — blocked on reclaim, thrashing the page cache, swapping. That's exactly what PSI measures.
$ cat /proc/pressure/memory
some avg10=4.32 avg60=2.10 avg300=0.95 total=128934821
full avg10=1.20 avg60=0.55 avg300=0.20 total=43219003
The avg values are percentages of wall-clock time, averaged over 10-, 60-, and 300-second windows; total is the cumulative stall time in microseconds:
some: the fraction of time at least one task was stalled waiting for memory. This is a leading indicator: the system is starting to struggle for memory.
full: the fraction of time all non-idle tasks were stalled simultaneously. This is real trouble: nothing is making progress because every task is waiting on memory.
So some avg10=4.32 means that over the last 10 seconds, something was stalled on memory 4.32% of the time. Rising some is your early warning. Non-trivial full means you're already in pain and an OOM kill may be near.
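The format is regular enough to parse with a few lines of awk. The sketch below assumes the line layout shown above. psi_flatten is a hypothetical helper: it takes the pressure file path as an argument and prints one kind_field=value pair per line, a convenient shape for logs or a metrics pipeline.

```shell
# psi_flatten FILE: print one "<kind>_<field>=<value>" line per PSI field.
# Hypothetical helper; assumes the "some ..."/"full ..." layout shown above.
psi_flatten() {
  awk '
    $1 == "some" || $1 == "full" {
      kind = $1
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")   # "avg10=4.32" -> kv[1]="avg10", kv[2]="4.32"
        print kind "_" kv[1] "=" kv[2]
      }
    }' "$1"
}

# Usage: psi_flatten /proc/pressure/memory
```

For the sample above, this prints eight lines, starting with some_avg10=4.32 and ending with full_total=43219003.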
Load average lumps everything together and includes CPU-bound and I/O-bound tasks; it can't tell you why tasks are runnable-but-waiting. Free memory tells you about page cache, not pressure. PSI gives you a direct, attributable, time-based measure of memory contention specifically. It moves before the OOM killer fires, which is the entire point.
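The total field is a cumulative counter of stall time in microseconds. Sampling it twice therefore gives the stall share over any window you choose, which helps when your scrape interval doesn't line up with the 10/60/300-second averages. Below is a minimal sketch under that assumption. stall_pct and sample_some_total are hypothetical helpers, not part of any existing tool.

```shell
# sample_some_total FILE: print the cumulative "some" stall total (microseconds).
sample_some_total() {
  awk '$1 == "some" { for (i = 2; i <= NF; i++) if ($i ~ /^total=/) print substr($i, 7) }' "$1"
}

# stall_pct BEFORE_US AFTER_US INTERVAL_S: percent of wall-clock time stalled
# between two samples of a PSI total counter taken INTERVAL_S seconds apart.
stall_pct() {
  awk -v before="$1" -v after="$2" -v secs="$3" 'BEGIN {
    if (secs <= 0) exit 1                     # no meaningful window
    # stall delta (us) / elapsed wall-clock time (us), as a percentage
    printf "%.2f\n", (after - before) / (secs * 1000000) * 100
  }'
}

# Usage (hypothetical wiring):
#   before=$(sample_some_total /proc/pressure/memory); sleep 30
#   after=$(sample_some_total /proc/pressure/memory)
#   stall_pct "$before" "$after" 30
```

Computing the share per scrape interval this way can surface short stall bursts that the 300-second average smooths away.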
Under cgroup v2, each cgroup exposes its own PSI — so you can attribute memory pressure to a specific container/service, not just the whole node:
$ cat /sys/fs/cgroup/system.slice/myapp.service/memory.pressure
some avg10=22.1 avg60=15.4 avg300=8.9 total=...
This is how you find which workload is causing node-level pressure. The node's /proc/pressure/memory says "something's struggling"; the per-cgroup file says "it's myapp, and it's stalled 22% of the last 10 seconds." On Kubernetes nodes this is gold for distinguishing a noisy-neighbor problem from a genuinely under-provisioned node.
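To answer "which workload is it?" across a whole node, you can walk the cgroup v2 hierarchy and rank every memory.pressure file by some avg10. The sketch below assumes cgroup v2 is mounted at /sys/fs/cgroup, as in the example above. top_memory_pressure is a hypothetical helper, and the root path is a parameter so you can point it at a subtree such as system.slice.

```shell
# top_memory_pressure [ROOT] [N]: print the N cgroups under ROOT with the highest
# memory "some avg10", one "<avg10> <cgroup path>" line each, highest first.
# The body runs in a subshell ( ... ) so its variables don't leak to the caller.
top_memory_pressure() (
  root=${1:-/sys/fs/cgroup}
  n=${2:-5}
  find "$root" -type f -name memory.pressure 2>/dev/null |
    while IFS= read -r f; do
      # avg10 is the second field on the "some" line: "some avg10=22.1 ..."
      v=$(awk '$1 == "some" { split($2, kv, "="); print kv[2] }' "$f" 2>/dev/null)
      if [ -n "$v" ]; then
        printf '%s %s\n' "$v" "${f%/memory.pressure}"
      fi
    done |
    LC_ALL=C sort -k1,1nr |
    head -n "$n"
)

# Usage: top_memory_pressure /sys/fs/cgroup 5
```

The exact cgroup layout for containers depends on the runtime and cgroup driver, so start from the root and narrow down once you see where the top entries live.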
We alert on sustained some and any meaningful full:
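# crude reader; in practice use a PSI collector / node_exporter
# read_psi KIND FIELD: print one value, e.g. "read_psi some avg60" prints 2.10 for the sample above
read_psi() {
  awk -v kind="$1" -v field="$2" '$1 == kind {
    for (i = 2; i <= NF; i++) { split($i, kv, "="); if (kv[1] == field) print kv[2] }
  }' /proc/pressure/memory
}
# alert: some avg60 > 10 for 5m → warning (act before OOM)
# alert: full avg10 > 5 → critical (OOM imminent)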
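You can evaluate the two alert rules without a full monitoring stack by feeding periodic samples through a small state machine. The "for 5m" part means the some condition must hold across consecutive samples, not just once. The sketch below assumes one "some_avg60 full_avg10" pair per line on stdin, taken every INTERVAL seconds. psi_evaluate is a hypothetical helper, and the thresholds are the ones from the alert comments above, not general recommendations.

```shell
# psi_evaluate INTERVAL_S: read "some_avg60 full_avg10" samples from stdin
# (one per INTERVAL_S seconds) and print ok / warning / critical per sample.
psi_evaluate() {
  awk -v interval="$1" '
    {
      some60 = $1 + 0; full10 = $2 + 0
      # count consecutive samples with some avg60 above 10%
      streak = (some60 > 10) ? streak + 1 : 0
      if (full10 > 5)                    print "critical"   # OOM imminent: page
      else if (streak * interval >= 300) print "warning"    # sustained for 5m
      else                               print "ok"
    }'
}

# Example feed (hypothetical wiring), sampling once a minute:
# while sleep 60; do
#   awk '$1 == "some" { split($3, a, "="); s = a[2] }
#        $1 == "full" { split($2, b, "="); f = b[2] }
#        END { print s, f }' /proc/pressure/memory
# done | psi_evaluate 60
```

Each sample is treated as covering INTERVAL seconds, so at one sample per minute the warning fires on the fifth consecutive breach. Any sample at or below the threshold resets the streak.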
The thresholds are workload-specific — batch jobs tolerate more pressure than latency-sensitive services — but the shape is: warn on rising some (you have time), page on full (you don't).
PSI is only useful if it drives an action with enough lead time:
some avg60 crosses a threshold, rather than waiting for OOM kills to signal capacity shortage.
Stop asking "how much memory is free?" It's nearly always near zero and tells you nothing. Start asking "how much time are we stalled waiting for memory?" PSI answers that directly, per-cgroup, with a leading some signal and a critical full signal. Wire it into your alerts and autoscaling and you trade the OOM killer's after-the-fact verdict for a warning you can act on while there's still time.