A 2 AM incident, the autovacuum settings that caused it, and the parameter changes that prevented the next one. The discipline that took our biggest Postgres host from periodic stalls to steady performance.
Postgres autovacuum is one of those background subsystems you don't notice until it bites you. Our biggest write-heavy database stalled hard one night — query p99 went from 30 ms to 4 s for about twelve minutes — and the culprit was a single table autovacuum had been losing ground on for weeks. The next morning we changed our defaults. This post is what we changed, why, and the dashboards that now catch the same issue earlier.
Postgres uses MVCC: every UPDATE leaves a "dead" row version behind. DELETE is the same — the row is marked dead, not removed. Autovacuum is the background process that reclaims those dead rows so the table doesn't grow unboundedly, the indexes stay efficient, and the visibility map stays accurate for index-only scans.
It also runs ANALYZE periodically to refresh planner statistics, and it does a special "freeze" pass on old rows to prevent transaction ID wraparound. Three jobs, one daemon, mostly invisible.
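A minimal sketch of what the dead-row mechanics look like from SQL — the demo table here is hypothetical, not one from the incident:
-- Hypothetical demo table, purely for illustration
CREATE TABLE demo (id int PRIMARY KEY, payload text);
INSERT INTO demo SELECT g, 'x' FROM generate_series(1, 10000) g;
UPDATE demo SET payload = 'y';  -- leaves 10,000 dead row versions behind

-- The stats subsystem tracks the dead-tuple count (it can lag a moment)
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'demo';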
When autovacuum is keeping up, you never think about it. When it isn't, several things start going wrong at once: table bloat grows, index efficiency drops, planner stats go stale, and eventually one of three failure modes happens — a long ACCESS EXCLUSIVE lock, an anti-wraparound emergency vacuum, or a runaway query plan that scans the bloated heap.
The table in question — a high-write event stream — was getting ~5,000 UPDATEs per second across a couple of hot rows during peak load. Two settings combined to cause the stall:
autovacuum_vacuum_scale_factor = 0.2 — autovacuum waits until 20% of the table is dead tuples before kicking in. On a 200M-row table, that's 40M dead rows of work before the daemon even starts (the trigger math is sketched after this list).
autovacuum_naptime = 1 min — the launcher checks every minute whether any table needs vacuuming. Fine for a quiet database, way too lazy for a hot one.
By the time autovacuum finally fired on that table, it had hours of accumulated work to do. The vacuum took 20+ minutes, and its locking and I/O competed with foreground queries. Latency spiked. We paged.
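For intuition, the trigger condition is roughly dead tuples > autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × reltuples. A sketch of where tables sit against the old defaults — the 50 and 0.2 are the stock values, hard-coded here purely for illustration:
-- Trigger condition (simplified):
--   n_dead_tup > autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples
-- Old defaults on our ~200M-row table: 50 + 0.2 * 200,000,000 ≈ 40M dead rows
SELECT c.relname,
       s.n_dead_tup,
       round(50 + 0.2 * c.reltuples) AS dead_rows_needed_at_old_defaults
FROM pg_class c
JOIN pg_stat_user_tables s ON s.relid = c.oid
ORDER BY s.n_dead_tup DESC
LIMIT 10;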
The most important lesson: don't tune autovacuum globally to fix one table. Hot tables and quiet tables want different schedules. Use ALTER TABLE ... SET for per-table overrides.
For the hot table:
ALTER TABLE events SET (
autovacuum_vacuum_scale_factor = 0.02,
autovacuum_vacuum_threshold = 1000,
autovacuum_analyze_scale_factor = 0.01,
autovacuum_vacuum_cost_limit = 2000
);
What changed:
vacuum_scale_factor = 0.02 (was 0.2): vacuum when 2% of rows are dead instead of 20%. Tighter, more frequent, smaller passes.
vacuum_threshold = 1000: ignore the percentage on small tables and start vacuuming after 1,000 dead tuples. It doesn't matter for the big table; it defends against this template being reused on small tables.
analyze_scale_factor = 0.01: refresh stats more often, so the planner stays accurate as the data distribution changes rapidly.
vacuum_cost_limit = 2000 (was 200): autovacuum throttles itself against a cost budget; raising the limit lets it do more work per pass. The trade-off is more I/O while the vacuum runs.
After this, autovacuum on that table runs every 5–10 minutes, each pass takes 30–90 seconds, and its locks barely register on foreground queries.
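To confirm the overrides actually took, the per-table settings show up in pg_class.reloptions — a quick sanity check, not part of the original change:
-- Per-table storage parameters live in pg_class.reloptions
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'events';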
A smaller set of changes applied globally:
# postgresql.conf
autovacuum_naptime = 15s
autovacuum_max_workers = 6
autovacuum_vacuum_cost_delay = 2ms
naptime = 15s is the big one. The daemon now considers vacuuming every 15 seconds instead of every minute — small change, big effect on hot tables that hit their threshold between checks.
max_workers = 6 (default 3) lets multiple tables vacuum in parallel. With many active tables, the default 3 workers serialize too much.
cost_delay = 2ms (default 2ms, sometimes 20ms on older configs) is the small pause autovacuum inserts to throttle its I/O. We left this at the modern default but explicitly call it out — older clusters often have it at 20ms, which makes vacuum painfully slow under load.
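A quick way to see what a running cluster actually uses and where each value was set — handy when auditing older clusters:
-- Effective values and their source (default, config file, etc.)
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('autovacuum_naptime',
               'autovacuum_max_workers',
               'autovacuum_vacuum_cost_delay');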
Three dashboards, checked weekly:
Per-table dead tuple ratio. Query pg_stat_all_tables for n_dead_tup / (n_live_tup + n_dead_tup) — flags any table over 10%. New version of the old "autovacuum is keeping up" question, with hard numbers.
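A sketch of the underlying query — the 10% cutoff matches the alert threshold above:
-- Tables where more than 10% of row versions are dead
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1) AS dead_pct
FROM pg_stat_all_tables
WHERE n_dead_tup > 0
  AND n_dead_tup > 0.10 * (n_live_tup + n_dead_tup)
ORDER BY dead_pct DESC;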
Last autovacuum age per table. now() - last_autovacuum for tables with high write volume. Anything older than ~6 hours on a hot table is suspicious.
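A sketch of the query behind it — the write-volume cutoff here is arbitrary and worth tuning per cluster:
-- Hot tables that haven't been autovacuumed recently (NULL = never since stats reset)
SELECT relname,
       last_autovacuum,
       now() - last_autovacuum AS time_since_autovacuum,
       n_tup_upd + n_tup_del AS writes_since_stats_reset
FROM pg_stat_user_tables
WHERE n_tup_upd + n_tup_del > 100000   -- "high write volume" cutoff, adjust to taste
ORDER BY last_autovacuum NULLS FIRST;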
Autovacuum worker concurrency. pg_stat_activity filtered for backend_type = 'autovacuum worker'. Should always be < autovacuum_max_workers. If we frequently hit the cap, that's a signal we need more workers or more aggressive per-table thresholds.
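The concurrency check is a one-liner against pg_stat_activity:
-- Running autovacuum workers vs. the configured cap
SELECT count(*) AS active_workers,
       current_setting('autovacuum_max_workers') AS max_workers
FROM pg_stat_activity
WHERE backend_type = 'autovacuum worker';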
We graph all three in the database overview dashboard. Single-pane visibility makes drift obvious instead of having to remember to check.
A few patterns that survived this and a couple of subsequent incidents:
Hot tables need explicit attention. The defaults are designed for "average" tables. Production workloads have outliers. Audit the top 10 by write volume; tune them by hand.
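A starting point for that audit — write volume since the last stats reset, straight from pg_stat_user_tables:
-- Top 10 tables by write volume since the last stats reset
SELECT relname,
       n_tup_ins + n_tup_upd + n_tup_del AS total_writes,
       n_dead_tup
FROM pg_stat_user_tables
ORDER BY total_writes DESC
LIMIT 10;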
Bloat compounds. A table that's 20% dead today is 40% dead next week if autovacuum can't keep up. The work to clean it up grows faster than linearly because the indexes degrade along with the heap.
Vacuum during traffic, not after. The instinct to "run a manual vacuum at 3 AM when traffic is low" backfires — vacuum during peak is mostly fine if it's small and frequent; vacuum after-the-fact takes an hour and locks things up. Better to bleed the dead rows continuously.
Watch for anti-wraparound vacuums. If a table's age crosses autovacuum_freeze_max_age, Postgres launches an emergency vacuum that won't cancel itself for conflicting lock requests, so anything that needs a stronger lock on the table queues behind it until it finishes. We've never hit this in prod, but the post-mortems we've read about it are grim. The dashboards above catch the conditions long before this kicks in.
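The condition is easy to watch directly; a sketch of the per-database check:
-- Oldest unfrozen transaction ID per database vs. the emergency threshold
SELECT datname,
       age(datfrozenxid) AS xid_age,
       current_setting('autovacuum_freeze_max_age')::int AS freeze_max_age
FROM pg_database
ORDER BY xid_age DESC;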
Cluster-wide settings are a backstop, not the answer. Per-table overrides are how you actually solve this. Inheritance from global defaults is fine; the global defaults should be conservative.
A few settings we left at defaults:
maintenance_work_mem stays at the cluster's general value (we use 256 MB). Bumping it speeds up vacuum, but every autovacuum worker can allocate that much at once, so it eats RAM the rest of the workload needs.
autovacuum_freeze_max_age stays at the default 200M. The anti-wraparound emergency only fires if vacuum has been failing completely for a long time, and the dashboards catch the conditions much earlier.
Scheduled manual VACUUM ANALYZE jobs. Decided against — correctly tuned autovacuum handles it, and manual jobs are a leaky abstraction over the daemon's work.
Autovacuum is one of those subsystems where the default settings work for the median case and quietly break for the long tail. Tuning takes an hour per important table and pays off for years. The dashboards keep you honest about whether the tuning is actually holding.