The single most useful Postgres extension you might not be using. The queries it surfaces, the indexes it implies, and the operational discipline of reading it weekly.
For years my approach to Postgres performance was: notice slow queries when users complain, EXPLAIN ANALYZE them, add indexes, repeat. It works but it's reactive. pg_stat_statements turns the conversation around — it aggregates every query the database has run, sorted by total cost. Reading it weekly catches issues weeks before they become incidents.
This is the most useful Postgres extension you might not be using. Enabling it takes two minutes; the discipline of reading it pays off forever.
pg_stat_statements normalizes every query the database sees (replacing literals with placeholders) and aggregates statistics per normalized query:
You query the pg_stat_statements view to see the aggregated data. The output is "this query has been called 1.2M times, taken 4 hours of total CPU, averages 12ms per call" — actionable, not just "slow queries."
Two changes:
# postgresql.conf
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
pg_stat_statements.max = 10000 # how many normalized queries to track
track_io_timing = on # adds I/O timing data (small overhead)
Restart Postgres. Then in your DB:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
That's it. The view is now populated and updates continuously. On RDS or other managed Postgres, the extension is usually pre-installed; you just need to enable it via parameter groups.
A handful of pg_stat_statements queries we keep in a notes file:
SELECT
substring(query for 100) AS query,
calls,
round(total_exec_time::numeric, 2) AS total_ms,
round(mean_exec_time::numeric, 2) AS mean_ms,
round((100 * total_exec_time / SUM(total_exec_time) OVER ())::numeric, 2) AS percent_total
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
This is the "where is the database spending its CPU" view. The percent column tells you whether the top 20 are 20% or 95% of the workload. If 5 queries are 80% of the time, those are the queries that matter; optimize them and the database is meaningfully faster.
SELECT
substring(query for 100),
calls,
round(mean_exec_time::numeric, 2) AS mean_ms,
round(max_exec_time::numeric, 2) AS max_ms
FROM pg_stat_statements
WHERE calls > 100
ORDER BY mean_exec_time DESC
LIMIT 20;
Slow queries are different from frequent queries. A 5-second query called 10 times is less urgent than a 50ms query called 1M times — but both matter for different reasons. The WHERE calls > 100 filters out one-off ad-hoc queries (a migration that ran once, a DBA's debugging session).
SELECT
substring(query for 100),
calls,
shared_blks_read,
shared_blks_hit,
round(100.0 * shared_blks_hit / NULLIF(shared_blks_hit + shared_blks_read, 0), 1) AS cache_hit_pct
FROM pg_stat_statements
WHERE shared_blks_read + shared_blks_hit > 10000
ORDER BY shared_blks_read DESC
LIMIT 20;
Queries that read a lot from disk. Low cache hit percentage = the query is reading data that's not in memory, which is slow. High shared_blks_read is often the signature of a missing index.
SELECT
substring(query for 100),
calls,
rows,
round((rows::numeric / NULLIF(calls, 0))::numeric, 2) AS rows_per_call
FROM pg_stat_statements
WHERE calls > 10
ORDER BY rows_per_call DESC
LIMIT 20;
A query returning 50,000 rows per call is suspicious. Possible: it's a batch job (fine), it's missing a LIMIT (probably bug), or it's iterating through a too-large result set when pagination would work better.
A few patterns we keep seeing once teams start reading pg_stat_statements:
One missing index is dominating. Top query is a SELECT FROM orders WHERE user_id = $1. Mean time is 45ms (sounds OK), calls is 2M (uh-oh), total time is 25 hours. Adding an index on user_id drops mean to 0.5ms; total time drops to 15 minutes per equivalent period. The database "feels" faster across everything because we freed up so much CPU.
A query is running way more often than expected. "Why does this query run 500k times an hour?" Investigating reveals an N+1 in the ORM or a polling loop you forgot about. Sometimes the fix is in the code, not the database.
A function is dragging things down. A query that calls a Postgres function in the WHERE clause. Each call evaluates the function; if it's slow, every row evaluation is slow. Refactor or memoize.
A specific query is slow only sometimes. Mean is 50ms but max is 8 seconds. Tail latency issue. Usually a query plan that flips between fast (index scan) and slow (seq scan) depending on parameter values — pg_stat_statements shows the pattern; further investigation reveals the cause.
The discipline:
SELECT pg_stat_statements_reset(); clears the stats. Useful after major optimizations to measure the new baseline cleanly.We've embedded this in a small Slack bot that posts the weekly top-20 to a channel. The conversation becomes "why did this query move up the list?" which is the conversation you want to be having.
Overhead. pg_stat_statements adds ~5-10% overhead in our benchmarks (varies by workload). Worth it for the visibility. If you have a system where every microsecond counts, measure first.
Memory. The extension stores normalized queries in shared memory. pg_stat_statements.max = 10000 is plenty for most workloads; bump higher if your application generates many distinct queries.
Reset on restart. Stats accumulate since last reset (or restart). After a Postgres restart, you'll need traffic before the view is useful.
Doesn't show plans. pg_stat_statements knows about queries but not their EXPLAIN plans. For a specific slow query you find, you still need to EXPLAIN ANALYZE it.
Normalized queries can be confusing. SELECT * FROM users WHERE id = ? is one entry; the actual values are replaced. Sometimes you want to know what values were used — that information isn't here. Pair with log_statement for specific investigation.
A few patterns we tried and dropped:
Real-time alerting on pg_stat_statements changes. Generates noise. The view is meant for periodic review, not minute-by-minute monitoring.
Tracking per-database when we have many. We have one DB per service mostly. If you have many DBs on one instance, you'll want per-database breakdowns.
Trying to act on every entry in the top 50. The top 5-10 usually dominate. Below that, marginal wins. Focus.
pg_stat_statements is the cheapest big upgrade in Postgres operations. Enabling it is a 5-minute task; reading it weekly is a 15-minute habit; the cumulative effect on database health is measured in dropped query times and avoided incidents. If you're running Postgres without it, fix that today.
Get the latest tutorials, guides, and insights on AI, DevOps, Cloud, and Infrastructure delivered directly to your inbox.
io_uring replaces epoll for new high-throughput services. The patterns that earn their place, the gotchas in older kernels, and where we'd still pick epoll.
Vault + Kubernetes auth + Vault Agent Injector. The setup, the failure modes during pod startup, and the patterns that beat raw Kubernetes Secrets.
Explore more articles in this category
Version-pinned modules across many repos. The release process, semver discipline, and the breaking-change communication that keeps a shared registry sane.
EXPLAIN ANALYZE output is dense and intimidating. Once you can read it, most slow-query investigations finish in minutes. The patterns we keep seeing.
Backups are easy. Restores are hard. The quarterly drill we run, what's failed during it, and the discipline that makes "we have backups" actually mean something.