EXPLAIN ANALYZE output is dense and intimidating. Once you can read it, most slow-query investigations finish in minutes. The patterns we keep seeing.
For a long time my approach to slow Postgres queries was: see the query, guess at an index, add it, hope. Sometimes that worked. Often it didn't, and I'd find myself with three useless indexes plus the original problem. Reading the query plan changed how I debug — once you can interpret EXPLAIN ANALYZE output, the conversation goes from "this query is slow" to "this specific step in the plan is doing N seq scans we didn't want," which has an obvious fix.
This post is the patterns I keep seeing across production Postgres workloads.
EXPLAIN shows the plan Postgres would run. EXPLAIN ANALYZE runs the query and shows actual timings + row counts. Always use ANALYZE in real diagnosis — the planner's estimates can be wildly off, and the actual numbers are what matter. The cost is that ANALYZE executes the query, so don't run it on a destructive operation without a transaction you can roll back.
BEGIN;
EXPLAIN (ANALYZE, BUFFERS) DELETE FROM users WHERE created_at < '2020-01-01';
ROLLBACK;
BUFFERS is the unsung hero — adds I/O statistics so you can see how much of the plan was reading from cache vs disk.
Plans are trees, output bottom-up. The bottom-most node is the leaf (a scan); each level above combines, filters, sorts, joins. The cost numbers cost=A..B are the planner's startup-cost..total-cost estimates; actual time=A..B is real wall-clock for that node.
The numbers to actually read:
actual time at each node — how long that node took.rows at each node — how many rows came out. If actual rows >> estimated rows, the planner has bad stats.loops — how many times the node was executed. Inside a nested loop join, the inner node runs once per outer row. The reported time is per loop; multiply by loops for the real cost.hit is cache; read is disk. Lots of read on a hot query is a problem.The slow part of the query is the node with the highest actual time × loops. Find that and you've found the bottleneck.
A handful of plan shapes account for ~80% of the slow-query investigations.
Seq Scan on orders (cost=0.00..18452.32 rows=10000 width=1000) (actual time=120.5..234.8 rows=8 loops=1)
Filter: (customer_id = 12345)
Rows Removed by Filter: 999992
Translation: scanned a million rows, kept 8. The planner went seq scan because either:
customer_id.Most common cause: simply no index. Fix: CREATE INDEX ON orders (customer_id). Re-run, confirm the plan switched to Index Scan.
Index Scan using orders_customer_id_idx on orders (actual time=0.1..1850.3 rows=145000 loops=1)
Index Cond: (customer_id = 12345)
The index lookup is fine, but it returned 145k rows that the next step has to process. Probably you need a composite index that covers more of the WHERE clause:
-- Was: WHERE customer_id = ? AND status = 'open'
CREATE INDEX ON orders (customer_id, status) WHERE status = 'open';
-- Partial index: only contains open orders, smaller and faster.
Nested Loop (actual time=0.1..14523.8 rows=10 loops=1)
-> Seq Scan on users (actual time=0.0..52.1 rows=50000 loops=1)
-> Index Scan using orders_user_id_idx (actual time=0.2..0.3 rows=0 loops=50000)
50,000 outer rows, each one looking up orders. Even though the inner index scan is fast per loop (0.3ms), the total is 50,000 × 0.3ms = 15 seconds. Solution: a hash join instead of nested loop, or different join order.
This usually fixes itself when the planner has good statistics (ANALYZE the table) and the join's selectivity is correct. Sometimes you have to rewrite the query to nudge the planner.
Sort (actual time=2350.0..3000.0 rows=500000 loops=1)
Sort Method: external merge Disk: 250000kB
The sort spilled to disk because it didn't fit in work_mem. Two fixes:
SET LOCAL work_mem = '256MB' before the query. Don't set this globally too high; work_mem is per-sort-node, so a query with 4 sorts uses 4× work_mem.Hash Join (actual time=...)
-> Seq Scan on big_table
-> Hash (Buckets: 16384 Batches: 32)
-> Seq Scan on other_table
Batches: 32 means the hash had to spill to disk in 32 chunks. Same fix as sort: raise work_mem, or avoid the hash join.
-- This won't use an index on email:
WHERE LOWER(email) = 'foo@example.com'
-- This will, if you have a functional index:
CREATE INDEX ON users (LOWER(email));
-- Then the query above uses it.
-- Or rewrite to not need the function:
WHERE email = 'foo@example.com' COLLATE "C" -- if your emails are stored lowercase already
Every function call on the column side of a WHERE clause prevents straight index use unless you have a matching functional index. Common offenders: LOWER, DATE_TRUNC, EXTRACT, type casts (field::int = ...).
Patterns that turned out to be near-universal wins on our shape of data:
Indexes on every foreign key column. Postgres doesn't add these automatically. If you have posts(author_id), you need an index on author_id. The Prisma client doesn't always add them either — check.
Partial indexes for "active" rows. Most tables have a status or deleted_at column. Most queries filter on the active subset. A partial index (WHERE status = 'active' or WHERE deleted_at IS NULL) is smaller, faster, and the planner uses it for matching queries automatically.
Composite indexes ordered by selectivity. (tenant_id, status, created_at) for a multi-tenant app. Tenant first (high selectivity, always in the WHERE clause), status next (filters within a tenant), created_at last (for ORDER BY).
BRIN indexes on append-only tables. For very large tables with naturally clustered data (events ordered by time), a BRIN index is 100–1000× smaller than a B-tree and works well for range scans. We use these for log-like tables.
Expression indexes for case-insensitive search. LOWER(email) is a common one; storing the lowercased version is faster but feels redundant. A functional index gives you the same speed without the storage redundancy.
A few patterns I see teams overdo:
Indexing every column. Indexes have write costs. A table with 8 indexes pays 8× write amplification. Pick the queries that actually matter and index for those.
Trusting EXPLAIN without ANALYZE. The planner's estimates can be wildly off. Always ANALYZE for diagnosis.
Adding indexes during incident response. CREATE INDEX blocks writes by default. Use CREATE INDEX CONCURRENTLY in production to avoid lock-storms. The non-CONCURRENT version is for off-hours maintenance only.
Forgetting to VACUUM ANALYZE after big data changes. After a large insert or delete, stats are stale; the planner makes bad choices. VACUUM ANALYZE table_name updates the stats.
When a query is slow:
EXPLAIN (ANALYZE, BUFFERS) the query.actual time × loops node.This takes 5–15 minutes for most slow queries once you've done it a few times.
Reading query plans is one of those skills that takes a few hours to learn and pays back for years. The patterns above cover the bulk of slow-query investigations. Once you can spot them in the plan, the conversation with the database changes from "this is slow somehow" to "this specific step costs X and we can fix it by Y," which is where every productive debugging session ends up.
Get the latest tutorials, guides, and insights on AI, DevOps, Cloud, and Infrastructure delivered directly to your inbox.
Argo CD ships your manifests; Argo Rollouts ships them gradually with automated quality gates. The setup, the analysis templates that earn their place, and what we measure.
Single-provider LLM apps fail when the provider does. Multi-provider routing isn't just resilience — it's also a cost lever. The patterns we run.
Explore more articles in this category
Version-pinned modules across many repos. The release process, semver discipline, and the breaking-change communication that keeps a shared registry sane.
Backups are easy. Restores are hard. The quarterly drill we run, what's failed during it, and the discipline that makes "we have backups" actually mean something.
Replication is the foundation of database HA. What we monitor, how we practice failover, and the gotchas that show up only when you actually fail over.