How to read PostgreSQL EXPLAIN ANALYZE output
EXPLAIN shows what the planner intends to do; EXPLAIN ANALYZE runs the query and shows what actually happened. Reading the output well is the single highest-leverage skill in PostgreSQL performance work — and it has a few traps that catch even experienced developers.
The anatomy of a plan node
Every line of a plan is a node in a tree. Rows flow from the innermost (most indented) nodes upward. A typical node looks like this:
Index Scan using orders_customer_idx on orders
(cost=0.43..152.80 rows=42 width=98)
(actual time=0.031..0.512 rows=38 loops=1)
Two groups of numbers, and they mean different things:
cost=0.43..152.80— the planner's estimate, in arbitrary cost units (not milliseconds). The first number is the startup cost before the first row can be produced, the second the total cost for all rows.rows=42(in the cost group) — how many rows the planner expected.actual time=0.031..0.512— measured milliseconds, again startup..total, per loop.rows=38 loops=1(in the actual group) — rows actually returned, averaged per loop.
Trap #1: everything is per loop
When a node sits on the inner side of a nested loop, it may execute thousands of times. PostgreSQL reports its actual time and rows as an average per execution, with loops telling you how many executions happened:
Index Scan using items_order_idx on items
(actual time=0.005..0.021 rows=3 loops=12000)
This node did not return 3 rows in 0.021 ms. It returned roughly 36,000 rows and consumed about 250 ms in total (0.021 × 12,000). Always multiply by loops before deciding whether a node is cheap.
Trap #2: times are cumulative
A node's actual time includes all of its children. If a Sort reports 900 ms and the Seq Scan under it reports 850 ms, the sort itself only cost ~50 ms. To find where time is really spent, you need each node's exclusive time: its total minus its children's totals. This is exactly the arithmetic that gets tedious in a 40-node plan — and the main reason plan visualizers exist.
Trap #3: estimated vs actual rows
The most valuable diagnostic in the whole output is the gap between estimated and actual rows. The plan was chosen based on the estimate — if the planner expected 40 rows and got 400,000, every decision downstream of that node (join strategy, memory sizing, index vs sequential scan) was made on wrong assumptions.
When you see a large mismatch:
- Run
ANALYZEon the involved tables — statistics may simply be stale. - If the misestimate persists on a specific column, raise its sample detail:
ALTER TABLE t ALTER COLUMN c SET STATISTICS 1000; ANALYZE t; - If the condition combines correlated columns (city + zip code, category + brand), the planner multiplies their selectivities as if independent.
CREATE STATISTICSwith thedependencieskind exists for exactly this case.
Use BUFFERS. Always.
EXPLAIN (ANALYZE, BUFFERS) adds a line like:
Buffers: shared hit=1520 read=8943 dirtied=12
hit— 8 KB blocks found in PostgreSQL's shared buffer cache.read— blocks that had to come from outside shared buffers (OS cache or disk).dirtied/written— blocks modified or evicted while the query ran.
Buffer counts explain the difference between "fast in my session, slow in production": the same plan is orders of magnitude slower when its blocks aren't cached. They also make bloat visible — a query that reads 50,000 blocks to return 100 narrow rows is reading mostly dead space or scanning far more than it should.
Other output worth knowing
Rows Removed by Filter— rows fetched and then thrown away. A large number means the access path isn't selective: an index (or a better index) could avoid most of those reads.Sort Method: external merge Disk: 210400kB— the sort didn't fit inwork_memand spilled to disk. Same story for hash joins reportingBatches: 4(more than one batch = spill).(never executed)— the node never ran (for example, the other side of the join produced zero rows). Its numbers are meaningless; don't optimize it.loopswith parallel workers — a parallel node shows one loop per worker; per-loop numbers are per-worker averages.
A reading checklist
- Find the total
Execution Timeat the bottom — that's your budget. - Walk the tree for nodes whose exclusive time (total minus children, times loops) is a big share of it. Usually 1–3 nodes dominate.
- For each hot node, compare estimated vs actual rows. A 10× gap is a planner problem before it's a hardware problem.
- Check
BUFFERS: is it reading far more data than the result justifies? - Only then think about fixes: statistics, indexes, query shape,
work_mem— in roughly that order of likelihood.
