VACUUM, autovacuum and table bloat

Updated July 5, 2026

PostgreSQL never updates a row in place. Every UPDATE writes a new row version and marks the old one dead; every DELETE just marks. The space is reclaimed later, asynchronously, by VACUUM. When that loop works, you never think about it. When it falls behind, tables and indexes silently grow — that's bloat — and every query pays for it.

Why dead rows exist at all: MVCC

Under MVCC (multi-version concurrency control), readers never block writers and vice versa, because each transaction sees a consistent snapshot: old row versions are kept around as long as some transaction might still need them. The moment no active transaction can see a dead version, it becomes garbage — but PostgreSQL doesn't reclaim it inline. That's VACUUM's job.

Two immediate consequences:

An UPDATE-heavy workload writes roughly as much as an INSERT-heavy one, plus index churn.
A single long-running transaction holds back cleanup for the whole database — VACUUM cannot remove any row version that transaction's snapshot might still see, regardless of which tables it touched. A forgotten idle in transaction session on Friday means a bloated database on Monday.

What VACUUM actually does (and doesn't)

Plain VACUUM scans for dead tuples, makes their space reusable for future writes in the same table, updates the free space map and the visibility map, and freezes old tuples to prevent transaction-ID wraparound. It runs alongside normal traffic.
It does not shrink the file on disk (except for the special case of completely empty pages at the very end of the table). A table that once bloated to 50 GB stays 50 GB on disk even if it's 90% empty inside.
VACUUM FULL rewrites the table into a new compact file and returns space to the OS — but takes an ACCESS EXCLUSIVE lock for the whole rewrite. On a big, busy table that's an outage; look at pg_repack for online rebuilds.
ANALYZE (often run together) refreshes planner statistics — a different job that also matters, see reading EXPLAIN ANALYZE.

How autovacuum decides to run

The autovacuum launcher periodically checks each table against a threshold:

vacuum when:  n_dead_tup  >  autovacuum_vacuum_threshold
                             + autovacuum_vacuum_scale_factor × reltuples
-- defaults:  50 + 0.20 × reltuples  (20% of the table)

The default 20% scale factor is fine for small tables and terrible for big ones: a 100-million-row table accumulates 20 million dead rows before autovacuum even starts. For large, hot tables, set a per-table override:

ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor = 0.01,
  autovacuum_vacuum_threshold    = 10000
);

Autovacuum is also deliberately throttled by cost-based delays (autovacuum_vacuum_cost_delay / autovacuum_vacuum_cost_limit) so it doesn't saturate I/O. On modern hardware the defaults are conservative; if autovacuum runs constantly but never catches up, raising autovacuum_vacuum_cost_limit is usually the first lever.

Detecting bloat before it hurts

SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum,
       last_autoanalyze
FROM   pg_stat_user_tables
ORDER  BY n_dead_tup DESC
LIMIT  20;

n_dead_tup large and growing, with last_autovacuum old or NULL → autovacuum isn't keeping up (or a long transaction is pinning the horizon: check pg_stat_activity for old xact_start).
Query-side symptom: buffer counts out of proportion to rows returned in EXPLAIN (ANALYZE, BUFFERS) — reading 40,000 pages for 500 rows means scanning mostly dead space.
Size-side check: compare pg_total_relation_size() over time, or use the pgstattuple extension for an exact dead-space percentage on a suspect table.

HOT updates: the free lunch worth engineering for

If an UPDATE changes only columns that are not indexed, and the new version fits on the same page, PostgreSQL performs a HOT (heap-only tuple) update: no index entries are written at all, and the dead version can be cleaned up cheaply within the page. This is why adding "just one more index" on a frequently updated column can wreck write performance — it turns every HOT update into a full one. A lower fillfactor (e.g. 80–90 for update-heavy tables) leaves room on each page and raises the HOT rate.

Rules of thumb

Never disable autovacuum. If it hurts, it's under-tuned, not unnecessary — and it also protects you from transaction-ID wraparound, which is a database-stopping event.
Kill or timeout idle transactions: idle_in_transaction_session_timeout is cheap insurance.
Tune per-table, not globally: a handful of hot tables usually cause most of the pain.
Bloat you already have won't go away on its own: plain VACUUM stops the growth; reclaiming disk needs VACUUM FULL or an online rebuild.

⚡ Suspect bloat is slowing a query? Run it with EXPLAIN (ANALYZE, BUFFERS) and drop the output into the EXPLAIN Visualizer to see which node is doing the excess reading.