Purge

Goal: Free space and remove dead versions once no active snapshot can see them. The purge subsystem runs continuously in the background.

There are two main kinds of undo and two main purge actions:

Insert-undo (created for INSERTs)
- Only needed to roll back the inserting transaction.
- After commit, no snapshot will ever need to “see” rows that didn’t exist before, so insert-undo is eligible for immediate purge.
Update-undo (created for UPDATE/DELETE)
- Needed for both rollback and MVCC (older snapshots may still need the previous version).
- After commit, update-undo is kept until all active snapshots that could see the old version are gone. Then it’s purged.

Purge acts on two fronts:

Undo-log cleanup: remove obsolete undo records (free/trim undo segments/pages).
Physical deletion: for rows that were delete-marked (logical delete), purge physically removes:
- the clustered-record (if not visible to anyone),
- any secondary index entries that still reference that dead row.

InnoDB maintains a notion similar to a global “purge view” derived from the oldest active read view across the server.
Any undo record / delete-marked row whose creating transaction is older than this global boundary and not visible to any active read view is safe to purge.
If you have a long-running transaction (or replica apply thread) holding an old snapshot, it pins the global boundary → purge can’t advance.

Symptoms of stuck/slow purge:

Growing History list length (amount of older version history).
Undo tablespaces grow and don’t shrink (space may be reusable internally but not returned to OS).
Secondary indexes bloat (rows deleted/updated but not physically removed yet).

DML happens
- UPDATE/DELETE creates update-undo with the before image and links it via DB_ROLL_PTR.
- DELETE also delete-marks the row in the clustered index (logical delete).
- INSERT creates insert-undo so it can be rolled back if needed.
Commit
- Changes become committed; redo ensures crash safety.
- Undo is still present because readers might need old versions (MVCC), and delete-marked rows still exist physically.
Purge scheduling
- Background purge threads look at the history list (queue of undo).
- They compute the current purge boundary from the oldest active read view.
- Work up to that boundary is eligible.
Undo cleanup
- For insert-undo: if the transaction has committed, purge discards it (fast win).
- For update-undo: if no snapshot can see the old version, purge removes the undo record (shortens the version chain).
Physical delete (for delete-marked rows)
- If the delete-marked clustered row is not visible to any active snapshot:
  - Purge removes secondary index entries pointing to it.
  - Purge removes the clustered record itself.
- Freed space goes back to the page/segment free lists (not necessarily the OS file size).
Space reuse/return
- Freed pages are reused internally.
- To actually reduce .ibd file size, you typically need OPTIMIZE TABLE (rebuild) or engine features that can truncate/shrink (varies by version and setting).

REPEATABLE READ: One read view per transaction → longer-lived snapshots → more history retained → slower purge if transactions are long.
READ COMMITTED: One read view per statement → old versions become eligible sooner → purer, faster purge under similar workload (often chosen for write-heavy OLTP with lots of updates).

Old/idle transactions:
- information_schema.innodb_trx for long trx_started times.
History length / Purge progress:
- SHOW ENGINE INNODB STATUS (history list length, purge worker state).
- In 8.0, also performance_schema/INNODB_METRICS for detailed counters.
Undo tablespace growth and secondary index bloat.
Checkpoints / dirty page % (too many dirty pages can slow everything, including purge).

Keep transactions short; avoid “idle in transaction.”
For big batch updates: commit in chunks so purge can catch up.
Consider READ COMMITTED if REPEATABLE READ retention hurts you.
Ensure purge threads are adequate for your workload (innodb_purge_threads).
Partition hot/churn-heavy tables to localize history and reduce bloat.
Schedule periodic maintenance (ANALYZE, rebuild specific bloated indexes/tables if needed).

T1 (REPEATABLE READ) starts at 10:00, runs a SELECT; a read view is created.
T2 updates row R at 10:01 → writes update-undo with old values, sets new DB_TRX_ID, commits.
T1 reads R at 10:05 → its read view is from 10:00, so T1 follows DB_ROLL_PTR to reconstruct the old version and returns it.
Until T1 commits, that old version is needed → purge cannot remove the undo for R.
T1 commits at 10:10 → its snapshot is gone.
Purge sees no remaining snapshot can see the old version → removes update-undo (and, if a DELETE, physically deletes the row and its secondary index entries).

Jonas Notes