What they are
Undo logs store the before-image of rows so that:

  • A transaction can ROLLBACK its own changes, and
  • Other transactions can see a consistent snapshot (MVCC) while changes are in-flight.

Undo vs Redo (quick contrast)

  • Redo logs: “How to replay committed page changes after a crash.” (durability)
  • Undo logs: “How to rewind row versions for rollbacks and consistent reads.” (isolation)

Where undo lives

  • Stored in undo tablespaces (outside or inside the system tablespace, depending on config/version).
  • Organized into rollback segmentsundo segmentsundo records.
  • Each modified row stores a roll pointer to its undo record, forming a version chain.

How MVCC uses undo (version chains)

Each clustered-record has hidden fields (conceptually):

  • DB_TRX_ID: the transaction that last changed the row.
  • DB_ROLL_PTR: pointer to the undo record holding the previous version.

On UPDATE:

  1. Create an undo record containing the previous values (only the changed columns + metadata).
  2. Write new row version on the page; set DB_TRX_ID to the current transaction; set DB_ROLL_PTR to the new undo record.
  3. Readers with an older snapshot follow the roll pointers (undo chain) until they reach a version visible to them.

On DELETE:

  • The delete creates an undo record with the old values (so old snapshots can still “see” the row).

On INSERT:

  • InnoDB writes insert undo (see below). Inserts are invisible to older snapshots until commit.

Two kinds of undo

  • Insert Undo

    • Purpose: allow rollback of inserts by the inserting transaction.
    • Not needed for MVCC once the transaction commits (older snapshots never need to see rows that didn’t exist).
    • Can be discarded right after commit (eligible for fast purge).
  • Update Undo

    • Purpose: support both rollback and MVCC (older snapshots may still need the old version).
    • Kept after commit until no active transaction can see the older snapshot.
    • Purge thread eventually removes these records.

Purge & History List

  • InnoDB runs background Purge threads that:
    • Walk undo version chains,
    • Remove versions no longer visible to any active snapshot,
    • Reclaim space in undo segments/tablespaces.
  • History list length measures how much older version history exists.
    • If it grows without falling, you likely have long-running transactions (or replicas) preventing purge.

Symptoms of slow purge / long transactions

  • Growing undo tablespace size / history list length
  • Increasing secondary index bloat (lots of churn)
  • Slower consistent reads

Visibility rules (intuitive)

  • A reader sees the newest version whose DB_TRX_ID is committed and older than the reader’s snapshot.
  • If the current row version isn’t visible, InnoDB follows DB_ROLL_PTR to the undo record and reconstructs the prior version, repeating until it finds a visible one (or the row didn’t exist yet).

Rollback flow (simplified)

  • For UPDATE/DELETE: apply update-undo records in reverse to restore the previous state.
  • For INSERT: apply insert-undo by removing the inserted record.

Operational tips

  • Avoid long transactions (idle transactions in READ COMMITTED/REPEATABLE READ keep history alive).
  • Commit in smaller batches during large updates to let purge make progress.
  • Monitor:
    • SHOW ENGINE INNODB STATUS (History list length, purge state),
    • information_schema.innodb_trx (long transactions),
    • Engine-specific metrics (e.g., INNODB_METRICS).
  • Tuning knobs (names vary by version):
    • Purge parallelism (e.g., innodb_purge_threads)
    • Undo tablespace behavior/auto-truncate (engine/version dependent)
  • Schema design:
    • Keep rows lean (big blobs out-of-line) to reduce undo size.
    • Be mindful of hot tables that churn many versions; consider partitioning to localize purge work.

Mini example (step-by-step)

  1. T1 starts; reads row R = {amount=100, trx=5, rollptr=…}.
  2. T2 updates R to {amount=150}:
    • Write undo record with old {amount=100}, link it via rollptr.
    • New row has trx_id=T2.
  3. T1 (older snapshot) reads R: sees trx_id=T2 (invisible) → follows rollptr → reconstructs {amount=100} and returns it.
  4. T2 commits.
  5. Once no snapshot can see {amount=100}, purge removes the undo record and possibly the dead version.

Quick checklist (undo health)

  • ✅ Short-lived transactions; no idle-in-transaction sessions.
  • ✅ History list length stable/declining under steady state.
  • ✅ Purge threads active; undo tablespaces not growing unbounded.
  • ✅ Batch large DML and commit frequently.