MVCC stands for Multi-Version Concurrency Control. It is a concurrency-control technique databases use to let transactions read from consistent snapshots while other transactions are writing, without making readers and writers block each other unnecessarily.


1. The problem MVCC solves

If you didn’t have MVCC:

  • A writer would need to lock the row while updating it.

  • Readers would be blocked until the writer finishes.

  • This kills concurrency in busy systems.

With MVCC:

  • Readers see the row versions that are visible to their snapshot.
  • Writers can create new row versions without making existing readers wait most of the time.
  • The isolation level determines when the snapshot is taken:
    • READ COMMITTED: usually one fresh snapshot per statement.
    • REPEATABLE READ: usually one stable snapshot for the transaction.
    • SERIALIZABLE: a stable snapshot plus extra conflict detection or locking so committed results are equivalent to some serial order.

2. How it works (conceptually)

Instead of treating a row update as a single visible overwrite:

  • The database creates a new version of the row for each change.

  • Old versions remain in storage until no active transaction needs them.

  • Each version is tagged with metadata:

    • Transaction ID (XID): when it was created.

    • Valid range: start and end transaction IDs or timestamps.

When you query:

  • The database returns the latest version visible to your snapshot, based on transaction metadata and the current isolation level.

3. Example in action

Let’s say we have a table:

idname
1Alice

Timeline:

  1. T1 starts under REPEATABLE READ and reads Alice, so it sees "Alice".

  2. T2 updates Alice → "Alicia" and commits.

    • Now storage has two versions:

      • v1: "Alice" (valid until T2 commit)

      • v2: "Alicia" (valid from T2 commit onward)

  3. T1 continues reading and still sees "Alice", because its transaction snapshot was taken before T2 committed.

  4. New transactions see "Alicia".

If T1 were running under READ COMMITTED, its second SELECT could see "Alicia" because each statement can use a fresh committed snapshot. That would be a non-repeatable read.


4. Implementation in different systems

  • InnoDB (MySQL):
    Stores old versions in Undo-log; deletes them during purge when no transaction needs them.

  • PostgreSQL:
    Stores old versions as dead tuples in the table heap; VACUUM eventually cleans them.

  • MongoDB (WiredTiger):
    Keeps old document versions in the snapshot cache until they’re no longer needed.


5. MVCC vs isolation levels

MVCC is an implementation technique. The isolation level is the contract.

Isolation levelMVCC mental modelMain remaining risk
READ COMMITTEDSnapshot per statementNon-repeatable reads and phantom reads
REPEATABLE READSnapshot per transactionWrite skew may still be possible in snapshot-isolation systems
SERIALIZABLESnapshot plus conflict detection / predicate locking / dependency trackingTransactions may be aborted and must be retried

MVCC alone is not enough to guarantee serializability. It can give each transaction a consistent snapshot, but two transactions can still make decisions from snapshots that are individually valid and globally inconsistent.

Example: two doctors each see that two doctors are on call, then each takes themselves off call. Each transaction’s snapshot looked valid, but the final state has no doctor on call. Serializable isolation must prevent or abort one of the transactions.


6. Advantages & trade-offs

Pros:

  • High concurrency — readers don’t block writers.

  • Consistent snapshots for transactions.

Cons:

  • Extra storage for old versions.

  • Needs cleanup (VACUUM in Postgres, purge in InnoDB).

  • More complex transaction metadata handling.