What is SIMD?

SIMD = Single Instruction, Multiple Data.
It lets a single CPU core apply one operation (like add, compare, multiply) to many values at once, packed into a wide “vector” register.

Think: instead of doing
c[i] = a[i] + b[i] one element at a time,
SIMD does 8 or 16 of them in one go.
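To make that concrete, here is the scalar loop written out in C (the document doesn't prescribe a language, so this is an illustrative sketch; the function name `add_arrays` is made up). Compilers such as gcc and clang will typically auto-vectorize a loop like this at -O2/-O3, turning it into exactly the 8-or-16-at-a-time SIMD adds described above:

```c
#include <stddef.h>

/* A plain scalar loop. Modern compilers will usually auto-vectorize
 * this into SIMD adds, processing several elements per instruction. */
void add_arrays(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

The key point: the source stays element-at-a-time, and the compiler widens it. Calling `add_arrays` on `{1,2,3,4}` and `{5,6,7,8}` fills `c` with `{6,8,10,12}` regardless of how many lanes the hardware processes per step.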

How it works (mental model)

  • The CPU has vector registers (128/256/512-bit).

  • Each register is split into lanes (e.g., eight 32-bit floats).

  • One instruction (e.g., ADD) runs over all lanes simultaneously.

  | a0 a1 a2 a3 a4 a5 a6 a7 |
+ | b0 b1 b2 b3 b4 b5 b6 b7 |
= | c0 c1 c2 c3 c4 c5 c6 c7 |   (one SIMD add)
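The same picture can be written explicitly with intrinsics. This sketch assumes an x86 CPU and SSE (baseline on x86-64), and uses a 128-bit register, so four 32-bit float lanes rather than the eight drawn above; the helper name `add4` is made up:

```c
#include <immintrin.h>  /* x86 SSE intrinsics; SSE is baseline on x86-64 */

/* Add four floats lane-by-lane with a single SIMD add. */
void add4(const float *a, const float *b, float *c) {
    __m128 va = _mm_loadu_ps(a);    /* load 4 floats into one 128-bit register */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_add_ps(va, vb); /* one instruction adds all four lanes */
    _mm_storeu_ps(c, vc);
}
```

With `a = {1,2,3,4}` and `b = {10,20,30,40}`, `add4` stores `{11,22,33,44}` into `c`: one load per input, one add, one store, instead of four of each.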

Why it’s fast

  • Fewer instructions and loop iterations per element processed.

  • Better use of CPU pipelines and caches.

  • Great for vectorized code: scans, filters, math, encoding/decoding.

Where you see it

  • Analytics engines (columnar DBs): filter price < 100 on 1024 values per batch with a few SIMD compares + a mask.

  • Media/ML: image ops, DSP, linear algebra.

  • Cryptography, compression.
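The columnar-filter bullet above can be sketched with SSE intrinsics (x86 assumed; the helper name `filter_lt` is made up, and a real engine would do this over a whole batch, four or more lanes at a time):

```c
#include <immintrin.h>

/* SIMD "price < limit" filter over four floats.
 * Returns a 4-bit mask: bit i is set if prices[i] < limit. */
int filter_lt(const float *prices, float limit) {
    __m128 v   = _mm_loadu_ps(prices);
    __m128 lim = _mm_set1_ps(limit);      /* broadcast limit to all lanes */
    __m128 cmp = _mm_cmplt_ps(v, lim);    /* per-lane compare: all-ones or all-zeros */
    return _mm_movemask_ps(cmp);          /* pack lane sign bits into an int */
}
```

For `prices = {50, 150, 99, 100}` and `limit = 100`, lanes 0 and 2 pass, so the mask is `0b0101` (5). Downstream code can use that bitmask to gather the matching rows without branching per element.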

Names you might recognize

  • x86: SSE, AVX2, AVX-512

  • ARM: NEON, SVE

SIMD vs threads

  • SIMD: parallelism within one core across data lanes.

  • Multithreading: parallelism across cores (each core may also use SIMD).

Gotchas

  • Works best when data is contiguous and aligned.

  • Branchy code can hurt; use masks and branchless logic.

  • Speedup is bounded by memory bandwidth: if data can’t be fed to the vector units fast enough, wider registers don’t help.
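The “use masks and branchless logic” gotcha looks like this in practice. Instead of an `if` per element, a compare produces a per-lane mask that selects values, here zeroing out negatives (an x86 SSE sketch; the helper name `relu4` is made up):

```c
#include <immintrin.h>

/* Branchless per-lane select: replace each negative value with zero. */
void relu4(float *x) {
    __m128 v    = _mm_loadu_ps(x);
    __m128 zero = _mm_setzero_ps();
    __m128 mask = _mm_cmpgt_ps(v, zero); /* lanes > 0 become all-ones */
    v = _mm_and_ps(v, mask);             /* keep positive lanes, zero the rest */
    _mm_storeu_ps(x, v);
}
```

Given `x = {-1, 2, -3, 4}`, this yields `{0, 2, 0, 4}` with no branches, so there is nothing for the branch predictor to mispredict and every lane does the same work.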