What is SIMD?
SIMD = Single Instruction, Multiple Data.
It lets a CPU core apply one operation (like add, compare, multiply) to many values at once packed into a wide “vector” register.
Think: instead of computing c[i] = a[i] + b[i] one element at a time, SIMD does 8 or 16 of them in one go.
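As a minimal sketch in C (the function name add_arrays is just illustrative), here is that loop in scalar form. With optimization enabled (-O2/-O3), GCC and Clang commonly auto-vectorize exactly this pattern into SIMD adds:

```c
#include <stddef.h>

/* Plain scalar loop: one addition per iteration. Compilers will usually
 * auto-vectorize this into SIMD instructions at -O2/-O3, processing
 * several elements per iteration. */
void add_arrays(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

The point: the source stays a simple loop; the SIMD happens in the generated machine code.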
How it works (mental model)
- The CPU has vector registers (128/256/512-bit).
- Each register is split into lanes (e.g., eight 32-bit floats).
- One instruction (e.g., ADD) runs over all lanes simultaneously.
| a0 a1 a2 a3 a4 a5 a6 a7 |
+ | b0 b1 b2 b3 b4 b5 b6 b7 |
= | c0 c1 c2 c3 c4 c5 c6 c7 | (one SIMD add)
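The eight-lane add in the diagram can be written almost literally using GCC/Clang vector extensions (a compiler feature, not standard portable C; the type name v8f is made up here):

```c
/* Eight 32-bit floats packed into one 256-bit vector type, mirroring
 * the diagram above. The compiler lowers the "+" to SIMD adds, e.g.
 * one AVX instruction on x86 or a pair of NEON adds on ARM. */
typedef float v8f __attribute__((vector_size(32)));

v8f add8(v8f a, v8f b) {
    return a + b;  /* c0..c7 = a0+b0 .. a7+b7, no loop */
}
```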
Why it’s fast
- Fewer instructions and less loop overhead.
- Better use of CPU pipelines and caches.
- Great for vectorized code: scans, filters, math, encoding/decoding.
Where you see it
- Analytics engines (columnar DBs): filter price < 100 on 1024 values per batch with a few SIMD compares + a mask.
- Media/ML: image ops, DSP, linear algebra.
- Cryptography, compression.
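A sketch of the compare-plus-mask idea from the columnar-DB bullet, again using GCC/Clang vector extensions (cheaper_than is a hypothetical helper, and real engines use wider batches):

```c
typedef float v4f __attribute__((vector_size(16)));
typedef int   v4i __attribute__((vector_size(16)));

/* One SIMD compare produces a lane mask: -1 (all bits set) in lanes
 * where the predicate holds, 0 where it doesn't. A columnar engine
 * sweeps such masks over a batch instead of branching per row. */
v4i cheaper_than(v4f price, float limit) {
    v4f vlimit = {limit, limit, limit, limit};
    return price < vlimit;
}
```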
Names you might recognize
- x86: SSE, AVX2, AVX-512
- ARM: NEON, SVE
SIMD vs threads
- SIMD: parallelism within one core, across data lanes.
- Multithreading: parallelism across cores (each core may also use SIMD).
Gotchas
- Works best when data is contiguous and aligned.
- Branchy code can hurt; use masks and branchless logic.
- Speedup is bounded by memory bandwidth if data can't be fed fast enough.
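The masks-and-branchless-logic gotcha can be sketched as a lane-wise select: instead of an if/else per element, combine a compare mask with bitwise ops so every lane runs the same code (select4 is an illustrative name, using the same vector-extension types as above):

```c
typedef int v4i __attribute__((vector_size(16)));

/* Branchless select: keep a[i] where the mask lane is all-ones (-1),
 * b[i] where it is zero. No per-lane branch, so all lanes execute
 * the same instruction stream. */
v4i select4(v4i mask, v4i a, v4i b) {
    return (mask & a) | (~mask & b);
}
```

This is the SIMD equivalent of `c[i] = cond[i] ? a[i] : b[i]`, done for all lanes at once.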