Lesson 15: CPU Cache and Memory Locality

The CPU is far faster than main memory. To bridge the gap it uses a cache hierarchy (L1, L2, L3) that stores copies of recently accessed data. When C++ code accesses data in a cache-friendly order, memory-bound loops can run up to 10–50× faster. In this lesson we see why row-major loops beat column-major, what a cach

The CPU remembers things you touched recently — read data in the order it sits in memory and the CPU will have the next piece ready before you even ask.
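
As a rough illustration of reading data in the order it sits in memory, the sketch below sums the same matrix two ways. It assumes the matrix is stored row-major in a single contiguous std::vector<int>; the function names sum_row_major and sum_column_major are made up for this example and do not come from any library. Both functions return the same total and differ only in the order in which they touch memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum a rows x cols matrix stored row-major in one contiguous vector.
// The inner loop walks consecutive addresses, so every element of a
// fetched cache line is used before the next line is needed.
long long sum_row_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}

// Same result, but the inner loop jumps cols elements at a time.
// For a large matrix each access tends to land on a different cache line,
// so most of every fetched line goes unused before it is evicted.
long long sum_column_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}
```

The gap between the two only shows up once the matrix is much larger than the cache; for small matrices everything fits in L1 and both orders perform about the same. Actual speedups depend on the hardware, the compiler, and the optimization flags.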

cache line: The unit of transfer between main memory and cache — 64 bytes on most modern CPUs. Any memory access loads the entire cache line containing that address.
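false sharing: When threads running on different cores write to different variables that happen to share the same cache line. Each write invalidates that line in the other cores' caches, so the line bounces between cores even though no data is logically shared.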
spatial locality: The tendency of programs to access memory addresses close to each other. Sequential array access exploits spatial locality because the whole cache line is loaded at once.
temporal locality: The tendency of programs to access the same memory address repeatedly over a short time. Loops that reuse the same variable exploit temporal locality.
prefetch: A mechanism that loads cache lines before the code requests them. The hardware prefetcher does this by detecting sequential (or regularly strided) access patterns; a compiler or programmer can also issue explicit prefetch instructions.
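
The false sharing entry above can be made concrete with a small layout sketch. It assumes a 64-byte cache line (the constant kCacheLine is an assumption, not something queried from the hardware), and the names SharedLine, SeparateLines, and same_cache_line are hypothetical helpers invented for this example. The idea is that alignas can force two per-thread counters onto different lines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache line size; 64 bytes is common but not universal.
constexpr std::size_t kCacheLine = 64;

// Both counters sit in the first bytes of one aligned 64-byte block,
// so two threads writing a and b would contend for the same line.
struct alignas(kCacheLine) SharedLine {
    long a;
    long b;
};

// Each counter starts on its own 64-byte boundary,
// so writes to a and b touch different cache lines.
struct SeparateLines {
    alignas(kCacheLine) long a;
    alignas(kCacheLine) long b;
};

// True if both addresses fall within the same kCacheLine-sized block.
bool same_cache_line(const void* p, const void* q) {
    assert(p != nullptr && q != nullptr);
    auto line = [](const void* x) {
        return reinterpret_cast<std::uintptr_t>(x) / kCacheLine;
    };
    return line(p) == line(q);
}
```

The padded layout trades memory for isolation: SeparateLines occupies two full cache lines to hold two counters. C++17 also defines std::hardware_destructive_interference_size in <new> as a portable hint for this padding, though compiler support for it varies.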