Lesson 15: CPU Cache and Memory Locality
The CPU is far faster than main memory. To bridge the gap it uses a cache hierarchy (L1, L2, L3) that stores copies of recently accessed data. When C++ code accesses data in a cache-friendly order, memory-bound loops can run many times faster, in some cases by 10–50×. In this lesson we see why row-major loops beat column-major, what a cach
The CPU remembers things you touched recently — read data in the order it sits in memory and the CPU will have the next piece ready before you even ask.
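To make the row-major versus column-major difference concrete, here is a minimal sketch. It assumes the matrix is stored row by row in one contiguous `std::vector<int>`; the function names `sum_row_major` and `sum_column_major` are hypothetical and chosen only for illustration. Both functions compute the same sum and differ only in the order they visit memory.

```cpp
#include <cstddef>
#include <vector>

// Matrix of rows x cols ints, stored row by row in one contiguous vector:
// element (r, c) lives at index r * cols + c.

// Cache-friendly: the inner loop walks consecutive addresses, so every
// cache line that is loaded gets fully used before moving on.
long long sum_row_major(const std::vector<int>& m,
                        std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}

// Cache-unfriendly: the inner loop jumps cols * sizeof(int) bytes per step,
// so for wide matrices each access may land on a different cache line.
long long sum_column_major(const std::vector<int>& m,
                           std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}
```

On a matrix too large to fit in cache, timing the two functions (for example with `std::chrono::steady_clock`) typically shows the row-major version ahead. The size of the gap depends on the hardware, the matrix dimensions, and compiler optimizations.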
- cache line
- The unit of transfer between main memory and cache: 64 bytes on most modern CPUs. An access that misses the cache loads the entire line containing that address, not just the requested bytes.
- false sharing
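- When two threads running on different cores write to different variables that share the same cache line. Each write invalidates the other core's copy of the line, so the line bounces between cores even though no data is logically shared.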
- spatial locality
- The tendency of programs to access memory addresses close to each other. Sequential array access exploits spatial locality because the whole cache line is loaded at once.
- temporal locality
- The tendency of programs to access the same memory address repeatedly over a short time. Loops that reuse the same variable exploit temporal locality.
- prefetch
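- A mechanism that loads cache lines before the code requests them. The hardware prefetcher does this by detecting regular access patterns such as sequential or fixed-stride reads; compilers and programmers can also issue explicit prefetch instructions.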
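The false sharing entry above can be illustrated by looking at data layout alone. The sketch below assumes a 64-byte cache line (the constant `kCacheLine` is an assumption, not a value queried from the hardware), and the names `SharedLine`, `PaddedCounter`, `SeparateLines`, and `same_cache_line` are hypothetical helpers introduced for this example.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, as on most current x86-64 and many ARM CPUs.
constexpr std::size_t kCacheLine = 64;

// Prone to false sharing: both counters fall inside one 64-byte line, so two
// threads updating a and b separately still contend for the same line.
struct alignas(kCacheLine) SharedLine {
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

// One counter per cache line: alignas pads the struct out to a full line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<long> value{0};
};

struct SeparateLines {
    PaddedCounter a;
    PaddedCounter b;
};

static_assert(sizeof(PaddedCounter) == kCacheLine,
              "each padded counter should occupy exactly one cache line");

// True if both addresses fall in the same kCacheLine-sized block.
inline bool same_cache_line(const void* p, const void* q) {
    const auto x = reinterpret_cast<std::uintptr_t>(p) / kCacheLine;
    const auto y = reinterpret_cast<std::uintptr_t>(q) / kCacheLine;
    return x == y;
}
```

With this layout, `same_cache_line(&s.a, &s.b)` holds for a `SharedLine s`, while the two members of a `SeparateLines` object sit on different lines. Padding trades memory for less contention, so it is usually worth applying only to variables that different threads write frequently.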