Lesson 19: Constant Memory & Registers
Two fast kinds of memory sit at the two ends of the hierarchy. At one end are registers: the fastest memory, private to each thread. Every simple local variable inside a kernel (for example an accumulator sum) lives in a register, and accessing it is almost free. But there is a limited number of reg
A register is a personal notepad in each worker's pocket: nothing is faster to reach, but it has few pages, and whatever does not fit must be stored in a far-away warehouse (that is spilling). Constant memory is a single bulletin board that everyone reads from: if all workers look at the same notice, it can be read aloud once for all of them.
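As a minimal sketch of the register side of this picture, the kernel below keeps a per-thread accumulator `sum` in a scalar local variable, which the compiler will normally place in a register. The kernel name `rowSum`, the host helper `runRowSum`, and the matrix sizes are hypothetical choices made for illustration only.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread sums one row of a row-major matrix.
__global__ void rowSum(const float* in, float* out, int rows, int cols) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;

    float sum = 0.0f;                  // scalar local: normally held in a register
    for (int c = 0; c < cols; ++c)
        sum += in[row * cols + c];     // the running total never goes to global memory
    out[row] = sum;                    // one global write per thread, at the end
}

// Hypothetical host helper: fills the matrix with 1.0f so every row sum equals cols.
bool runRowSum(int rows, int cols) {
    assert(rows > 0 && cols > 0);
    std::vector<float> hIn(static_cast<size_t>(rows) * cols, 1.0f);
    std::vector<float> hOut(rows, 0.0f);

    float* dIn = nullptr;
    float* dOut = nullptr;
    if (cudaMalloc(&dIn, hIn.size() * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, hOut.size() * sizeof(float)) != cudaSuccess) {
        cudaFree(dIn);
        return false;
    }
    cudaMemcpy(dIn, hIn.data(), hIn.size() * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (rows + threads - 1) / threads;
    rowSum<<<blocks, threads>>>(dIn, dOut, rows, cols);

    // This copy waits for the kernel and also surfaces any launch or execution error.
    cudaError_t err = cudaMemcpy(hOut.data(), dOut, hOut.size() * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    if (err != cudaSuccess) return false;

    for (float v : hOut)
        if (v != static_cast<float>(cols)) return false;
    return true;
}

int main() {
    bool ok = runRowSum(256, 64);
    std::printf("rowSum %s\n", ok ? "ok" : "failed");
    return ok ? 0 : 1;
}
```

To see how many registers a kernel actually uses, and whether anything was spilled, compiling with `nvcc -Xptxas -v` prints the per-kernel register count together with the spill store and spill load sizes.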
- registers
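- The fastest memory, private to each thread. Simple local variables live here. Each SM has a fixed-size register file shared by all of its resident threads, so the more registers a kernel uses per thread, the fewer threads can be resident at once.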
- register spilling
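- When a kernel needs more registers per thread than are available, the surplus values are stored in local memory. Despite its name, local memory resides in slow off-chip device memory (DRAM), so every spilled access costs far more than a register access.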
- constant memory
- A small (64 KB) region that kernels can only read. It is declared with __constant__ and written from the host (for example with cudaMemcpyToSymbol). It has a dedicated cache.
- constant broadcast
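- When all threads in a warp read the same address in constant memory, the value is fetched from the constant cache once and broadcast to every thread in one step, which on a cache hit is nearly as fast as reading a register. If the threads read different addresses, the reads are serialized.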
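The constant-memory entries above can be sketched as follows. This is a minimal example, not a tuned implementation: the names `dCoeffs`, `evalPoly`, and `runEvalPoly`, and the choice of a four-coefficient polynomial, are assumptions made for illustration. The host writes the coefficients with `cudaMemcpyToSymbol`, and inside the kernel every thread of a warp reads the same `dCoeffs[k]` in each loop iteration, which is the access pattern that allows a broadcast.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kNumCoeffs = 4;

// Lives in constant memory: read-only inside kernels, written from the host.
__constant__ float dCoeffs[kNumCoeffs];

// Evaluates c0 + c1*x + c2*x^2 + c3*x^3 for each input using Horner's rule.
__global__ void evalPoly(const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float xi = x[i];
    float acc = dCoeffs[kNumCoeffs - 1];
    for (int k = kNumCoeffs - 2; k >= 0; --k)
        acc = acc * xi + dCoeffs[k];   // same address for the whole warp: broadcast
    y[i] = acc;
}

// Hypothetical host helper: runs the kernel and compares against a CPU reference.
bool runEvalPoly(int n) {
    assert(n > 0);
    const float hCoeffs[kNumCoeffs] = {1.0f, 2.0f, 3.0f, 4.0f};

    // Small integer inputs keep every intermediate value exactly representable.
    std::vector<float> hX(n), hY(n, 0.0f);
    for (int i = 0; i < n; ++i) hX[i] = static_cast<float>(i % 8);

    // Write the coefficients into constant memory from the host.
    if (cudaMemcpyToSymbol(dCoeffs, hCoeffs, sizeof(hCoeffs)) != cudaSuccess) return false;

    float* dX = nullptr;
    float* dY = nullptr;
    if (cudaMalloc(&dX, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&dY, n * sizeof(float)) != cudaSuccess) {
        cudaFree(dX);
        return false;
    }
    cudaMemcpy(dX, hX.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    evalPoly<<<blocks, threads>>>(dX, dY, n);

    cudaError_t err = cudaMemcpy(hY.data(), dY, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dX);
    cudaFree(dY);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) {
        float ref = hCoeffs[kNumCoeffs - 1];
        for (int k = kNumCoeffs - 2; k >= 0; --k) ref = ref * hX[i] + hCoeffs[k];
        if (hY[i] != ref) return false;
    }
    return true;
}

int main() {
    bool ok = runEvalPoly(1024);
    std::printf("evalPoly %s\n", ok ? "ok" : "failed");
    return ok ? 0 : 1;
}
```

Had the kernel indexed the table with a per-thread value such as `dCoeffs[i % kNumCoeffs]`, threads in the same warp would read different addresses and the reads would be serialized, so constant memory suits data that is uniform across the warp.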