Lesson 10: Warps & SIMT: the 32-thread unit

GPU hardware does not schedule threads one by one; it schedules them in fixed groups of 32 threads called warps. All 32 threads in a warp execute the same instruction at the same time, in lockstep. This is the heart of the SIMT model (Single Instruction, Multiple Threads). The warp size is 32 on every existing NVIDIA GPU.

Think of rowers in boats of 32. Everyone in one boat pulls an oar at the same instant, in the same rhythm. You cannot change the boat size to 30 or 40 — it is always 32. Your seat in the boat is the lane, and the boat number is the warp.

warp: A group of 32 threads the hardware schedules and runs together in lockstep. The warp size is fixed at 32.
SIMT: Single Instruction, Multiple Threads — all 32 threads in a warp execute the same instruction at the same moment.
lane: A thread's position within its warp, a value from 0 to 31. For a one-dimensional block, lane = threadIdx.x % 32.
warpId: The index of the warp a thread belongs to within its block. For a one-dimensional block, warpId = threadIdx.x / 32 (integer division).
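
The lane and warpId formulas can be checked with a few concrete thread indices. The following is a minimal host-side sketch in plain C++, not kernel code: the parameter tid stands in for threadIdx.x of a one-dimensional block, and the names kWarpSize, lane_of, and warp_of are hypothetical helpers introduced only for this illustration.

```cpp
// Host-side sketch of the lane / warpId arithmetic for a 1D thread block.
// Assumption: `tid` plays the role of threadIdx.x inside a CUDA kernel;
// here it is an ordinary integer so the math can be checked without a GPU.

constexpr unsigned kWarpSize = 32;  // fixed warp size, as stated in the lesson

// Position of a thread inside its warp: always a value from 0 to 31.
constexpr unsigned lane_of(unsigned tid) { return tid % kWarpSize; }

// Index of the thread's warp inside the block (integer division drops the remainder).
constexpr unsigned warp_of(unsigned tid) { return tid / kWarpSize; }

// Compile-time checks on a few thread indices.
static_assert(lane_of(0) == 0 && warp_of(0) == 0, "thread 0 is lane 0 of warp 0");
static_assert(lane_of(31) == 31 && warp_of(31) == 0, "thread 31 is the last lane of warp 0");
static_assert(lane_of(32) == 0 && warp_of(32) == 1, "thread 32 starts warp 1");
static_assert(lane_of(70) == 6 && warp_of(70) == 2, "thread 70 is lane 6 of warp 2");
```

In the rowing picture, thread 70 sits in seat 6 of boat 2. Inside a real kernel the same two expressions would be applied directly to threadIdx.x.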