Lesson 5: Indexing: threadIdx, blockIdx, blockDim
When you launch a kernel over a grid of blocks, each thread must know which array element it is responsible for. CUDA gives every thread three built-in variables: threadIdx.x, the thread's index within its block; blockIdx.x, the block's index within the grid; and blockDim.x, the number of threads per block.
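Imagine a parking lot with rows of spots. blockIdx.x is the row number, blockDim.x is how many spots there are per row, and threadIdx.x is your spot inside the row. To find your overall spot number from the start of the lot, multiply the row number by the spots per row, then add your spot in the row. For example, counting from zero, spot 3 in row 2 of a lot with 4 spots per row is overall spot 2 * 4 + 3 = 11.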
- threadIdx.x: The thread's index within its block. Runs from 0 to blockDim.x-1, and resets in every block.
- blockIdx.x: The block's index within the grid. Runs from 0 to gridDim.x-1.
- blockDim.x: The number of threads per block (the dimension set at launch). The same for every block in the grid.
- global index: A unique identifier of a thread across the whole grid: i = blockIdx.x * blockDim.x + threadIdx.x.
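
To see the global index formula at work, here is a minimal sketch in which each thread writes its own global index into the array element it owns. The kernel name write_global_index and the host helper fill_with_global_index are hypothetical names chosen for illustration, not part of the lesson. The bounds check (i < n) and the round-up of the block count are also additions: they assume the array length may not be an exact multiple of the block size.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread stores its global index in the slot it owns.
__global__ void write_global_index(int *out, int n)
{
    // row number * spots per row + spot inside the row
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Assumed guard: the last block may contain threads past the end of the array.
    if (i < n) {
        out[i] = i;
    }
}

// Hypothetical host helper: launches the kernel over n elements and returns
// the array copied back from the device (empty on any CUDA error).
std::vector<int> fill_with_global_index(int n, int threads_per_block)
{
    assert(n > 0 && threads_per_block > 0);

    std::vector<int> host(n, -1);
    int *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) {
        return {};
    }

    // Round up so that every element is covered by some thread.
    int num_blocks = (n + threads_per_block - 1) / threads_per_block;
    write_global_index<<<num_blocks, threads_per_block>>>(dev, n);

    cudaError_t launch_err = cudaGetLastError();
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t copy_err = cudaMemcpy(host.data(), dev, n * sizeof(int),
                                      cudaMemcpyDeviceToHost);
    cudaFree(dev);

    if (launch_err != cudaSuccess || copy_err != cudaSuccess) {
        return {};
    }
    return host;
}
```

Calling fill_with_global_index(10, 4) launches 3 blocks of 4 threads. The thread with blockIdx.x = 2 and threadIdx.x = 1 computes i = 2 * 4 + 1 = 9 and writes the last element, while the two remaining threads of that block compute 10 and 11 and are stopped by the guard.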