Lesson 9: 2D Grids & Blocks

Matrices and images are naturally two-dimensional, so CUDA lets you organize threads in a 2D grid. Instead of a single x dimension, we have threadIdx.x and threadIdx.y, blockIdx.x and blockIdx.y, plus blockDim.x and blockDim.y. Each thread computes a row and a column: row = blockIdx.y * blockDim.y + threadIdx.y, and col = blockIdx.x * blockDim.x + threadIdx.x.
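To make the coordinate computation concrete, here is a minimal sketch in which every thread records its own row and column. The kernel name recordCoords, the 8 x 8 size, and the 4 x 4 block shape are assumptions chosen for illustration. The size is picked so the grid covers the data exactly, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread records its own (row, col) pair.
__global__ void recordCoords(int *rows, int *cols, int width) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // x dimension gives the column
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // y dimension gives the row
    int idx = row * width + col;                      // one output slot per thread
    rows[idx] = row;
    cols[idx] = col;
}

int main() {
    const int width = 8, height = 8;   // chosen so the grid covers the data exactly
    const int n = width * height;
    int *rows = nullptr, *cols = nullptr;
    cudaMallocManaged(&rows, n * sizeof(int));
    cudaMallocManaged(&cols, n * sizeof(int));

    dim3 block(4, 4);                  // 4 x 4 threads per block
    dim3 grid(width / 4, height / 4);  // 2 x 2 blocks, 8 x 8 threads in total
    recordCoords<<<grid, block>>>(rows, cols, width);
    cudaDeviceSynchronize();

    // Slot 5 * 8 + 6 = 46 belongs to the thread at row 5, column 6.
    printf("slot 46 -> row %d, col %d\n", rows[46], cols[46]);

    cudaFree(rows);
    cudaFree(cols);
    return 0;
}
```

The thread at row 5, column 6 sits in block (1, 1) with threadIdx.x = 2 and threadIdx.y = 1, which is why both formulas need the block offset as well as the thread index.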

Picture a theater with rows and seats. To find a seat by one running number, count the full rows in front of it: each full row contributes width seats, so the seat number is row times width plus the column number. Every usher (thread) knows exactly its own row and column.

2D grid: Organizing threads in two dimensions (x and y), convenient for matrices and images where each element has a row and a column.
row and col: row comes from the y dimension (blockIdx.y, threadIdx.y) and col from the x dimension (blockIdx.x, threadIdx.x). Together they are the thread's 2D coordinate.
row-major flatten: Turning (row, col) into a flat index: idx = row * width + col, because memory is stored row after row.
dim3: A three-component type (x, y, z) describing block or grid size, for example dim3 block(16, 16).
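
The terms above can be put together in one short program. This is a sketch under stated assumptions, not a definitive implementation: the helpers flatten and blocksFor and the kernel doubleMatrix are hypothetical names introduced here, and the 100 x 50 size is picked so that it is not a multiple of the 16 x 16 block. In that case the grid has to be rounded up, and the kernel needs a bounds check for the threads that land past the edge.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: row-major flatten, usable on host and device.
__host__ __device__ inline int flatten(int row, int col, int width) {
    return row * width + col;
}

// Hypothetical helper: smallest number of blocks of size `block` covering n elements.
inline unsigned int blocksFor(int n, unsigned int block) {
    return (n + block - 1) / block;  // integer division rounded up
}

// Hypothetical kernel: doubles every element of a height x width matrix.
__global__ void doubleMatrix(float *m, int width, int height) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    // Threads past the right or bottom edge have no element to work on.
    if (row < height && col < width) {
        m[flatten(row, col, width)] *= 2.0f;
    }
}

int main() {
    const int width = 100, height = 50;  // deliberately not multiples of 16
    float *m = nullptr;
    cudaMallocManaged(&m, width * height * sizeof(float));
    for (int i = 0; i < width * height; ++i) m[i] = 1.0f;

    dim3 block(16, 16);  // the z component defaults to 1
    dim3 grid(blocksFor(width, block.x), blocksFor(height, block.y));  // 7 x 4 blocks
    doubleMatrix<<<grid, block>>>(m, width, height);
    cudaDeviceSynchronize();

    printf("grid = %u x %u, last element = %.1f\n",
           grid.x, grid.y, m[flatten(height - 1, width - 1, width)]);
    cudaFree(m);
    return 0;
}
```

Note that the grid's x component is sized from the width (columns) and its y component from the height (rows), matching the way col and row are computed inside the kernel.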