Lesson 15: Diagnosing & Fixing Strided Access
In the previous lesson we saw what coalescing is: when consecutive threads in a warp touch consecutive addresses, one transaction serves them all; when their addresses are a large stride apart, the warp needs many transactions, and most of the bytes they fetch go unused. Now we learn the practical skill: spotting an uncoalesced access inside
Imagine 4 workers picking apples from rows of trees. At first, each worker runs to a different, far-apart column of the orchard: 4 long runs. If instead each worker takes the next apple in one tight row, they all work side by side in a single pass. We did not change the orchard, only who picks what.
- strided access
- A pattern where the addresses of consecutive threads are separated by a large stride (for example data[threadIdx.x * N]), so they are far apart and each one falls in a different memory transaction.
- coalesced vs uncoalesced
- Coalesced: neighboring threads' addresses are contiguous, so one transaction serves the whole warp. Uncoalesced: the addresses are scattered, so the warp needs many transactions, and most of the bytes each one fetches go unused.
- index-remap fix
- The fix for scattered access: remap the indices so that neighboring threads touch neighboring addresses. Give thread t the element at base + t instead of base + t*N. The remap works entirely in global memory.
- row-major order
- A layout where elements of the same row are neighbors in memory. Traversing by row makes neighboring threads touch neighboring addresses (coalesced); traversing by column makes neighboring threads jump by the row width (scattered).
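
The terms above can be tied together in one small program. What follows is a minimal sketch, not a definitive implementation. The kernel names scale_strided and scale_coalesced, the helpers segments_touched and run, and the sizes in main are all hypothetical names chosen for illustration. The helper segments_touched is only a host-side model: it assumes a 32-thread warp, an aligned base address, and a 128-byte transaction size, and real hardware granularity may differ. Both kernels scale every element exactly once, so they produce the same result; only the assignment of elements to threads changes. In the remapped kernel the array is read like a row-major matrix whose row width is the total thread count, so each step walks along a row.

```cuda
#include <cstdio>
#include <cstdlib>
#include <set>
#include <vector>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                       \
    do {                                                                       \
        cudaError_t err_ = (call);                                             \
        if (err_ != cudaSuccess) {                                             \
            std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            std::exit(EXIT_FAILURE);                                           \
        }                                                                      \
    } while (0)

// Host-side model (assumption, not a hardware measurement): count the distinct
// aligned segments that the 32 lanes of one warp touch when lane i reads
// element i * stride_elems, starting from an aligned base address.
int segments_touched(size_t stride_elems, size_t elem_bytes, size_t segment_bytes) {
    std::set<size_t> segments;
    for (size_t lane = 0; lane < 32; ++lane)
        segments.insert(lane * stride_elems * elem_bytes / segment_bytes);
    return static_cast<int>(segments.size());
}

// Strided: thread t owns the contiguous chunk [t*per_thread, (t+1)*per_thread).
// At every step k, neighboring threads are per_thread elements apart.
__global__ void scale_strided(float* data, int per_thread, float factor) {
    size_t t = blockIdx.x * blockDim.x + threadIdx.x;
    for (int k = 0; k < per_thread; ++k)
        data[t * per_thread + k] *= factor;
}

// Index-remap fix: at step k, thread t takes element base + t, where
// base = k * total_threads. Neighboring threads touch neighboring addresses.
__global__ void scale_coalesced(float* data, int per_thread, float factor) {
    size_t t = blockIdx.x * blockDim.x + threadIdx.x;
    size_t total_threads = gridDim.x * blockDim.x;
    for (int k = 0; k < per_thread; ++k)
        data[k * total_threads + t] *= factor;
}

// Copy the input to the device, run one kernel on it, and return the result.
std::vector<float> run(void (*kernel)(float*, int, float),
                       const std::vector<float>& input,
                       int blocks, int threads, int per_thread) {
    float* d_data = nullptr;
    size_t bytes = input.size() * sizeof(float);
    CUDA_CHECK(cudaMalloc(&d_data, bytes));
    CUDA_CHECK(cudaMemcpy(d_data, input.data(), bytes, cudaMemcpyHostToDevice));
    kernel<<<blocks, threads>>>(d_data, per_thread, 2.0f);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());
    std::vector<float> out(input.size());
    CUDA_CHECK(cudaMemcpy(out.data(), d_data, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_data));
    return out;
}

int main() {
    const int blocks = 4, threads = 128, per_thread = 32;  // illustrative sizes
    std::vector<float> input(static_cast<size_t>(blocks) * threads * per_thread);
    for (size_t i = 0; i < input.size(); ++i) input[i] = static_cast<float>(i);

    std::vector<float> strided = run(scale_strided, input, blocks, threads, per_thread);
    std::vector<float> remapped = run(scale_coalesced, input, blocks, threads, per_thread);

    bool same = (strided == remapped);
    std::printf("results identical: %s\n", same ? "yes" : "no");
    std::printf("modelled segments per warp: strided=%d, remapped=%d\n",
                segments_touched(per_thread, sizeof(float), 128),
                segments_touched(1, sizeof(float), 128));
    return same ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Under the model's assumptions, a stride of 32 floats puts every lane in its own 128-byte segment, while a stride of 1 keeps the whole warp inside a single segment. To confirm the difference on real hardware, one option is to profile both kernels and compare their global-memory transaction counts.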