Lesson 22: Global Memory Races — Why Atomics
Until now almost every kernel you wrote gave each thread its OWN output slot: c[i] = a[i] + b[i], where thread i is the only one that ever touches c[i]. Because no two threads share a destination, the order they run in does not matter — the result is always correct. But some problems force many thre
Imagine 30 kids sharing ONE tally sheet that says 5. Two glance at it together, both see 5, both cross it out and write 6. Two kids counted, but the sheet says 6, not 7 — one count vanished. An atomic operation is like a rule that only one kid may hold the pencil at a time: read, add, write, then pass it on. That way no count is ever lost.
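The tally-sheet story can be replayed step by step in ordinary host code. This is only a sketch: the function names tally_with_interleaving and tally_one_at_a_time are made up for illustration, and the "kids" are simulated by a fixed ordering of statements rather than by real threads, so the outcome is the same on every run.

```cuda
#include <cstdio>

// Hypothetical helper: both kids read the sheet before either one writes.
int tally_with_interleaving(int sheet) {
    int kid_a = sheet;   // kid A reads the sheet
    int kid_b = sheet;   // kid B reads the same old value
    kid_a = kid_a + 1;   // kid A adds 1 to the private copy
    kid_b = kid_b + 1;   // kid B adds 1 to the private copy
    sheet = kid_a;       // kid A writes the result back
    sheet = kid_b;       // kid B overwrites it with the same number
    return sheet;
}

// Hypothetical helper: each kid finishes read, add, write before the next starts.
int tally_one_at_a_time(int sheet) {
    for (int kid = 0; kid < 2; ++kid) {
        int seen = sheet;  // read
        seen = seen + 1;   // add
        sheet = seen;      // write
    }
    return sheet;
}

int main() {
    printf("interleaved:   %d\n", tally_with_interleaving(5));  // prints 6
    printf("one at a time: %d\n", tally_one_at_a_time(5));      // prints 7
    return 0;
}
```

Starting from 5, the interleaved version ends at 6 even though two kids counted, while the one-at-a-time version ends at 7.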
- race condition: A bug where the result depends on the unpredictable timing and order in which threads touch the same data. It may work on one run and fail on the next.
- read-modify-write: x++ is really three steps: read x, add 1, write x. Another thread can slip in between those steps.
- lost update: Two threads read the same old value and both write back, so one of the increments disappears and the count comes out too low.
- atomic operation: A read-modify-write performed as one indivisible step, so no other thread's access to the same address can land in the middle. It is the fix for this kind of race, e.g. atomicAdd.
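To see these terms on a GPU, here is a minimal sketch in which every thread adds 1 to a single counter in global memory, once with a plain read-modify-write and once with atomicAdd. The kernel names count_racy and count_atomic, the helper run_counter, and the launch size of 64 blocks of 256 threads are all assumptions chosen for illustration. Error checking on the CUDA calls is left out to keep the example short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Racy version: a separate read, add, and write, so updates can be lost.
__global__ void count_racy(unsigned int *counter) {
    unsigned int old_value = counter[0];  // read
    counter[0] = old_value + 1;           // add, then write
}

// Atomic version: atomicAdd performs the read-modify-write as one step.
__global__ void count_atomic(unsigned int *counter) {
    atomicAdd(&counter[0], 1u);
}

// Hypothetical host helper: launches one of the kernels and returns the count.
unsigned int run_counter(bool use_atomic, int blocks, int threads_per_block) {
    unsigned int *d_counter = nullptr;
    unsigned int h_counter = 0;
    cudaMalloc(&d_counter, sizeof(unsigned int));
    cudaMemset(d_counter, 0, sizeof(unsigned int));  // start the tally at 0

    if (use_atomic) {
        count_atomic<<<blocks, threads_per_block>>>(d_counter);
    } else {
        count_racy<<<blocks, threads_per_block>>>(d_counter);
    }

    // This blocking copy waits for the kernel to finish before reading the result.
    cudaMemcpy(&h_counter, d_counter, sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    return h_counter;
}

int main() {
    const int blocks = 64, threads_per_block = 256;  // 16384 threads in total
    printf("expected: %d\n", blocks * threads_per_block);
    printf("racy:     %u\n", run_counter(false, blocks, threads_per_block));
    printf("atomic:   %u\n", run_counter(true, blocks, threads_per_block));
    return 0;
}
```

The atomic run should report 16384, one count per thread. The racy run usually reports far less, and the exact number can change from run to run, which is what makes it a race condition.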