Lesson 29: Bandwidth & Arithmetic Intensity
Every kernel does two kinds of work: computation (floating-point operations, FLOPs) and data movement (bytes to and from memory). Two simple metrics tell us which of the two limits performance. The first is bandwidth: how many bytes the memory moves per second. We compute it as bytes divided by time, and convert to GB/s by dividing by 1e9. The second is arithmetic intensity: the FLOPs performed divided by the bytes moved from memory.
Imagine a truck delivering bricks to a builder. Bandwidth is how many bricks the truck delivers each minute. Intensity is how much work the builder does on each brick. If the builder does little per brick (low intensity), he is always waiting for the truck — memory is the bottleneck. If he does a lot per brick (high intensity), the truck waits for him — compute is the bottleneck.
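To make the two metrics concrete, here is a minimal sketch in Python. It assumes a hypothetical vector-add kernel (c[i] = a[i] + b[i] on 32-bit floats) and an invented elapsed time. The function names, the element count, and the timing are illustrative assumptions, not measurements from a real device.

```python
def bandwidth_gbs(bytes_moved: float, seconds: float) -> float:
    """Bytes divided by time, converted to GB/s by dividing by 1e9."""
    return bytes_moved / seconds / 1e9


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed divided by bytes moved from memory (FLOP/byte)."""
    return flops / bytes_moved


# Hypothetical vector add: c[i] = a[i] + b[i] with 4-byte floats.
n = 1_000_000                  # number of elements (assumed)
bytes_per_float = 4
bytes_moved = 3 * n * bytes_per_float  # read a, read b, write c
flops = n                      # one addition per element
elapsed_seconds = 1e-4         # invented timing, for illustration only

bw = bandwidth_gbs(bytes_moved, elapsed_seconds)
ai = arithmetic_intensity(flops, bytes_moved)

print(f"bandwidth: {bw:.1f} GB/s")
print(f"arithmetic intensity: {ai:.4f} FLOP/byte")
```

With these numbers the builder does one operation for every 12 bytes delivered, which is very little work per brick. In the analogy, he spends most of his time waiting for the truck.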
- bandwidth (GB/s): The bytes the memory moves per second. Computed as bytes divided by time, converted to GB/s by dividing by 1e9.
- arithmetic intensity (FLOP/byte): The FLOPs performed divided by the bytes moved from memory. Low = memory-bound; high = compute-bound.
- memory-bound: A low-intensity kernel whose runtime is governed by bandwidth. To speed it up you reduce bytes, not add FLOPs.
- compute-bound: A high-intensity kernel whose runtime is governed by the math units, so memory is not the bottleneck.
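One simple way to tell the two regimes apart is to estimate how long the memory traffic alone would take and how long the math alone would take, then see which is larger. The sketch below does that in Python. The peak figures (10 TFLOP/s and 500 GB/s) are made-up values for a hypothetical device, and the `classify` helper is an illustrative name, not part of any library. The matrix-multiply byte count assumes the ideal case in which each matrix is moved exactly once.

```python
def classify(flops: float, bytes_moved: float,
             peak_flops: float, peak_bandwidth: float) -> str:
    """Compare the time memory needs with the time the math units need."""
    memory_time = bytes_moved / peak_bandwidth   # seconds spent moving bytes
    compute_time = flops / peak_flops            # seconds spent on FLOPs
    return "memory-bound" if memory_time > compute_time else "compute-bound"


# Hypothetical device peaks (assumptions, not real hardware specs).
PEAK_FLOPS = 10e12       # 10 TFLOP/s
PEAK_BANDWIDTH = 500e9   # 500 GB/s

# Vector add on 1,000,000 floats: 1 FLOP and 12 bytes per element.
n = 1_000_000
vector_add = classify(n, 12 * n, PEAK_FLOPS, PEAK_BANDWIDTH)

# Square matrix multiply, m x m floats: 2*m^3 FLOPs, and in the ideal
# case 3 matrices of m*m 4-byte values moved once each.
m = 1024
matmul = classify(2 * m**3, 3 * m * m * 4, PEAK_FLOPS, PEAK_BANDWIDTH)

print("vector add:", vector_add)
print("matmul:", matmul)
```

Under these assumed peaks the vector add comes out memory-bound and the matrix multiply comes out compute-bound, which matches their intensities: about 0.08 FLOP/byte for the former and roughly 171 FLOP/byte for the latter.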