Lesson 30: The Roofline Model
In the previous lesson we met arithmetic intensity (FLOPs/byte) and bandwidth (bytes/second), and distinguished a memory-bound kernel from a compute-bound one. The roofline model plots those two quantities on a single graph that shows, at a glance, the ceiling on performance. The horizontal axis is arithmetic intensity (FLOPs/byte) and the vertical axis is performance (FLOPs/second).
Imagine a graph with a roof that slopes upward and then goes flat. On the left the roof rises diagonally: there, bandwidth limits you. On the right the roof is flat: there, compute speed limits you. The corner where the diagonal turns flat is the ridge point. If your kernel falls to the left of the corner, memory is the bottleneck; if it falls to the right, compute is the bottleneck.
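The shape of the roof can be written as one line: the ceiling is the smaller of the peak compute rate and bandwidth times intensity. The sketch below is only an illustration; the function name `attainable_flops` and the machine numbers (10 TFLOP/s peak, 500 GB/s bandwidth) are assumptions chosen for easy arithmetic, not figures for any real hardware.

```python
def attainable_flops(intensity, peak_flops, bandwidth):
    """Roofline ceiling in FLOPs/second.

    intensity  -- arithmetic intensity of the kernel, in FLOPs/byte
    peak_flops -- peak compute rate of the machine, in FLOPs/second
    bandwidth  -- memory bandwidth of the machine, in bytes/second
    """
    # Sloped part of the roof: bandwidth * intensity.
    # Flat part of the roof: peak_flops.
    # The ceiling is whichever of the two is lower.
    return min(peak_flops, bandwidth * intensity)


# Hypothetical machine, chosen only for illustration.
PEAK = 10e12  # 10 TFLOP/s
BW = 500e9    # 500 GB/s

for intensity in (0.25, 4.0, 20.0, 100.0):
    ceiling = attainable_flops(intensity, PEAK, BW)
    print(f"{intensity:6.2f} FLOPs/byte -> ceiling {ceiling / 1e12:.3f} TFLOP/s")
```

On this assumed machine the ceiling grows in proportion to intensity up to 20 FLOPs/byte and stays at the peak beyond that, which is the sloped-then-flat roof described above.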
- roofline model
- A graph of performance versus arithmetic intensity, with a sloped roof (bandwidth) on the left and a flat roof (peak FLOPs) on the right, showing the performance ceiling.
- ridge point
- The intensity where the sloped roof meets the flat roof: peak FLOPs per second divided by bandwidth in bytes per second, which gives a value in FLOPs/byte. Left of it is memory-bound, right of it is compute-bound.
- bandwidth roof
- The sloped left part of the roofline. Along it the performance ceiling equals bandwidth times intensity, so kernels under it are memory-bound.
- compute roof
- The flat right part of the roofline, at the hardware's peak FLOPs per second. Kernels under it are compute-bound and cannot exceed that ceiling.
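The definitions above can be turned into a small classifier. This is a minimal sketch under stated assumptions: `ridge_point` and `classify` are hypothetical helper names, the machine numbers are made up, and the vector-add intensity assumes float32 data with no cache reuse (1 FLOP for every 12 bytes moved: two 4-byte reads and one 4-byte write).

```python
def ridge_point(peak_flops, bandwidth):
    """Intensity (FLOPs/byte) at which the sloped roof meets the flat roof."""
    return peak_flops / bandwidth


def classify(intensity, peak_flops, bandwidth):
    """Return which roof limits a kernel with the given intensity."""
    # At exactly the ridge point both limits coincide; this sketch
    # labels that case compute-bound by convention.
    if intensity < ridge_point(peak_flops, bandwidth):
        return "memory-bound"
    return "compute-bound"


# Hypothetical machine, chosen only for illustration.
PEAK = 10e12  # 10 TFLOP/s
BW = 500e9    # 500 GB/s

# Illustrative intensities, not measurements.
kernels = {
    "vector add (float32, no reuse)": 1 / 12,
    "hypothetical dense kernel": 64.0,
}

print(f"ridge point: {ridge_point(PEAK, BW):.1f} FLOPs/byte")
for name, intensity in kernels.items():
    print(f"{name}: {classify(intensity, PEAK, BW)}")
```

Because the ridge point depends only on the machine, the same kernel can be memory-bound on one machine and compute-bound on another with a lower ratio of peak compute to bandwidth.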