Lesson 31: Profiling with Nsight
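Optimizing without measuring is guessing. Nsight Systems (a timeline overview of kernels, memory transfers, and overlaps) and Nsight Compute (a deep dive into a single kernel) give you real numbers instead of gut feelings. The key metrics are achieved occupancy (how many warps are active out of the maximum), memory and compute throughput (which show whether a kernel is memory-bound or compute-bound), and warp stall reasons (why warps are waiting instead of issuing instructions).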
Profiling is like a doctor measuring your pulse, temperature, and blood pressure before prescribing medicine. Without the measurements, the doctor can only guess what is wrong; with the numbers, they know exactly what to treat.
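Nsight Compute computes achieved occupancy for you, but the arithmetic behind the number is simple enough to sketch. The function below follows the usual definition: average the active warp count over the cycles in which the SM had any active warp, then divide by the per-SM warp limit. The per-cycle samples and the limit of 64 warps per SM are hypothetical inputs chosen for illustration. The real limit depends on the GPU architecture.

```python
def achieved_occupancy(active_warps_per_cycle, max_warps_per_sm):
    """Average active warps per active cycle, divided by the per-SM warp limit."""
    if max_warps_per_sm <= 0:
        raise ValueError("max_warps_per_sm must be positive")
    # A cycle counts as "active" only if the SM had at least one active warp.
    active = [w for w in active_warps_per_cycle if w > 0]
    if not active:
        return 0.0
    avg_active_warps = sum(active) / len(active)
    return avg_active_warps / max_warps_per_sm


# Hypothetical per-cycle samples for one SM; 64 is an assumed per-SM warp limit.
samples = [32, 48, 16, 0, 64]
print(f"achieved occupancy: {achieved_occupancy(samples, 64):.1%}")  # achieved occupancy: 62.5%
```

Because idle cycles are excluded, the metric describes how full the SM is while it is working, not how often it works.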
- Nsight Systems
- A system-level profiler: a timeline of kernels, memory transfers, and overlaps. Good for finding where time goes overall.
- Nsight Compute
- A single-kernel profiler: occupancy, throughput, stall reasons. Good for a deep analysis of one kernel.
- achieved occupancy
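- The average number of warps active per active cycle on an SM, divided by the maximum number of warps that SM supports. A low value often means there are too few warps to hide memory and instruction latency, so the GPU is underused.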
- memory-bound
- A kernel whose bottleneck is memory bandwidth rather than compute. It shows up as high memory throughput and low compute throughput.
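The roofline model is one common way to sanity-check a "memory-bound" label. It is a general technique, not something specific to Nsight. You compare a kernel's arithmetic intensity (FLOPs per byte of memory traffic) with the GPU's ridge point (peak FLOP/s divided by peak bandwidth). Below the ridge, bandwidth caps performance. This is a minimal sketch. The peak numbers (10 TFLOP/s, 500 GB/s) are hypothetical placeholders, not figures for any real GPU. The byte counts assume each array element crosses DRAM exactly once.

```python
from dataclasses import dataclass


@dataclass
class Roofline:
    intensity: float   # FLOPs per byte of DRAM traffic
    ridge: float       # intensity where the bandwidth roof meets the compute roof
    attainable: float  # roofline upper bound on FLOP/s at this intensity
    bound: str         # "memory-bound" or "compute-bound"


def roofline(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Classify a kernel with the roofline model (peaks in FLOP/s and bytes/s)."""
    if min(flops, bytes_moved, peak_flops, peak_bandwidth) <= 0:
        raise ValueError("all inputs must be positive")
    intensity = flops / bytes_moved
    ridge = peak_flops / peak_bandwidth
    attainable = min(peak_flops, intensity * peak_bandwidth)
    bound = "memory-bound" if intensity < ridge else "compute-bound"
    return Roofline(intensity, ridge, attainable, bound)


# Hypothetical peaks, not taken from any real GPU datasheet.
PEAK_FLOPS = 10e12      # 10 TFLOP/s
PEAK_BANDWIDTH = 500e9  # 500 GB/s

# Vector add c = a + b on float32: 1 FLOP and 12 bytes (two loads, one store) per element.
n = 1 << 20
vec_add = roofline(n, 12 * n, PEAK_FLOPS, PEAK_BANDWIDTH)

# 1024x1024 float32 matmul, idealized so that each matrix crosses DRAM exactly once.
N = 1024
matmul = roofline(2 * N**3, 3 * 4 * N * N, PEAK_FLOPS, PEAK_BANDWIDTH)

print(vec_add.bound, round(vec_add.intensity, 3))  # memory-bound 0.083
print(matmul.bound, round(matmul.intensity, 1))    # compute-bound 170.7
```

In practice, you would plug in the FLOP and byte counts a profiler measures and your GPU's real peak figures. A "compute-bound" verdict only says which roof is the ceiling. The measured throughput tells you how close the kernel actually gets to it.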