AI Inference & GPU Performance
NVIDIA's AI Inference path in a beginner-friendly form: it starts from absolute zero (what a model is, what a GPU is) and ramps up gradually. You'll learn what inference is, how to measure latency and throughput correctly (warmup, torch.cuda.synchronize(), percentiles), and how to tell whether a workload is compute-bound or memory-bound. From there you'll accelerate it with batching, mixed precision (FP16/BF16 on Tensor Cores), quantization (INT8/FP8), kernel fusion, CUDA Graphs, and TensorRT, finishing with a Triton-style benchmark capstone.
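To give a feel for the measurement habits named above (warmup, synchronizing before reading the clock, reporting percentiles instead of a single average), here is a minimal sketch in plain Python. The `benchmark` helper, its parameters, and the result keys are hypothetical names chosen for illustration and are not taken from the course material. The `sync` argument is a placeholder for whatever blocks until the device has finished; with PyTorch on a CUDA device, `torch.cuda.synchronize` is the usual choice.

```python
import math
import time
from typing import Callable, Dict, Optional


def benchmark(
    fn: Callable[[], object],
    *,
    warmup: int = 10,
    iters: int = 100,
    sync: Optional[Callable[[], None]] = None,
    batch_size: int = 1,
) -> Dict[str, float]:
    """Time fn() and report latency percentiles plus throughput.

    Hypothetical helper for illustration. `sync` should block until all
    queued device work is done (assumption: torch.cuda.synchronize on GPU).
    """
    def _sync() -> None:
        if sync is not None:
            sync()

    # Warmup: the first calls often pay one-time costs (lazy init, caches,
    # kernel selection), so they are run but not timed.
    for _ in range(warmup):
        fn()
    _sync()

    latencies_ms = []
    for _ in range(iters):
        _sync()                      # make sure nothing is still in flight
        start = time.perf_counter()
        fn()
        _sync()                      # wait for the work before stopping the clock
        latencies_ms.append((time.perf_counter() - start) * 1e3)

    latencies_ms.sort()

    def percentile(p: float) -> float:
        # Nearest-rank percentile over the sorted samples.
        rank = max(1, math.ceil(p / 100 * len(latencies_ms)))
        return latencies_ms[rank - 1]

    total_s = sum(latencies_ms) / 1e3
    return {
        "p50_ms": percentile(50),
        "p90_ms": percentile(90),
        "p99_ms": percentile(99),
        # Items processed per second across all timed iterations.
        "throughput_per_s": (batch_size * iters / total_s) if total_s > 0 else float("inf"),
    }


if __name__ == "__main__":
    # CPU stand-in workload so the sketch runs without a GPU.
    stats = benchmark(lambda: sum(range(10_000)), warmup=5, iters=50)
    print(stats)
```

Latency here is the time for one call, while throughput counts items per second, so raising `batch_size` can improve throughput even as per-call latency gets worse. Without the synchronization step, asynchronous GPU launches would return before the work completes and the timings would look far better than they are.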