Lesson 9: Batching — How to Raise Throughput
Running one image at a time wastes the GPU: most of its thousands of cores sit idle. Batching is one of the most effective ways to raise throughput: you stack many inputs into one batch, so a single kernel launch has enough work to fill the GPU. In this lesson we see how it multiplies throughput, and what it costs in latency.
Running one item at a time is like sending an empty bus with a single passenger over and over. Batching is filling the bus before it leaves — same trip, many more passengers.
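The trade-off can be sketched with a toy cost model. Everything here is an assumption for illustration: the constants `LAUNCH_OVERHEAD_MS` and `PER_ITEM_MS` are made-up numbers rather than measurements, and the linear model ignores the point where the GPU saturates. It only shows the shape of the effect: throughput climbs as the fixed cost is shared across more inputs, while the time to finish any one batch grows.

```python
# Toy cost model (assumed numbers, not measurements):
# each launch pays a fixed overhead plus a small per-item cost.
LAUNCH_OVERHEAD_MS = 1.0   # hypothetical fixed cost per launch
PER_ITEM_MS = 0.05         # hypothetical marginal cost per input


def batch_latency_ms(batch_size: int) -> float:
    """Time for one launch that processes `batch_size` inputs."""
    return LAUNCH_OVERHEAD_MS + PER_ITEM_MS * batch_size


def throughput_per_s(batch_size: int) -> float:
    """Inputs completed per second when launches run back to back."""
    return batch_size / batch_latency_ms(batch_size) * 1000.0


for b in (1, 8, 64):
    print(f"batch={b:3d}  latency={batch_latency_ms(b):5.2f} ms  "
          f"throughput={throughput_per_s(b):8.0f} items/s")
```

In bus terms, the fixed overhead is the trip itself and the per-item cost is the time each passenger takes to board: a fuller bus moves more people per hour, but every passenger waits for the whole bus.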
- Batch: A group of inputs run together in one kernel launch. By convention, the batch dimension is the tensor's first dimension.
- GPU utilization: The fraction of the GPU's cores that are actually busy. A small batch leaves most of them idle (low utilization); a large batch fills them.
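To make "the batch dimension is the first dimension" concrete, here is a minimal sketch in plain Python using nested lists instead of real tensors. The helpers `stack` and `shape` are hypothetical names written for this illustration; in practice a tensor library does this for you (for example `numpy.stack` or `torch.stack`).

```python
from typing import List, Tuple

Image = List[List[float]]  # one input: height x width, as nested lists


def stack(inputs: List[Image]) -> List[Image]:
    """Stack same-shaped inputs into one batch; index 0 selects the input."""
    if not inputs:
        raise ValueError("cannot stack an empty list of inputs")
    height, width = len(inputs[0]), len(inputs[0][0])
    for img in inputs:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all inputs in a batch must share one shape")
    # Copy rows so the batch does not alias the original inputs.
    return [[row[:] for row in img] for img in inputs]


def shape(batch: List[Image]) -> Tuple[int, int, int]:
    """Return (batch_size, height, width)."""
    return (len(batch), len(batch[0]), len(batch[0][0]))


# Eight hypothetical 3x4 "images", each filled with its own index.
images = [[[float(i)] * 4 for _ in range(3)] for i in range(8)]
batch = stack(images)
print(shape(batch))  # (8, 3, 4): batch dimension first
```

The shape check reflects a practical constraint of batching: inputs must share one shape before they can be stacked, which is why variable-sized inputs are usually resized or padded first.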