Lesson 2: Why a GPU? Throughput vs Latency
A CPU has a few very powerful cores that finish a single task quickly — that is low latency. A GPU has thousands of simple cores that perform the same operation on lots of data at once — that is high throughput. To add two vectors of length one million, the CPU walks element by element in a loop, wh
A CPU is like a few expert chefs cooking one complex dish fast. A GPU is like a thousand line cooks, each slicing one tomato: alone each is slow, but together they slice a thousand tomatoes in the time it takes to slice one.
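To make the CPU side of the vector-addition example concrete, here is a minimal host-side sketch of the element-by-element loop. The function name vector_add_cpu is a hypothetical choice for illustration, not something defined by this lesson, and the code assumes both inputs have the same length.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (the name is an assumption for illustration):
// sequential vector addition on the CPU.
// A single core visits each index in turn, so the run time grows with n.
void vector_add_cpu(const std::vector<float>& a,
                    const std::vector<float>& b,
                    std::vector<float>& out) {
    assert(a.size() == b.size());  // assumed precondition: equal lengths
    out.resize(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];      // one addition per loop iteration
    }
}
```

Each iteration finishes quickly (low latency), but for a million elements the core still has to go through a million iterations one after another.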
- throughput: The amount of work finished per unit of time. The GPU wins here because it runs many identical operations in parallel.
- latency: The time to finish a single operation. The CPU wins here because one strong core finishes one task quickly.
- kernel: A function that runs on the GPU. In CUDA it is marked with __global__, launched from host (CPU) code, and executed by thousands of threads at once.
- SIMT: Single Instruction, Multiple Threads. Threads run the same instruction, each on different data; on NVIDIA GPUs they do so in groups of 32 called warps.
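The kernel and SIMT entries can be illustrated with the vector-addition task from this lesson. The following is a minimal sketch, not a tuned implementation: the names vector_add_kernel and vector_add_gpu are hypothetical, the block size of 256 threads is an assumed, commonly used value, and the host wrapper assumes a CUDA-capable device is available.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Kernel: runs on the GPU. Every thread executes this same code (SIMT),
// but each one computes a different index i and so touches different data.
__global__ void vector_add_kernel(const float* a, const float* b,
                                  float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {               // guard: the last block may have spare threads
        out[i] = a[i] + b[i];  // one addition per thread
    }
}

// Hypothetical host wrapper (name is an assumption): copies the inputs to
// the device, launches the kernel, and copies the result back.
// Returns true if every CUDA call succeeded.
bool vector_add_gpu(const std::vector<float>& a,
                    const std::vector<float>& b,
                    std::vector<float>& out) {
    assert(a.size() == b.size());  // assumed precondition: equal lengths
    const int n = static_cast<int>(a.size());
    out.resize(a.size());
    if (n == 0) return true;
    const std::size_t bytes = a.size() * sizeof(float);

    float *d_a = nullptr, *d_b = nullptr, *d_out = nullptr;
    bool ok = cudaMalloc(&d_a, bytes) == cudaSuccess
           && cudaMalloc(&d_b, bytes) == cudaSuccess
           && cudaMalloc(&d_out, bytes) == cudaSuccess
           && cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;                         // assumed block size
        const int blocks = (n + threads - 1) / threads;  // round up to cover n
        vector_add_kernel<<<blocks, threads>>>(d_a, d_b, d_out, n);
        // The device-to-host copy waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_a);   // cudaFree accepts a null pointer, so this is safe
    cudaFree(d_b);   // even if an earlier allocation failed
    cudaFree(d_out);
    return ok;
}
```

With n = 1,000,000 and 256 threads per block, the wrapper launches 3907 blocks, that is 1,000,192 threads; the `i < n` guard keeps the 192 surplus threads from writing past the end of the arrays. Note that the host-to-device and device-to-host copies add latency of their own, so the throughput advantage shows mainly when the amount of parallel work is large.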