Lesson 2: Why a GPU? Throughput vs Latency

A CPU has a few very powerful cores that finish a single task quickly — that is low latency. A GPU has thousands of simple cores that perform the same operation on lots of data at once — that is high throughput. To add two vectors of length one million, the CPU walks element by element in a loop, wh

A CPU is a few expert chefs cooking one complex dish fast. A GPU is a thousand line cooks, each slicing one tomato — alone each is slow, but together they slice a thousand tomatoes in the time of one.

throughput: The amount of work finished per unit of time. The GPU wins here: many identical operations in parallel.
latency: The time to finish one single operation. The CPU wins here: one strong core finishes one task quickly.
kernel: A function that runs on the GPU. Marked with __global__ and launched by thousands of threads at once.
SIMT: Single Instruction, Multiple Threads — all threads run the same instruction, each on different data.