Lesson 12: Error Handling & Synchronization

Almost every CUDA Runtime call returns a value of type cudaError_t. When all is well the value is cudaSuccess, whose numeric value is 0; any other value indicates an error. A critical point: launching a kernel with <<<>>> is asynchronous. The launch returns to the host immediately, before the kernel finishes.
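As a minimal sketch of checking the returned cudaError_t, the example below wraps each runtime call in a macro that compares the result with cudaSuccess. The macro name CHECK, the buffer name d_buf, and the buffer size are assumptions made for illustration; they are not part of the CUDA Runtime or of this lesson.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// CHECK is a hypothetical helper macro, not part of the CUDA Runtime.
// It evaluates a runtime call once and aborts if the result is not cudaSuccess.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                         cudaGetErrorString(err_));                    \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

int main() {
    const size_t bytes = 1024 * sizeof(float);  // illustrative size
    float *d_buf = nullptr;

    // Each of these calls returns a cudaError_t, so each can be checked.
    CHECK(cudaMalloc(&d_buf, bytes));
    CHECK(cudaMemset(d_buf, 0, bytes));
    CHECK(cudaFree(d_buf));

    std::printf("all runtime calls returned cudaSuccess\n");
    return 0;
}
```

Wrapping every call this way keeps the check next to the call that produced the error, and cudaGetErrorString turns the numeric code into a readable message.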

Sending a kernel to the GPU is like dropping a letter in the mailbox: the moment you drop it, you are free to move on — but you don't yet know if it arrived. cudaGetLastError checks whether the address on the envelope is valid (a send-time error), and cudaDeviceSynchronize is like waiting for a delivery confirmation that tells you if something went wrong along the way.

cudaError_t: The type that almost every CUDA call returns. cudaSuccess (value 0) means success; any other value is an error.
cudaGetLastError: Returns the last error code and resets it to cudaSuccess. Called right after a kernel launch to catch launch errors.
cudaDeviceSynchronize: Blocks the host until the GPU finishes all work, and returns errors that occurred during kernel execution.
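asynchronous launch: A kernel launch returns to the host immediately, before the kernel finishes. The launch statement itself returns no value, so errors that occur while the kernel runs surface only at a later call such as cudaDeviceSynchronize.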
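The terms above combine into a two-step check after each launch, sketched below under a few assumptions: the kernel fill, the array size, and the launch configuration are made up for illustration. Step 1 calls cudaGetLastError to catch launch errors; step 2 calls cudaDeviceSynchronize to wait for the kernel and pick up errors from its execution.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example kernel: writes each element's own index into the array.
__global__ void fill(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

int main() {
    const int n = 1 << 20;  // illustrative element count
    int *d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, n * sizeof(int));
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // round up to cover n

    // Asynchronous: control returns to the host before the kernel finishes.
    fill<<<blocks, threads>>>(d_out, n);

    // Step 1: launch errors (for example, an invalid launch configuration).
    err = cudaGetLastError();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    // Step 2: block until the GPU is done; reports errors from execution.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    cudaFree(d_out);
    return 0;
}
```

If threads were set above the device's per-block limit (1024 on current GPUs), step 1 would be expected to report the problem, since the kernel never starts. An out-of-bounds write inside the kernel, by contrast, would typically show up only at step 2.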