Lesson 6: Two Separate Memories — Host, Device & cudaMalloc

So far you have written a kernel, sized a grid, and given every thread its index. But there is one detail we have not touched yet: where does the data actually live? The CPU (the host) and the GPU (the device) have two completely separate physical memories. A pointer from malloc points into host memory, so host code can use it directly. A pointer from cudaMalloc points into device memory, and host code must not dereference it.

The host and the device are like two offices in different cities. Each office has its own filing cabinets, and you cannot reach into the other office's cabinets from where you sit. Calling cudaMalloc is like renting an empty cabinet in the other office (the GPU), and calling cudaFree is like clearing it out and handing it back when you are done. In the next lesson we will learn how to mail documents between the two offices.
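To make the two offices concrete, here is a minimal sketch that allocates one buffer in each memory and then releases both. The helper name demo_two_memories, the h_/d_ variable prefixes, and the size of 1024 floats are illustrative assumptions, not part of the lesson. Only malloc, free, cudaMalloc, and cudaFree are real API calls.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper for this lesson: allocate one buffer in each memory,
// then release both. Returns true if both allocations succeeded.
bool demo_two_memories(size_t n) {
    const size_t bytes = n * sizeof(float);

    // Host office: ordinary malloc. The CPU may read and write this directly.
    float *h_data = static_cast<float *>(malloc(bytes));
    if (h_data == nullptr) {
        return false;
    }
    h_data[0] = 1.0f;  // fine: host code touching host memory

    // Device office: cudaMalloc rents an empty cabinet on the GPU.
    float *d_data = nullptr;
    cudaError_t err = cudaMalloc((void **)&d_data, bytes);
    if (err != cudaSuccess) {
        free(h_data);
        return false;
    }
    // d_data[0] = 1.0f;  // NOT allowed here: host code must not dereference
    //                    // a device pointer.

    cudaFree(d_data);  // clear out the GPU cabinet
    free(h_data);      // and the host one
    return true;
}

int main() {
    const bool ok = demo_two_memories(1024);  // 1024 floats, an arbitrary size
    printf("both allocations %s\n", ok ? "succeeded" : "failed");
    return ok ? 0 : 1;
}
```

Nothing is copied between the two buffers here. The sketch only shows that each pointer belongs to one office and is released by the matching call.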

host memory: The CPU's memory. A pointer from malloc points here, and host code can access it directly.
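device pointer: A pointer set by cudaMalloc that points into GPU memory. Host code may store it and pass it to CUDA calls and kernels, but must not dereference it.
cudaMalloc: Allocates memory on the device. It writes the device pointer into the pointer variable whose address you pass as its first argument, and its return value is an error status (cudaError_t). Like malloc, but the memory lives on the GPU.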
cudaFree: Frees device memory allocated by cudaMalloc. The counterpart of free for host memory.
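
The glossary entries for cudaMalloc and cudaFree can be sketched with their return values checked. This is one possible pattern, not the only one; the helper name try_device_alloc and the buffer size are assumptions made for illustration. cudaError_t, cudaSuccess, and cudaGetErrorString come from the CUDA runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: allocate `bytes` on the device, then free it again.
// Returns true only if both CUDA calls report success.
bool try_device_alloc(size_t bytes) {
    int *d_buf = nullptr;  // will hold a device address after cudaMalloc

    // Pass the ADDRESS of the pointer; cudaMalloc fills it in.
    // The return value is a status code, not the pointer itself.
    cudaError_t err = cudaMalloc((void **)&d_buf, bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return false;
    }

    // cudaFree is the counterpart of free, and it also returns a status.
    err = cudaFree(d_buf);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFree failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main() {
    // 256 ints is an arbitrary size chosen for the example.
    return try_device_alloc(256 * sizeof(int)) ? 0 : 1;
}
```

Checking the status matters because a failed cudaMalloc leaves you without a usable device pointer, and the failure is otherwise silent.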