Lesson 34: Unified Memory with cudaMallocManaged

Until now we managed two separate copies of every array: a host pointer (say a) and a device pointer (say d_a), moving data between them with an explicit cudaMemcpy in each direction. Unified Memory removes that duplication: cudaMallocManaged allocates a single pointer valid on both the host and the device.

Instead of two separate notebooks — one on your desk (host) and one on your friend's desk (device) — that you keep copying between by hand, there is one magic notebook. When you need it, it appears on your side; when your friend needs it, it slides over by itself. Very convenient, but the slide itself takes a moment.
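A minimal sketch of the single-pointer pattern described above. The kernel name addOne, the array size, and the launch configuration are illustrative assumptions, not part of the lesson; only the CUDA runtime calls are real API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: adds 1.0f to every element.
__global__ void addOne(float *a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;  // assumed size, chosen only for the example
    float *a = nullptr;

    // One allocation, one pointer: no separate d_a and no cudaMemcpy.
    cudaError_t err = cudaMallocManaged(&a, n * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // The host writes through the managed pointer directly.
    for (int i = 0; i < n; ++i) a[i] = static_cast<float>(i);

    // The device uses the very same pointer.
    addOne<<<(n + 255) / 256, 256>>>(a, n);

    // Kernel launches are asynchronous: wait before the host reads again.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(a);
        return 1;
    }

    printf("a[0] = %.1f, a[n-1] = %.1f\n", a[0], a[n - 1]);

    cudaFree(a);  // managed memory is released with the ordinary cudaFree
    return 0;
}
```

The cudaDeviceSynchronize call is the step that is easy to forget: without it the host may read the array while the kernel is still running.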

Unified Memory: A single address space reachable by both host and device through one pointer. The driver migrates pages on demand.
cudaMallocManaged: Allocates managed memory and returns one pointer usable on host and device without explicit cudaMemcpy.
page migration: When one side touches a page that currently resides on the other, the access raises a page fault and the driver moves the page to the side that touched it. Handling the fault and moving the page both cost time.
cudaMemPrefetchAsync: An asynchronous request, enqueued on a stream, that migrates managed pages ahead of time to a given destination (device or host) so the kernel does not stall on page faults.
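The terms above can be seen together in a short sketch that prefetches a managed array to the device before a kernel and back to the host afterwards. The kernel name scale, the CHECK macro, and the sizes are hypothetical. The sketch assumes the cudaMemPrefetchAsync form that takes an integer destination device (with cudaCpuDeviceId meaning the host); newer toolkits may expect a cudaMemLocation argument instead, so check the headers of the toolkit in use. It also assumes that prefetching is only attempted when the device reports concurrent managed access.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print the error and leave main on any failed call.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(e_)); \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical kernel for illustration: multiplies every element by k.
__global__ void scale(float *a, int n, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= k;
}

int main() {
    const int n = 1 << 20;  // assumed size
    const size_t bytes = n * sizeof(float);

    int device = 0;
    CHECK(cudaGetDevice(&device));

    // Prefetching needs concurrent managed access; skip it otherwise.
    int concurrent = 0;
    CHECK(cudaDeviceGetAttribute(&concurrent,
                                 cudaDevAttrConcurrentManagedAccess, device));

    float *a = nullptr;
    CHECK(cudaMallocManaged(&a, bytes));
    for (int i = 0; i < n; ++i) a[i] = 1.0f;  // pages now live on the host

    // Move the pages to the GPU before the kernel instead of faulting them in.
    if (concurrent) CHECK(cudaMemPrefetchAsync(a, bytes, device, 0));

    scale<<<(n + 255) / 256, 256>>>(a, n, 2.0f);
    CHECK(cudaGetLastError());

    // Move the pages back before the host reads the results.
    if (concurrent) CHECK(cudaMemPrefetchAsync(a, bytes, cudaCpuDeviceId, 0));
    CHECK(cudaDeviceSynchronize());

    printf("a[0] = %.1f\n", a[0]);

    CHECK(cudaFree(a));
    return 0;
}
```

Without the two prefetch calls the program still produces the same result; the pages simply migrate on demand, one fault at a time, which is the cost the glossary entry for page migration describes.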