Lesson 34: Unified Memory with cudaMallocManaged
Until now we managed two separate copies of every array: a host pointer (say a) and a device pointer (say d_a), moving data between them with manual cudaMemcpy in each direction. Unified Memory removes that duplication: cudaMallocManaged allocates a single pointer valid on both the host and the devi
Instead of two separate notebooks — one on your desk (host) and one on your friend's desk (device) — that you keep copying between by hand, there is one magic notebook. When you need it, it appears on your side; when your friend needs it, it slides over by itself. Very convenient, but the slide itself takes a moment.
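The single-pointer idea can be sketched as a minimal program. The kernel name scale, the array size, and the launch configuration are hypothetical choices made for illustration; only cudaMallocManaged, cudaDeviceSynchronize, and cudaFree are runtime API calls.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles each element in place.
__global__ void scale(float *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= 2.0f;
}

int main()
{
    const int n = 1 << 20;   // illustrative size
    float *a = nullptr;

    // One allocation, one pointer: no separate a / d_a pair.
    cudaError_t err = cudaMallocManaged(&a, n * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // The host writes through the managed pointer directly.
    for (int i = 0; i < n; ++i) a[i] = 1.0f;

    // The device receives the very same pointer; no cudaMemcpy is needed.
    scale<<<(n + 255) / 256, 256>>>(a, n);

    // Kernel launches are asynchronous, so wait before the host reads.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(a);
        return 1;
    }

    printf("a[0] = %.1f\n", a[0]);   // each element was doubled from 1.0

    cudaFree(a);   // managed memory is released with cudaFree
    return 0;
}
```

Note the cudaDeviceSynchronize call: without it the host could read the array while the kernel is still running.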
- Unified Memory
- A single address space reachable by both host and device through one pointer. The driver migrates pages on demand.
- cudaMallocManaged
- Allocates managed memory and returns one pointer usable on host and device without explicit cudaMemcpy.
- page migration
- When one side touches a page that currently resides on the other, the access triggers a page fault, and the driver services the fault by moving the page to the side that touched it. Each fault and transfer costs time.
- cudaMemPrefetchAsync
- An asynchronous request, enqueued on a stream, that migrates managed pages ahead of time to a given destination (device or host), so the kernel does not stall on page faults.
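To make the last two terms concrete, here is a hedged sketch of prefetching around a kernel launch. The kernel add_one and the array size are hypothetical. The call assumes the cudaMemPrefetchAsync form that takes an integer destination device, as found in many toolkit releases; newer toolkits may expect a cudaMemLocation argument instead, so check the runtime API reference for your version. Prefetching also assumes a GPU and operating system that support concurrent managed access.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: adds 1 to each element.
__global__ void add_one(float *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] += 1.0f;
}

int main()
{
    const int n = 1 << 20;                   // illustrative size
    const size_t bytes = n * sizeof(float);
    float *a = nullptr;

    int device = 0;
    cudaGetDevice(&device);                  // id of the current GPU

    if (cudaMallocManaged(&a, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged failed\n");
        return 1;
    }

    // Host initialization: the pages are now resident on the host.
    for (int i = 0; i < n; ++i) a[i] = 0.0f;

    // Migrate the pages to the GPU up front, instead of letting the
    // kernel fault them in one at a time. Assumed signature:
    // (pointer, byte count, destination device id, stream).
    cudaMemPrefetchAsync(a, bytes, device, 0);

    add_one<<<(n + 255) / 256, 256>>>(a, n);

    // Migrate the pages back before the host reads them.
    // cudaCpuDeviceId names the host as the destination.
    cudaMemPrefetchAsync(a, bytes, cudaCpuDeviceId, 0);

    // Everything above was queued on the default stream; wait for it.
    cudaDeviceSynchronize();

    printf("a[0] = %.1f\n", a[0]);

    cudaFree(a);
    return 0;
}
```

Removing the two prefetch calls should leave the result unchanged, since on-demand migration still delivers the pages; the difference would show up only as extra page-fault time in a profiler.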