Lesson 7: wait() — Reaping Children and Preventing Zombies
When a child process finishes running, it stays in zombie state until the parent process calls wait(). At NVIDIA, inference servers that run multiple worker processes must clean them up with wait(). Zombie processes holding a CUDA context keep VRAM allocated until the parent calls wait() — a common
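A zombie process is like an employee who has handed in a resignation letter that the manager has not yet signed. The employee no longer does any work, but still occupies a desk (a process table slot and a PID). wait() is the manager's signature: only then is the desk freed. The child's own address space was already torn down when it exited; what lingers is the kernel's record of it, including the exit status.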
- Zombie Process
- A process that has terminated (by calling exit(), returning from main(), or being killed by a signal) but whose process table entry still exists because the parent has not yet called wait(). Shown as state Z in ps, often tagged <defunct>. A zombie process uses no CPU but holds a PID and can retain GPU resources.
- wait() / waitpid()
- System calls that block until a child process terminates and then release its process table entry. wait(&status) waits for any child; waitpid(pid, &status, 0) waits for a specific child. The parent must call one of them for each child to prevent zombies.
- WIFEXITED
- A macro that checks whether a child process terminated normally (via exit() or a return from main()) rather than by a signal. WIFEXITED(status) evaluates to a nonzero value if termination was normal.
- WEXITSTATUS
- A macro that extracts the exit code from the status filled in by wait(). WEXITSTATUS(status) yields the low 8 bits of the value passed to exit() or returned from main(), and is meaningful only when WIFEXITED(status) is true.
- Orphan Process
- A process whose parent terminated before it did. The kernel re-parents orphan processes to init (PID 1), which calls wait() for them when they exit. Not to be confused with a zombie: an orphan is still running, whereas a zombie has already terminated.