Lesson 19: Memory-mapped I/O vs Buffered I/O
When you load gigabytes of data for GPU training, every millisecond of I/O matters. This lesson compares three approaches: buffered I/O through stdio, mmap, which maps a file directly into virtual memory, and sendfile, which moves file data to a socket without copying it through user space. At NVIDIA, for loading model weights and training batches, mmap gives
A buffered read is like checking out library books and photocopying them before you use them. mmap is like sitting in the library and reading the original, with no copying. sendfile is like having the library mail a book directly to a friend without it ever passing through your hands.
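To make the photocopy-versus-library contrast concrete, here is a minimal sketch in Python, assuming a Unix-like system. The function names `checksum_buffered` and `checksum_mmap` are hypothetical and exist only for this illustration. Both compute the same byte sum, but the first copies each chunk into a new user-space bytes object, while the second reads the mapped pages in place.

```python
import mmap
import os
import tempfile


def checksum_buffered(path, chunk_size=64 * 1024):
    """Sum all bytes with buffered reads; each read() copies data into user space."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += sum(chunk)
    return total


def checksum_mmap(path):
    """Sum all bytes through a read-only mapping; no per-chunk copy is made."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file. mmap raises ValueError for an empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # A memoryview exposes the mapped bytes without copying them.
            with memoryview(mm) as view:
                return sum(view)


if __name__ == "__main__":
    # Hypothetical demo data standing in for a dataset file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(range(256)) * 1024)
        demo_path = tmp.name
    try:
        print(checksum_buffered(demo_path) == checksum_mmap(demo_path))
    finally:
        os.unlink(demo_path)
```

Note that slicing an mmap object in Python (for example `mm[0:16]`) does create a copy, which is why the sketch goes through a `memoryview` instead.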
- buffered I/O
- I/O through stdio (fread, fwrite) that accumulates data in a user-space buffer before transferring it to or from the kernel. This reduces the number of system calls but adds an extra data copy.
- direct I/O
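- I/O that bypasses the kernel page cache, enabled by passing the O_DIRECT flag to open(). The buffer address, file offset, and transfer size typically must all be aligned to the logical block size of the underlying device (often 512 bytes, though some devices and filesystems require 4096). Useful when the application manages its own cache.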
- zero-copy
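- An I/O technique that avoids copying data between kernel space and user space. Two common examples: sendfile() moves data between kernel buffers without passing it through user space, and mmap maps page-cache pages into the process address space so they can be read in place.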
- sendfile
- System call that transfers data from a file descriptor to a socket descriptor entirely within the kernel, without copying it to user space. Used by Nginx and other web servers to serve static files.
- page cache
- A region of RAM managed by the kernel that caches recently accessed file contents. Repeated reads of the same file data are served from RAM instead of disk.
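The sendfile entry above can be sketched with Python's `os.sendfile`, which wraps the system call. This is an illustration under stated assumptions, not a production server: it assumes Linux, the helper names `send_file` and `recv_all` are hypothetical, and a local `socket.socketpair()` stands in for a real network connection. The demo payload is kept small so that it fits in the socket buffer without a concurrent reader.

```python
import os
import socket
import tempfile


def send_file(sock, path):
    """Send the whole file at `path` over `sock` with os.sendfile; return bytes sent."""
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            # The kernel moves the data from the file to the socket;
            # the file contents never enter this process's buffers.
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:
                break  # the file shrank while we were sending
            sent += n
    return sent


def recv_all(sock):
    """Read from `sock` until the peer closes its end."""
    chunks = []
    while chunk := sock.recv(65536):
        chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    payload = b"static file contents\n" * 100  # hypothetical demo data
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        demo_path = tmp.name
    sender, receiver = socket.socketpair()
    try:
        send_file(sender, demo_path)
        sender.close()  # signals end-of-stream to the receiver
        print(recv_all(receiver) == payload)
    finally:
        receiver.close()
        os.unlink(demo_path)
```

The loop matters because `os.sendfile` may transfer fewer bytes than requested. Python sockets also offer a higher-level `socket.sendfile()` method that handles this loop and falls back to plain `send()` where the system call is unavailable.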