Lesson 14: Troubleshooting capstone
You have reached the final lesson, which ties everything together into one tool you will use again and again: an orderly diagnosis flow, known as a runbook. When something breaks in the cluster, it is tempting to panic and guess. Instead, we work like detectives: first gather evidence, and only then accuse.
Troubleshooting is like being a detective: you do not accuse anyone before gathering evidence. First see who looks suspicious (get), read the testimony (describe and logs), and only then decide who is guilty and fix it.
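The get, describe, logs order can be sketched as a small shell function. This is only a minimal sketch: the function name gather_evidence, the Pod name web-1, and the namespace demo are hypothetical placeholders, and it assumes kubectl is already configured for your cluster.

```shell
# Hypothetical helper: gather evidence for one Pod in a fixed order.
# Usage: gather_evidence <pod-name> [namespace]
gather_evidence() {
  pod="$1"
  ns="${2:-default}"   # fall back to the "default" namespace

  # 1. Who looks suspicious? List Pods with their status and restart count.
  kubectl get pods -n "$ns"

  # 2. Read the testimony: Pod details, with Events at the bottom.
  kubectl describe pod "$pod" -n "$ns"

  # 3. Logs of the current container, then of the previous (crashed) one.
  #    The second call fails if there was no previous container, so its
  #    error is ignored here.
  kubectl logs "$pod" -n "$ns"
  kubectl logs "$pod" -n "$ns" --previous || true
}
```

Calling it as gather_evidence web-1 demo runs the steps in the same order every time, which is the point of a runbook: nothing gets skipped under pressure.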
- Troubleshooting: An orderly process for finding the cause of a failure. You gather evidence (get, describe, logs), form a hypothesis, apply a fix, and verify that the problem is gone, rather than guessing.
- Runbook: A fixed, repeatable list of steps you follow when a failure appears. It lets everyone on the team diagnose in the same order and keeps anyone from skipping a critical step under pressure.
- CrashLoopBackOff: A Pod state where the container starts, crashes, and Kubernetes keeps restarting it with a growing delay between attempts. The cause is usually in the app's code or config; read the crashed container's output with kubectl logs --previous.
- ImagePullBackOff: A state where Kubernetes failed to pull the container image because the name or tag is wrong, the image is missing from the registry, or credentials are missing. The reason shows in the Events section of kubectl describe pod.
- Pending: A state where the Pod was accepted but has not yet been scheduled onto any node, usually because there are not enough free resources (CPU/memory) or no node matches its requirements.
- Events: A short log of what Kubernetes tried to do with the Pod (scheduled, pulled image, started, failed). It appears at the bottom of kubectl describe pod and is usually the first clue to the cause.
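The glossary above pairs each Pod state with the place its cause usually shows up. As a rough sketch, that pairing can be written as a lookup that prints the first command to run. The function name first_clue and the Pod name web-1 are hypothetical, and the mapping only covers the three states defined in this lesson.

```shell
# Hypothetical helper: print the first command to run for a Pod status.
# Usage: first_clue <status> <pod-name>
first_clue() {
  status="$1"
  pod="$2"
  case "$status" in
    CrashLoopBackOff)
      # The container started and crashed: read the crashed run's output.
      echo "kubectl logs $pod --previous" ;;
    ImagePullBackOff|Pending)
      # The container never started: the reason is in the Events section.
      echo "kubectl describe pod $pod" ;;
    *)
      # Any other status: Events are still the usual first clue.
      echo "kubectl describe pod $pod" ;;
  esac
}
```

For example, first_clue CrashLoopBackOff web-1 prints kubectl logs web-1 --previous. The function only prints a suggestion and does not touch the cluster, so you still read the output and form the hypothesis yourself.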