Pod Troubleshooting Commands
This guide provides actionable commands and best practices for troubleshooting pods in Kubernetes clusters (AKS, EKS, GKE, and on-prem). Use these steps for real-life incident response and GitOps workflows.
Common Troubleshooting Commands
List all Pods in all Namespaces:
kubectl get pods --all-namespaces
Check Resource Consumption:
kubectl top pods --all-namespaces
Describe a Pod:
kubectl describe pod <pod-name> -n <namespace>
View Pod Logs:
kubectl logs <pod-name> -n <namespace>
Follow Pod Logs (stream in real-time):
kubectl logs -f <pod-name> -n <namespace>
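When a container has restarted, the current log stream usually begins after the crash, so the previous container instance's output tends to be more useful. The following is a minimal sketch, assuming a POSIX shell; the function name `pod_logs_recent` and the 100-line default are illustrative choices, not part of this guide's command set.

```shell
# Hypothetical helper: show recent logs for the current container instance
# and, if one exists, for the previously terminated instance.
pod_logs_recent() {
  ns="$1"; pod="$2"; lines="${3:-100}"
  echo "== current =="
  kubectl logs "$pod" -n "$ns" --tail="$lines" --timestamps
  echo "== previous =="
  # --previous fails when the container has never restarted;
  # report that instead of printing an error.
  kubectl logs "$pod" -n "$ns" --tail="$lines" --timestamps --previous 2>/dev/null \
    || echo "(no previous container instance)"
}

# Usage: pod_logs_recent <namespace> <pod-name> 200
```

For pods with more than one container, add `-c <container-name>` or `--all-containers=true` to both `kubectl logs` calls.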
Exec into a Pod:
kubectl exec -it <pod-name> -n <namespace> -- <command>
Get Events for a Pod:
kubectl get events --field-selector involvedObject.name=<pod-name> -n <namespace>
Check Pod Health (Readiness/Liveness):
kubectl describe pod <pod-name> -n <namespace> | grep -iE 'readiness|liveness|conditions'
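Filtering `describe` output with grep shows the probe definitions but not always which condition is failing. As a hedged alternative, the pod's conditions can be read directly with a JSONPath query; the helper names `pod_conditions` and `failing_conditions` below are hypothetical.

```shell
# Hypothetical helper: print each pod condition as TYPE=STATUS, one per line.
pod_conditions() {
  ns="$1"; pod="$2"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}

# Hypothetical filter: read TYPE=STATUS lines on stdin and print
# the condition types whose status is not True.
failing_conditions() {
  awk -F= '$2 != "True" { print $1 }'
}

# Usage: pod_conditions <namespace> <pod-name> | failing_conditions
```

A pod that is scheduled but failing its readiness probe would typically list `Ready` and `ContainersReady` here.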
Retrieve Pod IP and Node:
kubectl get pod <pod-name> -n <namespace> -o wide
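Restart a Pod (a controller such as a Deployment or StatefulSet recreates it; a bare pod with no controller is deleted permanently):
kubectl delete pod <pod-name> -n <namespace>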
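Deleting a pod only counts as a restart when something recreates it. A sketch, assuming the pod belongs to a Deployment: check the pod's owner first, then let the Deployment replace its pods gradually instead of deleting them by hand. The function names and the `120s` default timeout are illustrative assumptions.

```shell
# Hypothetical helper: print the owner(s) of a pod as KIND/NAME.
# Prints nothing for a bare pod, which will NOT come back after deletion.
pod_owner() {
  ns="$1"; pod="$2"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .metadata.ownerReferences[*]}{.kind}/{.name}{"\n"}{end}'
}

# Hypothetical helper: rolling restart of a Deployment, then wait for it to finish.
restart_deployment() {
  ns="$1"; deploy="$2"; timeout="${3:-120s}"
  kubectl rollout restart "deployment/$deploy" -n "$ns" \
    && kubectl rollout status "deployment/$deploy" -n "$ns" --timeout="$timeout"
}

# Usage: pod_owner <namespace> <pod-name>
#        restart_deployment <namespace> <deployment-name> 300s
```

Pods created by a Deployment report a ReplicaSet as their direct owner; the Deployment name is normally the ReplicaSet name without its trailing hash.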
Check Pod Status:
kubectl get pod <pod-name> -n <namespace>
List Pod Events (sorted):
kubectl get events --field-selector involvedObject.name=<pod-name> -n <namespace> --sort-by='.metadata.creationTimestamp'
Verify Pod Affinity/Anti-Affinity (kubectl describe does not print affinity rules, so read them from the pod spec):
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.affinity}'
Check Resource Requests and Limits:
kubectl describe pod <pod-name> -n <namespace> | grep -iE -A 3 'limits|requests'
Show the Most Recent Event for a Stuck Pod:
kubectl get events --field-selector involvedObject.name=<pod-name> -n <namespace> --sort-by='.metadata.creationTimestamp' | tail -n 1
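To find stuck pods across the cluster rather than inspecting one pod at a time, the pod listing can be filtered by status. The filter below is a sketch that assumes the default `kubectl get pods -A` column layout (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE); the name `not_healthy` is hypothetical.

```shell
# Hypothetical filter: reads `kubectl get pods -A` output on stdin and keeps
# the header plus every pod that is either not Running/Completed or is
# Running with fewer ready containers than expected (READY such as 0/1).
not_healthy() {
  awk 'NR == 1            { print; next }
       $4 == "Completed"  { next }
                          { split($3, ready, "/") }
       $4 != "Running" || ready[1] != ready[2] { print }'
}

# Usage: kubectl get pods -A | not_healthy
# Pods that were never scheduled can also be listed server-side:
#        kubectl get pods -A --field-selector=status.phase=Pending
```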
Real-Life Troubleshooting Workflow
Identify the failing pod:
kubectl get pods -A | grep -Ev 'Running|Completed'
Check pod status and events:
kubectl describe pod <pod-name> -n <namespace>
kubectl get events --field-selector involvedObject.name=<pod-name> -n <namespace>
Inspect logs:
kubectl logs <pod-name> -n <namespace>
Check resource usage:
kubectl top pod <pod-name> -n <namespace>
Exec into the pod for deeper inspection:
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh
Review affinity, resource limits, and node assignment:
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.nodeName}{"\n"}{.spec.affinity}{"\n"}{.spec.containers[*].resources}{"\n"}'
If using GitOps: Check if the manifest in Git matches the running pod. If not, investigate drift or failed syncs (ArgoCD/Flux dashboards).
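During an incident it can help to capture the read-only steps of this workflow in one pass so the output can be attached to a ticket. A minimal sketch, assuming a POSIX shell; the function name `triage_pod` and the 50-line log tail are illustrative.

```shell
# Hypothetical helper: run the workflow's read-only checks for one pod.
# A failing step (for example `kubectl top` without metrics-server installed)
# does not stop the remaining steps.
triage_pod() {
  ns="$1"; pod="$2"
  echo "### describe"
  kubectl describe pod "$pod" -n "$ns"
  echo "### events"
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=$pod" \
    --sort-by=.metadata.creationTimestamp
  echo "### logs"
  kubectl logs "$pod" -n "$ns" --tail=50
  echo "### usage"
  kubectl top pod "$pod" -n "$ns"
  return 0
}

# Usage: triage_pod <namespace> <pod-name> > triage.txt 2>&1
```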
Best Practices (2025)
Always check pod events and logs before restarting or deleting pods
Use kubectl get events sorted by timestamp for recent issues
Validate resource requests/limits to avoid OOMKilled or throttling
Use LLMs (Copilot, Claude) to generate troubleshooting scripts or analyze logs
Document recurring issues and solutions in your team knowledge base
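To confirm whether a restart was caused by a memory limit, the last termination reason of each container can be read from the pod status. This is a sketch under the assumption that the container has terminated at least once; the helper names are hypothetical.

```shell
# Hypothetical helper: print NAME<TAB>RESTARTS<TAB>LAST_TERMINATION_REASON
# for every container in the pod. The reason is empty if the container
# has never terminated.
last_termination() {
  ns="$1"; pod="$2"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
}

# Hypothetical filter: print the names of containers last killed for exceeding memory.
oom_killed() {
  awk -F'\t' '$3 == "OOMKilled" { print $1 }'
}

# Usage: last_termination <namespace> <pod-name> | oom_killed
```

If a container name is printed, compare its memory limit with the usage reported by `kubectl top pod <pod-name> -n <namespace> --containers`.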
Common Pitfalls
Ignoring events (often contain the root cause)
Restarting pods without root cause analysis
Not checking for node-level issues (disk, network, taints)
Manual changes outside Git in GitOps-managed clusters
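Two of these pitfalls can be checked from the command line. The sketch below lists node taints and reports whether a manifest differs from the live cluster state; the function names are hypothetical, and `manifest_drift` assumes the manifest file is a local checkout of what is stored in Git.

```shell
# Hypothetical helper: list each node with the keys of its taints.
node_taints() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}

# Hypothetical helper: compare a manifest with the live cluster state.
# kubectl diff exits 0 when there are no differences, 1 when differences
# were found, and a value greater than 1 on error.
manifest_drift() {
  kubectl diff -f "$1" >/dev/null 2>&1
  case $? in
    0) echo "in-sync" ;;
    1) echo "drift" ;;
    *) echo "error" ;;
  esac
}

# Usage: node_taints
#        manifest_drift path/to/manifest.yaml
```

For disk or memory pressure on a node, `kubectl describe node <node-name>` shows the node conditions and recent node-level events.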