Pod Troubleshooting Commands
This guide provides actionable commands and best practices for troubleshooting pods in Kubernetes clusters (AKS, EKS, GKE, and on-prem). Use these steps for real-life incident response and GitOps workflows.
Common Troubleshooting Commands
List all Pods in all Namespaces:
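A minimal sketch of the listing command, wrapped in a shell function for reuse. The function name `list_all_pods` is a hypothetical helper, not part of kubectl.

```shell
# Hypothetical helper name; the kubectl call inside is the part that matters.
list_all_pods() {
  # -o wide adds the pod IP and the node each pod is scheduled on.
  kubectl get pods --all-namespaces -o wide
}
```

Run `list_all_pods` in a shell, or run the `kubectl` line directly.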
Check Resource Consumption:
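One way to check live usage is `kubectl top`, which only works when a metrics provider such as metrics-server is installed in the cluster. The function names below are hypothetical, and the namespace argument is a placeholder.

```shell
# Per-container CPU and memory for pods in a namespace.
# $1 = namespace (assumed to be "default" when omitted)
pod_usage() {
  kubectl top pods -n "${1:-default}" --containers
}

# CPU and memory per node, useful for spotting node-level pressure.
node_usage() {
  kubectl top nodes
}
```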
Describe a Pod:
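A sketch of the describe command; `describe_pod` is a hypothetical helper, and the pod name and namespace are supplied by the caller.

```shell
# $1 = pod name, $2 = namespace (assumed to be "default" when omitted)
describe_pod() {
  # Shows container states, last termination reason, probes, and recent events.
  kubectl describe pod "$1" -n "${2:-default}"
}
```

For example, `describe_pod web-1 prod`, where `web-1` and `prod` are placeholder names.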
View Pod Logs:
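A sketch for reading logs, including the previous container instance, which is usually what matters after a crash. The function names and the tail length of 200 lines are illustrative choices.

```shell
# $1 = pod name, $2 = namespace (assumed to be "default" when omitted)
pod_logs() {
  kubectl logs "$1" -n "${2:-default}" --tail=200
}

# Logs from the container instance that ran before the last restart.
previous_pod_logs() {
  kubectl logs "$1" -n "${2:-default}" --previous --tail=200
}
```

For multi-container pods, add `-c <container>` to pick one container.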
Follow Pod Logs (stream in real-time):
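Streaming can be sketched with the `-f` flag. `follow_pod_logs` is a hypothetical helper, and starting from the last 50 lines is an arbitrary choice.

```shell
# $1 = pod name, $2 = namespace (assumed to be "default" when omitted)
follow_pod_logs() {
  # -f keeps the stream open until interrupted with Ctrl-C.
  kubectl logs -f "$1" -n "${2:-default}" --tail=50
}
```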
Exec into a Pod:
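A sketch for opening an interactive shell. It assumes the container image ships `/bin/sh`; `exec_into_pod` is a hypothetical helper name.

```shell
# $1 = pod name, $2 = namespace (assumed to be "default" when omitted)
exec_into_pod() {
  # Everything after "--" is the command run inside the container.
  kubectl exec -it "$1" -n "${2:-default}" -- /bin/sh
}
```

Minimal or distroless images may have no shell at all; in that case `kubectl debug` with an ephemeral container is the usual alternative.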
Get Events for a Pod:
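Events can be narrowed to a single pod with a field selector. This is a sketch; `pod_events` is a hypothetical helper.

```shell
# $1 = pod name, $2 = namespace (assumed to be "default" when omitted)
pod_events() {
  kubectl get events -n "${2:-default}" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$1"
}
```

Events expire after a retention window set on the API server, so an empty result does not prove nothing happened.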
Check Pod Health (Readiness/Liveness):
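One possible approach is to read readiness and restart counts from the pod status, then print the configured probes. Both function names are hypothetical, and the JSONPath output layout is an illustrative choice.

```shell
# Prints one line per container: name, ready flag, restart count.
# $1 = pod name, $2 = namespace (assumed to be "default" when omitted)
pod_health() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.ready}{"\t"}{.restartCount}{"\n"}{end}'
}

# Prints the liveness and readiness probe definitions per container.
pod_probes() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\n"}{.livenessProbe}{"\n"}{.readinessProbe}{"\n"}{end}'
}
```

Probe failures themselves show up as events, so pair this with the pod's events when a container is not ready.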
Retrieve Pod IP and Node:
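A sketch using custom columns to print only the name, IP, and node. `pod_ip_and_node` is a hypothetical helper, and the column headers are arbitrary labels.

```shell
# $1 = pod name, $2 = namespace (assumed to be "default" when omitted)
pod_ip_and_node() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o custom-columns=NAME:.metadata.name,IP:.status.podIP,NODE:.spec.nodeName
}
```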
Restart a Pod:
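Pods are not restarted in place; they are replaced. The sketch below shows two common ways to do that, with hypothetical helper names. Deleting a pod only yields a replacement when a controller (Deployment, StatefulSet, DaemonSet, and so on) owns it.

```shell
# Delete one pod so its controller creates a replacement.
# $1 = pod name, $2 = namespace (assumed to be "default" when omitted)
restart_pod() {
  kubectl delete pod "$1" -n "${2:-default}"
}

# Rolling restart of every pod in a Deployment.
# $1 = deployment name, $2 = namespace (assumed to be "default" when omitted)
restart_deployment() {
  kubectl rollout restart "deployment/$1" -n "${2:-default}"
}
```

The rolling restart is usually the gentler option because it respects the Deployment's update strategy.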
Check Pod Status:
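A sketch that prints the pod phase together with any container waiting reasons (for example `CrashLoopBackOff` or `ImagePullBackOff`). `pod_status` is a hypothetical helper.

```shell
# $1 = pod name, $2 = namespace (assumed to be "default" when omitted)
pod_status() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{.status.phase}{"\t"}{.status.containerStatuses[*].state.waiting.reason}{"\n"}'
}
```

The phase alone can mislead: a crash-looping pod still reports the `Running` phase, which is why the waiting reason is printed next to it.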
List Pod Events (sorted):
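Sorting can be sketched with `--sort-by`. This version sorts on the creation timestamp, which is one reasonable choice of time field; `sorted_events` is a hypothetical helper.

```shell
# $1 = namespace (assumed to be "default" when omitted)
sorted_events() {
  # Oldest first, so the most recent events appear at the bottom.
  kubectl get events -n "${1:-default}" --sort-by=.metadata.creationTimestamp
}
```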
Verify Pod Affinity/Anti-Affinity:
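One way to inspect scheduling constraints is to print the relevant parts of the pod spec. The sketch also prints the node selector and tolerations, since they affect placement in the same way; `pod_affinity` is a hypothetical helper.

```shell
# $1 = pod name, $2 = namespace (assumed to be "default" when omitted)
pod_affinity() {
  # Empty lines in the output mean the field is not set on the pod.
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{.spec.affinity}{"\n"}{.spec.nodeSelector}{"\n"}{.spec.tolerations}{"\n"}'
}
```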
Check Resource Requests and Limits:
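A sketch that prints the `resources` block of each container; `pod_resources` is a hypothetical helper.

```shell
# Prints one line per container: name, then its requests and limits.
# $1 = pod name, $2 = namespace (assumed to be "default" when omitted)
pod_resources() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'
}
```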
Identify Stuck Pods:
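A sketch that treats any pod whose STATUS column is neither `Running` nor `Completed` as stuck. That definition is an assumption made for illustration, and `find_stuck_pods` is a hypothetical helper.

```shell
find_stuck_pods() {
  # With --all-namespaces and --no-headers the columns are:
  # NAMESPACE NAME READY STATUS RESTARTS AGE, so STATUS is field 4.
  kubectl get pods --all-namespaces --no-headers |
    awk '$4 != "Running" && $4 != "Completed"'
}
```

This catches `Pending`, `CrashLoopBackOff`, `ImagePullBackOff`, and similar states. It does not catch pods that are `Running` but never become ready.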
Real-Life Troubleshooting Workflow
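Identify the failing pod: list pods across all namespaces and look for any that are not Running or that keep restarting.
Check pod status and events: describe the pod and read its events in time order.
Inspect logs: read the current logs and, if the container has restarted, the logs of the previous instance.
Check resource usage: compare live CPU and memory against the pod's requests and limits.
Exec into the pod for deeper inspection: open a shell in the container to inspect processes, files, and network connectivity.
Review affinity, resource limits, and node assignment: confirm the pod was scheduled on a suitable node and that its requests fit there.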
If using GitOps: Check if the manifest in Git matches the running pod. If not, investigate drift or failed syncs (ArgoCD/Flux dashboards).
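The workflow above can be sketched as a single triage function that gathers the non-interactive data in one pass. The function name `triage_pod`, its argument order, and the 100-line log tail are all assumptions made for illustration. The exec step is left out because it is interactive.

```shell
# Hypothetical helper: collect first-pass triage data for one pod.
# Usage: triage_pod <pod> [namespace] [path-to-manifest-from-git]
triage_pod() {
  local pod="$1"
  local ns="${2:-default}"
  local manifest="${3:-}"

  echo "== Status, IP, and node =="
  kubectl get pod "$pod" -n "$ns" -o wide

  echo "== Events for the pod, oldest first =="
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$pod" \
    --sort-by=.metadata.creationTimestamp

  echo "== Recent logs (current, then previous instance) =="
  kubectl logs "$pod" -n "$ns" --all-containers --tail=100
  # --previous fails when nothing has restarted; that is not an error here.
  kubectl logs "$pod" -n "$ns" --all-containers --previous --tail=100 || true

  echo "== Resource usage (needs metrics-server) =="
  kubectl top pod "$pod" -n "$ns" --containers || true

  echo "== Affinity, then requests/limits per container =="
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.spec.affinity}{"\n"}{range .spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'

  # GitOps drift check: compare the manifest from Git with the live objects.
  # kubectl diff exits non-zero when differences exist, hence "|| true".
  if [ -n "$manifest" ]; then
    echo "== Drift against $manifest =="
    kubectl diff -f "$manifest" || true
  fi
}
```

For example, `triage_pod web-1 prod deploy/web.yaml`, where all three values are placeholders. In a GitOps-managed cluster, treat the diff as evidence only and make any fix through Git rather than applying it by hand.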
Best Practices (2025)
Always check pod events and logs before restarting or deleting pods
Use kubectl get events sorted by timestamp for recent issues
Validate resource requests/limits to avoid OOMKilled or throttling
Use LLMs (Copilot, Claude) to generate troubleshooting scripts or analyze logs
Document recurring issues and solutions in your team knowledge base
Common Pitfalls
Ignoring events (often contain the root cause)
Restarting pods without root cause analysis
Not checking for node-level issues (disk, network, taints)
Manual changes outside Git in GitOps-managed clusters