Troubleshooting Guide

Shell-Based Diagnostics

Process Investigation

# Advanced process tree analysis with resource usage
ps auxf | awk '{if($3>0.0 || $4>0.0) print $0}'

# Find processes causing high I/O
iotop -o -b -n 2

# Trace system calls with stack traces
perf trace -p $(pgrep process_name) -s

# Advanced strace filtering
strace -e trace=network,ipc -f -p $(pgrep process_name)

System Performance

# Quick performance profile using perf
perf record -F 99 -a -g -- sleep 30
perf report --stdio

# Memory leak investigation
valgrind --leak-check=full --show-leak-kinds=all ./program

# System-wide performance snapshot
sudo sysdig -c spectrogram 'evt.type=switch and evt.dir=>>'

Network Diagnostics

Advanced Network Troubleshooting

Container Networking

Cloud Platform Issues

AWS Troubleshooting

Azure Diagnostics

Container Orchestration

Kubernetes Deep Dive

Performance Analysis

Resource Profiling

AI/LLM Integration

Using AI for Troubleshooting

GitHub Copilot CLI

OpenAI API Integration

Using Ollama Locally

Quick Reference: One-Liners

System Analysis

Network Debugging

Container Management

Best Practices

  1. Always maintain a local troubleshooting toolkit:

  1. Set up persistent debug aliases:

  1. Use structured logging:

Automation Tips

Create Self-Documenting Scripts

Monitoring Setup

Remember to regularly update this guide as new tools and techniques emerge.

Last updated