SRE Fundamentals
Core Principles of Modern SRE
Modern SRE Practices (2025)
Error Budgets and SLOs
# Example SLO definition in Prometheus format
groups:
  - name: availability.rules
    rules:
      - record: availability:success_rate_1d
        expr: sum(rate(http_requests_total{status=~"2.."}[1d])) / sum(rate(http_requests_total[1d]))
      - alert: AvailabilitySLOBudgetBurning
        expr: availability:success_rate_1d < 0.995
        for: 1h
        labels:
          severity: warning
        annotations:
          description: "Service is burning through error budget fast"

SRE's Four Golden Signals
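The four golden signals (latency, traffic, errors, saturation) can be derived from a window of request records. The following is a minimal sketch, not a production collector: the `Request` type, the `golden_signals` function, and the `cpu_utilization` input are hypothetical names introduced for illustration. It treats any non-2xx response as an error, mirroring the recording rule above, and uses a simple nearest-rank p99.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Request:
    """One observed request (hypothetical record type)."""
    duration_s: float  # time taken to serve the request, in seconds
    status: int        # HTTP status code


def golden_signals(
    requests: Sequence[Request],
    window_s: float,
    cpu_utilization: float,
) -> Dict[str, float]:
    """Summarize latency, traffic, errors, and saturation for one window.

    cpu_utilization is an assumed stand-in for saturation (0.0 to 1.0);
    a real service might use queue depth or memory pressure instead.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")

    n = len(requests)
    durations = sorted(r.duration_s for r in requests)
    # Nearest-rank percentile: the smallest value with at least 99% of
    # samples at or below it.
    p99 = durations[math.ceil(0.99 * n) - 1] if n else 0.0
    # Anything outside 2xx counts as an error, as in the recording rule.
    errors = sum(1 for r in requests if not 200 <= r.status < 300)

    return {
        "latency_p99_s": p99,
        "traffic_rps": n / window_s,
        "error_rate": errors / n if n else 0.0,
        "saturation": cpu_utilization,
    }


sample = [
    Request(0.1, 200),
    Request(0.2, 200),
    Request(0.3, 500),
    Request(0.4, 404),
]
signals = golden_signals(sample, window_s=2.0, cpu_utilization=0.6)
for name, value in signals.items():
    print(f"{name}: {value}")
```

In practice these values would usually come from a metrics system rather than raw records; the sketch only shows how each signal is defined.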
Disaster Recovery and Incident Management
What does a modern Site Reliability Engineer do?
Engineering Focus (Min. 50% of time)
Operations Focus (Max. 50% of time)
DevOps vs. SRE: Beyond Terminology
DevOps Engineers
SRE Engineers
Practical Example: Error Budget Implementation
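As a rough illustration of the arithmetic behind an error budget, the sketch below assumes a request-based SLO with the same 99.5% target used in the alert threshold earlier. The `ErrorBudget` class, the `can_deploy` helper, and the 10% `min_remaining` policy are hypothetical and shown only to make the calculation concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Request-based error budget for one SLO window (hypothetical helper)."""

    slo_target: float      # e.g. 0.995, matching the alert threshold
    total_requests: int    # all requests seen in the window
    failed_requests: int   # requests that did not count as successes

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def consumed(self) -> float:
        # Fraction of the budget already spent; it can exceed 1.0.
        if self.allowed_failures == 0:
            return 0.0 if self.failed_requests == 0 else float("inf")
        return self.failed_requests / self.allowed_failures

    @property
    def remaining(self) -> float:
        # Fraction of the budget still available; negative when overspent.
        return 1.0 - self.consumed


def can_deploy(budget: ErrorBudget, min_remaining: float = 0.1) -> bool:
    # Assumed policy: pause risky releases when little budget is left.
    return budget.remaining >= min_remaining


budget = ErrorBudget(slo_target=0.995, total_requests=1_000_000, failed_requests=2_500)
print(f"allowed failures: {budget.allowed_failures:.0f}")
print(f"budget consumed: {budget.consumed:.1%}")
print(f"deploy allowed: {can_deploy(budget)}")
```

With a 99.5% target, one million requests leave room for roughly 5,000 failures, so 2,500 failures spend about half the budget. The release gate is one possible policy; teams often tie it to burn rate over several windows instead.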
SRE Implementation in Cloud-Native Environments
Conclusion