Incident Management & On-Call Engineering
Respond, resolve, and learn — build the systems and culture that keep production reliable.
Intermediate | 0.8 hours | 8 lessons
What You'll Learn
- ✓ Define severity levels and incident classification criteria
- ✓ Design fair, sustainable on-call rotations with escalation policies
- ✓ Lead an incident response with a clear communication structure
- ✓ Write effective runbooks and playbooks that accelerate mitigation
- ✓ Facilitate blameless postmortems that generate lasting improvements
- ✓ Measure and act on error budgets within an SRE framework
- ✓ Configure PagerDuty and Opsgenie for real-world alerting
- ✓ Build an incident management culture, not just processes
Prerequisites
- Experience deploying and operating cloud services
- Familiarity with observability concepts (metrics, logs, traces)
- Basic understanding of SLIs, SLOs, and SLAs is helpful
Course Curriculum
Module 1: Foundations
Module 2: Response
Practice for the Real Exam
After completing this course, test yourself with exam-style practice questions.