On-Call Without Burnout: Rotations, Runbooks, and Escalation

Our best engineer quit citing on-call. We rebuilt the whole thing: saner rotations, runbooks that actually help at 3am, and escalation that doesn't punish asking for help.

DevOps SRE On Call Reliability

Kiril UrbonasDevOps Engineer

|Jul 7, 2026

On-Call Without Burnout: Rotations, Runbooks, and Escalation

Topics

Terraform188 Monitoring179 AWS153 Kubernetes110 LLM102 CI/CD93 Linux83 Python81 GPT73 Security69

Latest Articles

View All →

••last week

Argo CD ApplicationSets: Managing Many Clusters Without Copy-Paste

Twenty-three clusters, one app, and a folder of near-identical Application YAMLs that drifted constantly. ApplicationSets killed the copy-paste and the drift.

Kiril Urbonas·4 min read

Read article

••last week

Cloud IAM Least-Privilege Without Breaking Everything

Least privilege fails when it's a one-time audit that locks things down until something breaks, then gets reverted. The iterative, log-driven approach that tightens permissions safely — and the policies we stopped writing by hand.

Kiril Urbonas·4 min read·3

Read article

Page 5 of 44 · 517 posts

1...4 5 6...44