Blog

Practical articles on AI, DevOps, Cloud, Linux, and infrastructure engineering.

Agentic Ops — When (and When Not) to Use AI Agents for Incident Response

AI agents for incident triage sound great in demos. We've tried them in production. The patterns that earn their keep, the ones that backfire, and where humans still beat agents.

Kiril Urbonas

4 days ago

Burn-Rate Alerting — The SLO Discipline That Prevents Alert Fatigue

Static thresholds on error rate produce noisy alerts. Burn-rate alerting flips the question to "are we burning the error budget faster than we can sustain?" — and pages only on real problems.

Kiril Urbonas

Last week

SLI Design — Picking Metrics That Actually Correlate With User Experience

Wrong SLI metrics mean green dashboards while users churn. The discipline of picking signals that move with what users actually feel, and the ones that look reliable but lie.

Kiril Urbonas

Last month

Chaos Engineering — What We Actually Run as Game Days

We run a chaos game day each quarter. The scenarios that surfaced real problems, the ones that didn't, and the operational discipline that makes the practice pay back.

Kiril Urbonas

2 months ago

Cloud Disaster Recovery Runbook Design: How Small Teams Rehearse Multi-Region Failover

A practical disaster recovery runbook guide for small cloud teams that need realistic failover steps, clear ownership, and repeatable rehearsals instead of shelfware documents.

Kiril Urbonas

3 months ago

End-of-Week Engineering: Why Smart Tech Teams Don’t Ship Major Changes on Friday

A practical risk-management framework for release timing, Friday deployment policies, progressive delivery, and how elite teams protect reliability and people.

Kiril Urbonas

3 months ago

SRE Error Budgets in Practice: Shipping Fast Without Burning Reliability

A practical way to define SLOs and error budgets, connect them to release decisions, and keep reliability debates grounded in data.

Kiril Urbonas

7 posts