Practical articles on AI, DevOps, Cloud, Linux, and infrastructure engineering.
A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.
How a small team moved from single-region risk to a simple active/passive multi-region setup without doubling complexity.
AWS bill grew 40% YoY for two years before we got serious. Tagging, scoped budgets, and a weekly review meeting did 80% of the work.
Unify traces, metrics, and logs with OpenTelemetry. Instrumentation, sampling, and backend-agnostic pipelines.
Our CI was 73% green at the worst point. People trusted it less than coin flips. Six things we did to get to 96%, in rough order of impact.
We upgraded a 60-node EKS cluster from 1.27 to 1.31 over six months. Four minor versions, one bad surprise, zero customer impact. Here's the playbook.