Practical articles on AI, DevOps, Cloud, Linux, and infrastructure engineering.
A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.
We've executed real disaster recoveries twice. The plan that survived contact with reality, and what was wrong with the plans we had before it.
How a small team moved from single-region risk to a simple active/passive multi-region setup without doubling complexity.
VPCs, subnets, route tables, gateways. The mental model that finally made cloud networking click after I stopped trying to map it 1:1 to physical networks.
We run both ECS and EKS in production. Which one we use for what, and the actual decision criteria, not the marketing comparison.
A working AWS security baseline, derived from the actual incidents we've had and the audit findings we've cleared.
We use serverless for specific patterns, not as a default. The patterns where it shines, the ones where it doesn't, and the gotchas at production scale.
Building visibility into cloud costs that actually drives action. The dashboards we look at, the alerts that fire, and the queries we run.