A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.
Our worst incident of last year started with a simple question: “Why is there an EC2 instance we can't find in Terraform?”
```bash terraform plan -detailed-exitcode || echo "Drift detected" ```
Drift still happens, but on-call no longer learns about it at the worst possible moment.
Get the latest tutorials, guides, and insights on AI, DevOps, Cloud, and Infrastructure delivered directly to your inbox.
Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.
We run mostly on AWS but use GCP for specific workloads. The honest cost-benefit analysis of multi-cloud, plus the patterns that make it not awful.
Explore more articles in this category
The single most useful Postgres extension you might not be using. The queries it surfaces, the indexes it implies, and the operational discipline of reading it weekly.
Version-pinned modules across many repos. The release process, semver discipline, and the breaking-change communication that keeps a shared registry sane.
EXPLAIN ANALYZE output is dense and intimidating. Once you can read it, most slow-query investigations finish in minutes. The patterns we keep seeing.