Practical articles on AI, DevOps, Cloud, Linux, and infrastructure engineering.
Provision real cloud resources with Terraform — a VPC, an S3 bucket, and an EC2 instance — using the standard init/plan/apply workflow.
We ran Pulumi in TypeScript and Terraform in HCL side by side across 60+ services. Each won different categories of work. Here's the breakdown.
We deleted every static GCP service account key in our org over six weeks. Here's the migration plan, the gotchas, and the policies we now enforce.
We moved a 60-node production EKS cluster to Auto Mode. Some pain points evaporated, others got harder. The cost picture is more nuanced than the marketing suggests.
We replaced 14 long-lived IAM users with SSO + temporary credentials. The migration plan, the gotchas, and the policies we now enforce.
How we shipped three schema migrations with zero customer impact. Expand-then-contract, dual-writes, and the rollback plan we never had to use — but tested anyway.
We were drowning in 200 alerts a week. Most got ignored. After a quarter of triage and rework, we're at about 15 — and on-call actually responds to them.
We had .env files in three repos, AWS keys in Slack DMs, and a postgres password etched into a Confluence page. Cleaning it up took a sprint and changed how we think about secrets.
Practical patterns for Terraform modules at scale: versioning, composition, testing, and avoiding the monolith trap.
A real cost audit uncovered idle load balancers, oversized RDS instances, and forgotten snapshots. Here's what we found and how we fixed each one.
A real-world Terraform module version pinning guide for platform teams that want safer upgrades, clearer ownership, and fewer broken pipelines after shared module releases.
A practical Terraform state isolation guide built from a real environment-mixing incident, with patterns for safer backends, clearer ownership, and lower blast radius.