Practical articles on AI, DevOps, Cloud, Linux, and infrastructure engineering.
One Terraform state file per environment sounds obvious until you watch a dev plan touch a prod resource. Here's how we actually isolate state and the mistakes we made getting there.
Our base image went from 1.2 GB and 200+ CVEs to 80 MB and 4 CVEs. Most of the work wasn't clever — it was deletion.
We mapped every byte that ends up in our production containers. The map showed three places trust was implicit. Each became a control.
Platform teams own the systems that every service depends on. This is our incident response playbook for when the foundation cracks.
When everything seems "slow," a baseline gives you something to measure against. The capture-and-compare workflow we use on every Linux host.
We replaced three kernel-level monitoring tools with a small set of eBPF programs. What it bought us, what it cost, and where we still use the old stuff.
We removed the corporate VPN, set up workload identity everywhere, and made every service prove who it is on every call. The actual implementation, with what worked and what we abandoned.
Bash patterns beyond the basics: arrays, traps, process substitution, parameter expansion. The features that earn their place when scripts grow.
We cut our average production image size by 78% with multi-stage builds. The patterns that worked, the ones that didn't, and the production gotchas.