Practical articles on AI, DevOps, Cloud, Linux, and infrastructure engineering.
We cut our largest playbook's runtime from 14 minutes to 4 minutes. The specific changes that mattered, plus the ones that didn't.
We tried Pulumi for a quarter and went back to Terraform. Both are real options. Why we picked Terraform and what would change our minds.
K8s Secrets are only base64-encoded by default, not encrypted. We moved every secret to Vault with the Vault Agent injector and never went back. The setup checklist.
We test infrastructure code with three layers: validation, plan review, and integration tests. The setup that catches real bugs without slowing down PRs.
We have a private module registry with ~25 modules used across 12 accounts. Versioning, interface design, and the over-modularization mistake we keep making.
A container is a process with extra kernel features applied. Walking through namespaces, cgroups, and the actual mechanics — the level of detail that makes "container weirdness" debuggable.
We have a few hundred shell scripts in production. The patterns that make them survive contact with reality, and the ones we've stopped writing.
Filesystem choice, mount options, IO schedulers — the per-host tweaks that actually moved disk performance for our database and storage workloads.
How processes actually live and die on Linux, the tools that show what's happening, and the patterns we use for monitoring service health.
A practical Linux hardening checklist for production hosts. Every setting earns its place for a real production reason, not cargo-cult habit.
A condensed checklist of the systemd unit-file patterns we now use everywhere, with the production reasons each one matters.
A systematic approach to debugging Linux network issues. The tools that earn their place and the order I use them in.