Practical articles on AI, DevOps, Cloud, Linux, and infrastructure engineering.
Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.
A different angle on DR: the planning process — RTO/RPO conversations, dependency mapping, and what we learned about prioritizing what to recover.
We cut our largest playbook's runtime from 14 minutes to 4 minutes. The specific changes that mattered, plus the ones that didn't.
We tried Pulumi for a quarter and went back to Terraform. Both are real options. Why we picked one and what would change our mind.
We have a private module registry with ~25 modules used across 12 accounts. Versioning, interface design, and the over-modularization mistake we keep making.
A condensed checklist of the systemd unit-file patterns we now use everywhere, with the production reasons each one matters.
Run services reliably with systemd: units, dependencies, and resource limits.