CI/CD, automation, reliability, and release engineering.
A working Prometheus stack for a 40-node cluster: what we deploy, what we tune, and what we wish we'd known about cardinality two years ago.
A focused look at the techniques that shrink container images: which actually pay off, which are folklore, and the discipline that keeps images small over time.
We've had to restore a Kubernetes cluster from backup twice. Once it worked. Once it took 14 hours. Here's the strategy we run now.
Practical game day scenarios for CI/CD: broken rollbacks, permission issues, and slow feedback loops, and how we fixed them.
We ran Istio for a year, then switched to Linkerd. Both can do the job. The decision came down to operational fit, not features.
We cut our average CI build time from 28 minutes to 6 minutes. The changes that mattered, ranked by impact.
We scan every container image in CI and at runtime. Trivy + Cosign + admission controllers. The setup that earns its place and what we wish we'd known.
We migrated 40+ services to GitOps with Argo CD. Two years in, here's what works and what required workarounds.
How a packet actually gets from the internet to a pod, walked layer by layer. Plus the things that surprise people the first time they hit them.