CI/CD, automation, reliability, and release engineering.
Practical game day scenarios for CI/CD: broken rollbacks, permission issues, and slow feedback loops, and how we fixed them.
Shift-left security with image scanning: Trivy, policy gates, and runtime integration.
We track the four DORA metrics plus a handful of others. The trade-off between what's measurable and what's meaningful, and how we use the numbers.
We've run canary deploys on most services for two years. The mechanics are easy; the metrics that decide "promote or roll back" are where the design is.
We use blue-green for stateful services where canary doesn't fit. The actual mechanics, the data-layer subtleties, and when blue-green isn't the right answer.
We collect ~800GB of logs per day across our fleet. The shape of our logging stack, what we keep, what we drop, and what we'd build differently.