Practical articles on AI, DevOps, Cloud, Linux, and infrastructure engineering.
A practical guide to writing and managing systemd services for production. The unit-file features that earn their place, plus the operational workflows.
We use CloudFront + Lambda@Edge for specific patterns. The wins, the production gotchas, and where we hit Lambda@Edge's limits.
Postgres, DynamoDB, Redis, Elasticsearch, Snowflake. We use all five for different workloads. The decision criteria, not the marketing comparison.
We've executed real disaster recoveries twice. The plan that survived contact with reality, and what was wrong with the plans we had before it.
VPCs, subnets, route tables, gateways. The mental model that finally made cloud networking click after I stopped trying to map it 1:1 to physical networks.
Shift-left security with image scanning. Trivy, policy gates, and runtime integration.
A working AWS security baseline, derived from the actual incidents we've had and the audit findings we've cleared.
We use serverless for specific patterns, not as a default. The patterns where it shines, the ones it doesn't, and the gotchas at production scale.
We run our app in two AWS regions for failover. The hard parts aren't the deployment — they're data consistency, traffic shifting, and the assumptions that break when "primary" is suddenly the wrong region.
We run ~200 Lambda functions. Cold starts, memory tuning, and the cost-vs-latency trade-offs that actually move the bill.
We track the four DORA metrics plus a handful of others. The trade-off between what's measurable and what's meaningful, and how we use the numbers.
We've run canary deploys on most services for two years. The mechanics are easy; the real design work is in the metrics that decide "promote or roll back."