Practical articles on AI, DevOps, Cloud, Linux, and infrastructure engineering.
We benchmarked four vector databases on the same workload. Each has a place. Here's how we'd pick today.
We've shipped four production RAG applications. Each one taught us something. Here's the end-to-end pattern that works.
We cut LLM inference cost 47% over a quarter while improving p95 latency. Six changes, ranked by what each one actually delivered.
Wikis rot. We moved every operational doc into the repo it describes. Six months in, the docs are mostly correct because the only people who can update them are the ones who change the system.
A flat VPC is fine until you need to prove who can reach what. Five segmentation patterns that work in AWS without requiring a service mesh.
Blue/green sounds simple until your green cluster has a memory leak and you've already sent 50% of traffic there. The guardrails are what make it safe.
We had four different patch cadences across our fleet and routinely missed CVEs by weeks. Here's the unified workflow that finally got us caught up.
Our AWS bill grew 40% YoY for two years before we got serious. Tagging, scoped budgets, and a weekly review meeting did 80% of the work.
A team of 30 engineers all editing the same monolithic Ansible repo doesn't work. Here's the role taxonomy and review process that did.
One Terraform state file per environment sounds obvious until you watch a plan run for dev touch a prod resource. Here's how we actually isolate state and the mistakes we made getting there.
Our CI was 73% green at its worst point. People trusted it less than a coin flip. Six things we did to get to 96%, in rough order of impact.
Our base image went from 1.2 GB and 200+ CVEs to 80 MB and 4 CVEs. Most of the work wasn't clever; it was deletion.