Real-World RAG Incidents: Lessons from a Production Rollout
A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.
Latest Articles
How We Stopped Terraform Drift from Surprising On-Call
A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.
Systemd Tricks We Use to Keep Services Boring
Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.