Practical articles on AI, DevOps, Cloud, Linux, and infrastructure engineering.
A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.
We benchmarked four vector databases on the same workload. Each has a place. Here's how we'd pick today.
We've shipped four production RAG applications, and each one taught us something. Here's the end-to-end pattern that works.
Run retrieval-augmented generation at scale: chunking, caching, and observability.
We cut LLM inference cost by 47% over a quarter while improving p95 latency. Six changes, ranked by what each one actually delivered.