Practical articles on AI, DevOps, Cloud, Linux, and infrastructure engineering.
Multi-agent systems are mostly hype. The patterns we've seen actually deliver value, plus the ones we'd avoid until the tooling is more mature.
A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.
We tried four quantization techniques on Llama-3 and Mistral models. The quality vs cost trade-offs we found, plus what works for production inference.
We benchmarked four vector databases on the same workload. Each has a place. Here's how we'd pick today.
We've shipped four production RAG applications. Each one taught us something. The end-to-end pattern that works.
Run retrieval-augmented generation at scale. Chunking, caching, and observability.