Practical articles on AI, DevOps, Cloud, Linux, and infrastructure engineering.
A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.
Build MLOps pipelines for training, evaluation, and deployment, with reproducibility and monitoring built in.
We started with a single Celery worker handling everything. Eight months and three architecture changes later, here's what scaled and what we learned about queue design.
We've shipped three end-to-end ML systems. These are the pieces that look obvious in slides and turn out to be the actual work.
We started routing 90% of LLM traffic through a small internal gateway. The gateway wasn't planned; it emerged from solving the same problem in five places. Here's the shape it took.