Blog

Practical articles on AI, DevOps, Cloud, Linux, and infrastructure engineering.

Real-World RAG Incidents: Lessons from a Production Rollout

A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.

Kiril Urbonas·4

7 months ago

Real-World RAG Incidents: Lessons from a Production Rollout

A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.

Kiril Urbonas·2

7 months ago

Real-World RAG Incidents: Lessons from a Production Rollout

A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.

Kiril Urbonas·5

8 months ago

Real-World RAG Incidents: Lessons from a Production Rollout

A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.

Kiril Urbonas·4

8 months ago

Real-World RAG Incidents: Lessons from a Production Rollout

A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.

Kiril Urbonas·2

8 months ago

MLOps Pipelines: From Experiment to Production Models

Build MLOps pipelines for training, evaluation, and deployment, with a focus on reproducibility and monitoring.

Kiril Urbonas·1

9 months ago

Real-World RAG Incidents: Lessons from a Production Rollout

A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.

Kiril Urbonas·3

9 months ago

Architecture Review: Python Worker Queue Scaling Patterns

We started with a single Celery worker handling everything. Eight months and three architecture changes later, here's what scaled and what we learned about queue design.

Kiril Urbonas·4

9 months ago

Real-World RAG Incidents: Lessons from a Production Rollout

A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.

Kiril Urbonas·4

9 months ago

Production AI Pipelines: Building End-to-End ML Systems

We've shipped three end-to-end ML systems. Here are the pieces that look obvious in slides but turn out to be the actual work.

Kiril Urbonas·5

9 months ago

Architecture Review: LLM Gateway Design for Multi-Provider Inference

We started routing 90% of LLM traffic through a small internal gateway. The gateway wasn't planned; it emerged from solving the same problem in 5 places. Here's the shape it took.

Kiril Urbonas·10

Page 3 of 7 · 81 posts