Blog

Practical articles on AI, DevOps, Cloud, Linux, and infrastructure engineering.

March 8, 2025

Real-World RAG Incidents: Lessons from a Production Rollout

A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.

Kiril Urbonas·3

March 1, 2025

Real-World RAG Incidents: Lessons from a Production Rollout

A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.

Kiril Urbonas·5

December 10, 2024

Field Notes: RAG Retrieval Quality Evaluation

I spent 3 weeks chasing an answer-quality regression that turned out to be a tokenizer mismatch in a library upgrade. Here's what I learned about evaluating RAG.

Kiril Urbonas·2

December 6, 2024

Field Notes: Prompt Versioning and Regression Testing

We changed a system prompt for what we thought was a tone improvement and broke a customer-critical extraction overnight. Here are the version control and regression tests we built next.

Kiril Urbonas·7

July 11, 2024

Deep Dive: SLO-Based Monitoring for APIs

We replaced 47 percentile-threshold alerts with 3 SLO burn-rate alerts. The on-call rotation gets paged less and catches more.

Kiril Urbonas·5

June 2, 2024

Deep Dive: Model Serving Observability Stack

We had Datadog for app metrics, Loki for logs, and zero useful insight into what our LLM service was actually doing. Here's the observability stack we built specifically for model serving.

Kiril Urbonas·11

February 12, 2024

Fine-tuning Large Language Models: A Practical Guide

Learn how to fine-tune LLMs like Llama 2, Mistral, and GPT models for your specific use case. Includes LoRA, QLoRA, and full fine-tuning techniques.

Kiril Urbonas·14

February 3, 2024

Building Production-Ready AI Applications with LangChain and Docker

We deploy LangChain apps in Docker on Kubernetes. The patterns that work, the LangChain-specific gotchas, and what we'd build differently next time.

Kiril Urbonas·16

January 1, 2024

Fine-tuning Llama 3 on Consumer Hardware

I fine-tuned Llama 3 8B on a single 4090 over a weekend for a side project. Here's what worked, what cost more than expected, and what I'd do differently.

Kiril Urbonas·5
