Blog

Build Your First RAG App in 100 Lines of Python

A working retrieval-augmented generation app you can run today. Markdown ingestion, embeddings, semantic search, and an LLM answer — start to finish in one afternoon.

Kiril Urbonas·24

Fine-Tuning vs RAG vs Long-Context: A Decision Framework With Numbers

We've shipped all three patterns to production. They're not interchangeable. Here's the framework we now use to decide which approach fits a given task.

Kiril Urbonas·6

LLM Output Validation: Schema-First Prompt Engineering Patterns

We reject ~6% of LLM outputs before they reach a downstream system. Here's how we structure prompts and validators to catch malformed responses early.

Kiril Urbonas·13

Vector Database Selection: Pinecone, pgvector, Qdrant After 6 Months in Production

We ran the same RAG workload across three vector stores for a quarter each. Here's what we learned about latency, cost, and operational overhead.

Kiril Urbonas·10

Self-Hosted LLMs vs OpenAI API: A Cost-vs-Latency Analysis After 6 Months

We ran the same workload on both for half a year. The break-even point isn't where most blog posts say it is — and the latency story has more nuance than throughput-per-dollar charts admit.

Kiril Urbonas·19

Embedding Quality in RAG: How We Cut Hallucinations by 60%

Six months running RAG in production taught us that the retrieval step matters far more than the model. Concrete techniques that moved the needle, with before/after numbers.

Kiril Urbonas·9

Prompt Engineering Patterns That Actually Work in Production

Battle-tested prompt patterns from running LLM features in production: structured output, chain-of-thought, and graceful failure handling.

Kiril Urbonas·9

Embedding Model Upgrades Without Search Chaos: A Safer RAG Rollout Pattern

A practical embedding model upgrade guide for RAG systems, built from a real support-search migration that initially reduced answer quality instead of improving it.

Kiril Urbonas·50

Prompt Versioning and Regression Testing: How Teams Avoid Silent AI Regressions

A real-world guide to prompt versioning and regression testing for production AI features, focused on preventing the subtle changes that hurt quality long before anyone notices.

Kiril Urbonas·11

RAG Retrieval Quality Evaluation: The Checks We Added After Bad Answers Reached Production

A search-friendly guide to RAG retrieval quality evaluation, based on the moment one production assistant started citing stale documents and the team had to prove what 'good retrieval' meant.

Kiril Urbonas·7

Real-World RAG Incidents: Lessons from a Production Rollout

A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.

Kiril Urbonas·7