Blog

Practical articles on AI, DevOps, Cloud, Linux, and infrastructure engineering.

Cloud Disaster Recovery Runbook Design: How Small Teams Rehearse Multi-Region Failover

A practical disaster recovery runbook guide for small cloud teams that need realistic failover steps, clear ownership, and repeatable rehearsals instead of shelfware documents.

Kiril Urbonas·11

3 months ago

RAG Retrieval Quality Evaluation: The Checks We Added After Bad Answers Reached Production

A guide to RAG retrieval quality evaluation, based on the moment one production assistant started citing stale documents and the team had to prove what 'good retrieval' meant.

Kiril Urbonas·7

3 months ago

Infrastructure Documentation as Code: How One Platform Team Reduced Audit Fire Drills

This guide to infrastructure documentation as code shows how a platform team moved runbooks, ownership maps, and architecture decisions into versioned workflows that people actually trusted.

Kiril Urbonas·8

3 months ago

Linux Patch Management for Production Fleets: A Real-World Maintenance Workflow

A production-tested Linux patch management workflow for teams that need security fixes without turning every maintenance window into a gamble.

Kiril Urbonas·7

3 months ago

AWS Cost Allocation Tags for Shared Platforms: What Finally Worked

A hands-on guide to AWS cost allocation tags for shared environments, built from a real platform-team problem: everyone used the cluster, but nobody trusted the bill.

Kiril Urbonas·9

3 months ago

GitHub Actions Monorepo CI: How We Cut Build Times Without Breaking Main

A practical GitHub Actions monorepo CI guide built around a real scaling problem: long queues, noisy failures, and developers waiting 40 minutes for feedback.

Kiril Urbonas·16
