Blog

Argo CD ApplicationSets: Managing Many Clusters Without Copy-Paste

Twenty-three clusters, one app, and a folder of near-identical Application YAMLs that drifted constantly. ApplicationSets killed the copy-paste and the drift.

Kiril Urbonas

Cloud IAM Least-Privilege Without Breaking Everything

Least privilege fails when it's a one-time audit that locks things down until something breaks, then gets reverted. The iterative, log-driven approach that tightens permissions safely — and the policies we stopped writing by hand.

Prompt Caching for Production LLM Apps — Cutting Cost and Latency at the Token Layer

A long, stable system prompt re-billed on every request is money on fire. How prompt caching works, where the cache boundary belongs, and the structuring discipline that got us a big cost and latency cut without changing behavior.

Kiril Urbonas·5

Linux Memory Pressure — Reading PSI Before the OOM Killer Reads You

Free memory is a lie and load average doesn't see memory stalls. How Pressure Stall Information gives you a direct, early signal of memory contention — and how we wired it into alerts and autoscaling.

Kubernetes Pod Disruption Budgets — Surviving Node Drains Without an Outage

Node upgrades, autoscaler scale-downs, and spot reclaims all drain nodes. Without PDBs they can take all your replicas at once. The budgets, probes, and graceful-shutdown handling that keep voluntary disruptions invisible to users.

Kiril Urbonas·4

Terraform Drift Detection in CI — Catching Out-of-Band Changes Before They Bite

State drift is silent until a deploy fails or an outage reveals it. The scheduled plan-and-diff pipeline that surfaces console hotfixes and manual edits while they're still cheap to reconcile.

Kiril Urbonas·5

RAG Retrieval Evaluation — Building an Offline Eval Harness Before You Ship

You can't improve retrieval you don't measure. The offline eval harness that lets us change embeddings, chunking, and rerankers with confidence instead of vibes — with the metrics that actually predict production quality.

Kiril Urbonas·5

Alert on Symptoms, Not Causes — SLO Burn-Rate Alerting in Practice

Cause-based alerts page you for things that don't matter and miss things that do. How we rebuilt alerting around SLO burn rates — multi-window, multi-burn-rate — and cut pages while catching more real pain.

Edge Caching with Stale-While-Revalidate — Fast and Fresh at the CDN

The Cache-Control directives most teams under-use. How stale-while-revalidate and stale-if-error turned our CDN from a freshness liability into a latency and resilience win, with the gotchas.

LLM Output Validation — Schema-Constrained Generation in Production

Parsing model output with a regex and a prayer doesn't survive contact with traffic. The validation layers that keep structured LLM output reliable — constrained decoding, schema validation, and the repair loop.

CI Pipeline Caching That Actually Pays Off

Most CI caches either miss constantly or restore stale junk. The cache-key discipline, scope boundaries, and measurements that turned our pipeline cache from theatre into real minutes saved.