Blog

Practical articles on AI, DevOps, Cloud, Linux, and infrastructure engineering.

DevOps Metrics and KPIs: Measuring Success

We track the four DORA metrics plus a handful of others. The trade-off between what's measurable and what's meaningful, and how we use the numbers.

Kiril Urbonas·1

8 months ago

Multi-Region Resilience: Failover, Data, and DNS

Design for region failure. Active/passive and active/active, data replication, and failover testing.

Kiril Urbonas·11

8 months ago

Canary Releases: Gradual Rollout Strategy

We've run canary deploys on most services for two years. The mechanics are easy; the metrics that decide "promote or roll back" are where the design is.

Kiril Urbonas·14

8 months ago

How We Stopped Terraform Drift from Surprising On-Call

A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.

Kiril Urbonas·4

8 months ago

Blue-Green Deployments: Zero-Downtime Releases

We use blue-green for stateful services where canary doesn't fit. The actual mechanics, the data-layer subtleties, and when blue-green isn't the right answer.

Kiril Urbonas·7

8 months ago

A Pragmatic Multi-Region Strategy for Small Teams

How a small team moved from single-region risk to a simple active/passive multi-region setup without doubling complexity.

Kiril Urbonas·4

8 months ago

Log Aggregation Strategies: Centralizing Your Logs

We collect ~800 GB of logs per day across our fleet. The shape of our logging stack, what we keep, what we drop, and what we'd build differently.

Kiril Urbonas·8

8 months ago

Infrastructure Monitoring with Prometheus: Complete Setup Guide

A working Prometheus stack for a 40-node cluster: what we deploy, what we tune, and what we wish we'd known about cardinality two years ago.

Kiril Urbonas·12

8 months ago

Docker Multi-Stage Builds: Optimizing Image Size

A focused look at the techniques that shrink container images: which actually pay off, which are folklore, and the discipline that keeps images small over time.

Kiril Urbonas·10

8 months ago

A Pragmatic Multi-Region Strategy for Small Teams

How a small team moved from single-region risk to a simple active/passive multi-region setup without doubling complexity.

Kiril Urbonas·6

8 months ago

Kubernetes Backup Strategies: Protecting Your Cluster Data

We've had to restore a Kubernetes cluster from backup twice. Once it worked. Once it took 14 hours. Here's the strategy we run now.

Kiril Urbonas·6
