Blog

Practical articles on AI, DevOps, Cloud, Linux, and infrastructure engineering.

Docker Multi-Stage Builds: Optimizing Image Size

A focused look at the techniques that shrink container images: which actually pay off, which are folklore, and the discipline that keeps images small over time.

Kiril Urbonas·10

8 months ago

A Pragmatic Multi-Region Strategy for Small Teams

How a small team moved from single-region risk to a simple active/passive multi-region setup without doubling complexity.

Kiril Urbonas·6

8 months ago

Kubernetes Backup Strategies: Protecting Your Cluster Data

We've had to restore a Kubernetes cluster from backup twice. Once it worked. Once it took 14 hours. Here's the strategy we run now.

Kiril Urbonas·6

8 months ago

MLOps Pipelines: From Experiment to Production Models

Build MLOps pipelines for training, evaluation, and deployment, with attention to reproducibility and monitoring.

Kiril Urbonas·1

9 months ago

What We Learned Running Weekly Game Days on Our CI/CD Pipeline

Practical game day scenarios for CI/CD: broken rollbacks, permission issues, and slow feedback loops—and how we fixed them.

Kiril Urbonas·2

9 months ago

Service Mesh Implementation: Istio vs Linkerd

We ran Istio for a year, then switched to Linkerd. Both can do the job. The decision came down to operational fit, not features.

Kiril Urbonas·11

9 months ago

Real-World RAG Incidents: Lessons from a Production Rollout

A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.

Kiril Urbonas·3

9 months ago

Architecture Review: Python Worker Queue Scaling Patterns

We started with a single Celery worker handling everything. Eight months and three architecture changes later, here's what scaled and what we learned about queue design.

Kiril Urbonas·4

9 months ago

CI/CD Pipeline Optimization: Speeding Up Your Builds

We cut our average CI build time from 28 minutes to 6 minutes. The changes that mattered, ranked by impact.

Kiril Urbonas·8

9 months ago

How We Stopped Terraform Drift from Surprising On-Call

A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.

Kiril Urbonas·6

9 months ago

Systemd Tricks We Use to Keep Services Boring

Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.

Kiril Urbonas·4

9 months ago

Container Security Scanning: Protecting Your Docker Images

We scan every container image in CI and at runtime. Trivy + Cosign + admission controllers. The setup that earns its place and what we wish we'd known.

Kiril Urbonas·11
