Blog

Practical articles on AI, DevOps, Cloud, Linux, and infrastructure engineering.

Category: infrastructure

Backstage Software Catalog: Getting Adoption Past the Demo

The Backstage demo always wows leadership. Then six months later the catalog has 400 stale entries and nobody trusts it. Here's what got ours to actually stick.

Kiril Urbonas·1

5 days ago

Terraform Import at Scale: Bringing Legacy Infra Under Code

We inherited 200-odd AWS resources built by hand over four years, with no state file anywhere. Here's how import blocks and a generation workflow got them under Terraform without a rebuild.

Kiril Urbonas·1

6 days ago

Zero-Downtime Postgres Migrations: Expand-Contract in Practice

A single ALTER TABLE took a lock and stalled every write for 40 seconds during peak traffic. Expand-contract is how we stopped shipping outages.

Kiril Urbonas·1

6 days ago

Postgres Read Replicas: Routing Reads Without Stale-Data Bugs

Adding a read replica cut primary load 60%, then support tickets rolled in about users not seeing their own edits. Replication lag turned into a correctness bug we had to route around.

Kiril Urbonas·1

last week

Hunting Slow Queries with pg_stat_statements

The dashboard said the database was fine. It wasn't. Here's how pg_stat_statements found the query eating 40% of our Postgres CPU.

Kiril Urbonas·1

last week

Time-Series Postgres: Declarative Partitioning in Practice

A 900GB events table where every query scanned four years to read one day. Range partitioning by month cut our dashboard queries from 8s to under 200ms.

Kiril Urbonas

last week

Postgres Index Bloat: Detecting and Fixing It Before It Hurts

A 40GB index on a 6GB table was the first sign. Queries were fine until they weren't. Here's how we found the bloat and cleared it with zero downtime.

Kiril Urbonas

2 weeks ago

Terraform Drift Detection in CI — Catching Out-of-Band Changes Before They Bite

State drift is silent until a deploy fails or an outage reveals it. Here's the scheduled plan-and-diff pipeline that surfaces console hotfixes and manual edits while they're still cheap to reconcile.

Kiril Urbonas·5

2 weeks ago

Observability — Correlating Logs, Metrics, and Traces in Anger

The "three pillars" framing misses the point — what matters is correlating across them. The patterns that earn their place and the tooling decisions that pay back.

Kiril Urbonas·5

3 weeks ago

Database Sharding — The Choices We Wish We'd Made Earlier

Sharding isn't just "split the table" — the shard key choice cascades through queries, joins, rebalancing, and operations. The decisions that pay off and the ones we redid.

Kiril Urbonas·3

0 months ago

Postgres Logical Replication for Zero-Downtime Major Upgrades

pg_upgrade is fast but takes downtime; logical replication lets you cut over while the old DB still serves traffic. The runbook, the gotchas, and the post-cutover checklist.

Kiril Urbonas·3

last month

pg_stat_statements — Postgres Query Analysis Without Guessing

The single most useful Postgres extension you might not be using. The queries it surfaces, the indexes it implies, and the operational discipline of reading it weekly.

Kiril Urbonas·8

Page 1 of 8 · 92 posts