MLOps — Model Registry vs MLflow Tracking, And When You Need Both
Tracking experiments and shipping models are different problems. The MLOps tooling assumes one solution; production splits them. The patterns we use.
Tracking experiments and shipping models are different problems. The MLOps tooling assumes one solution; production splits them. The patterns we use.
Vault + Kubernetes auth + Vault Agent Injector. The setup, the failure modes during pod startup, and the patterns that beat raw Kubernetes Secrets.
The single most useful Postgres extension you might not be using. The queries it surfaces, the indexes it implies, and the operational discipline of reading it weekly.
io_uring replaces epoll for new high-throughput services. The patterns that earn their place, the gotchas in older kernels, and where we'd still pick epoll.
Three caching patterns, three failure modes. The one we use most, the one that bit us, and the rule that decides which pattern fits which workload.
Picking partition counts and keys decides whether your Kafka consumers scale linearly or hit a wall. The patterns that survived rebalances, partition-count changes, and consumer-group ops.
AI agents for incident triage sound great in demos. We've tried it in production. The patterns that earn their keep, the ones that backfire, and where humans still beat agents.
Production monitoring catches user-facing issues. CI failures stay invisible until someone notices the merge queue is stuck. The metrics and alerts that make pipelines observable.
Version-pinned modules across many repos. The release process, semver discipline, and the breaking-change communication that keeps a shared registry sane.
Most LLM eval suites correlate poorly with what real users experience. The eval patterns we run that move with prod metrics — and the ones that lied to us.
Static thresholds on error rate produce noisy alerts. Burn-rate alerting flips the question to "are we burning the error budget faster than we can sustain?" — and pages only on real problems.
cpu.shares vs cpu.cfs_quota_us vs memory.max — the cgroup mechanics behind Kubernetes resource limits, and the surprises that explain the weird symptoms you've seen.
Bad resource requests waste money or trigger OOMs. The methodology we use to right-size requests based on actual usage, and the gotchas the autoscalers don't fix.