devops/ness
Featured Article

Monitoring That Actually Helps On-Call: Alerts, Dashboards, and Runbooks

How we went from 200 alerts per week (most ignored) to 15 actionable alerts with clear runbooks and useful dashboards.

Infrastructure · Monitoring · Kubernetes · Terraform
Kiril Urbonas, DevOps Engineer and AI Enthusiast
Apr 4, 2026

Topics

Monitoring (291) · Terraform (214) · AWS (174) · Kubernetes (129) · Python (115) · Security (112) · CI/CD (111) · LLM (102) · Ansible (99) · Linux (99)

Latest Articles

SRE Error Budgets in Practice: Shipping Fast Without Burning Reliability

A practical way to define SLOs and error budgets, connect them to release decisions, and avoid reliability debates that aren't backed by data.

Kiril Urbonas·2 min read
Platform Engineering with Backstage: Build a Useful Developer Portal

How to implement Backstage with real templates, scorecards, and golden paths so internal platform work reduces delivery friction.

Kiril Urbonas·2 min read

GitHub Actions for Monorepos: Fast CI Without Pipeline Chaos

A practical pattern for monorepo CI with path filters, matrix builds, caching, and deployment guards that keep feedback fast as teams scale.

Kiril Urbonas·2 min read
Azure DevOps Best Practices in 2026: Build Pipelines You Can Trust

A production-focused guide to Azure DevOps: standardized YAML templates, secure service connections, rollout safety, and measurable delivery reliability.

Kiril Urbonas·6 min read
AI Best Practices in 2026: Shipping Reliable Systems, Not Demo Magic

A practical production playbook for AI systems: evaluation gates, guardrails, observability, cost control, and reliable release management.

Kiril Urbonas·4 min read
AI Best Practices for Engineering Teams: From Prompt Experiments to Platform Discipline

A practical field manual for engineering teams who want AI features that survive real users, incidents, and budgets — not just demo day.

Kiril Urbonas·10 min read
Operational Checklist: AI Inference Cost Optimization

Practical guidance on AI inference cost optimization for reliable, scalable platform operations.

Kiril Urbonas·4 min read
What We Learned Running Weekly Game Days on Our CI/CD Pipeline

Practical game day scenarios for CI/CD: broken rollbacks, permission issues, and slow feedback loops—and how we fixed them.

Kiril Urbonas·2 min read
Real-World RAG Incidents: Lessons from a Production Rollout

A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.

Kiril Urbonas·2 min read
How We Stopped Terraform Drift from Surprising On-Call

A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.

Kiril Urbonas·1 min read
Operational Checklist: SLO-Based Monitoring for APIs

Practical guidance on SLO-based monitoring for APIs as part of reliable, scalable platform operations.

Kiril Urbonas·4 min read
Systemd Tricks We Use to Keep Services Boring

Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.

Kiril Urbonas·1 min read

© 2026 DevOpsNess. By Kiril Urbonas.
