{"name":"DevOpsNess","description":"Practical tutorials and articles on AI, DevOps, cloud, Linux, and infrastructure.","url":"https://www.devopsness.com","contentCount":200,"content":[{"title":"MLOps — Model Registry vs MLflow Tracking, And When You Need Both","url":"https://www.devopsness.com/blog/mlops-model-registry-vs-mlflow-tracking","description":"Tracking experiments and shipping models are different problems. The MLOps tooling assumes one solution; production splits them. The patterns we use.","publishedAt":"2026-06-07T00:00:00.000Z","updatedAt":"2026-06-11T13:28:35.868Z","category":"AI"},{"title":"HashiCorp Vault as a Secrets Backend for Kubernetes","url":"https://www.devopsness.com/blog/vault-as-secrets-backend-for-kubernetes","description":"Vault + Kubernetes auth + Vault Agent Injector. The setup, the failure modes during pod startup, and the patterns that beat raw Kubernetes Secrets.","publishedAt":"2026-06-06T00:00:00.000Z","updatedAt":"2026-06-09T21:38:37.933Z","category":"DevOps"},{"title":"pg_stat_statements — Postgres Query Analysis Without Guessing","url":"https://www.devopsness.com/blog/postgres-pg-stat-statements-query-analysis","description":"The single most useful Postgres extension you might not be using. The queries it surfaces, the indexes it implies, and the operational discipline of reading it weekly.","publishedAt":"2026-06-05T00:00:00.000Z","updatedAt":"2026-06-11T04:45:06.108Z","category":"Infrastructure"},{"title":"Linux io_uring — Async I/O Patterns We Use","url":"https://www.devopsness.com/blog/linux-io-uring-async-io-patterns","description":"io_uring replaces epoll for new high-throughput services. 
The patterns that earn their place, the gotchas in older kernels, and where we'd still pick epoll.","publishedAt":"2026-06-04T00:00:00.000Z","updatedAt":"2026-06-10T15:24:49.273Z","category":"Linux"},{"title":"Caching Patterns — Read-Through, Write-Through, Cache-Aside in Practice","url":"https://www.devopsness.com/blog/caching-patterns-read-write-through-cache-aside","description":"Three caching patterns, three failure modes. The one we use most, the one that bit us, and the rule that decides which pattern fits which workload.","publishedAt":"2026-06-03T00:00:00.000Z","updatedAt":"2026-06-08T09:24:15.827Z","category":"Cloud"},{"title":"Kafka Partition Strategies — Scaling Consumers Without Reshuffling Everything","url":"https://www.devopsness.com/blog/kafka-partition-strategies-scaling-consumers","description":"Picking partition counts and keys decides whether your Kafka consumers scale linearly or hit a wall. The patterns that survived rebalances, partition-count changes, and consumer-group ops.","publishedAt":"2026-06-02T00:00:00.000Z","updatedAt":"2026-06-07T10:12:01.174Z","category":"DevOps"},{"title":"Agentic Ops — When (and When Not) to Use AI Agents for Incident Response","url":"https://www.devopsness.com/blog/agentic-ops-ai-incident-response","description":"AI agents for incident triage sound great in demos. We've tried it in production. The patterns that earn their keep, the ones that backfire, and where humans still beat agents.","publishedAt":"2026-06-01T00:00:00.000Z","updatedAt":"2026-06-03T09:17:40.594Z","category":"AI"},{"title":"Pipeline Observability — Why CI Failures Don't Trigger Alerts (And Should)","url":"https://www.devopsness.com/blog/pipeline-observability-ci-failures-alerts","description":"Production monitoring catches user-facing issues. CI failures stay invisible until someone notices the merge queue is stuck. 
The metrics and alerts that make pipelines observable.","publishedAt":"2026-05-31T00:00:00.000Z","updatedAt":"2026-06-07T03:02:10.317Z","category":"DevOps"},{"title":"Terraform Module Versioning and Shared Registries","url":"https://www.devopsness.com/blog/terraform-module-versioning-shared-registries","description":"Version-pinned modules across many repos. The release process, semver discipline, and the breaking-change communication that keeps a shared registry sane.","publishedAt":"2026-05-30T00:00:00.000Z","updatedAt":"2026-06-08T11:36:18.257Z","category":"Infrastructure"},{"title":"LLM Evals That Actually Predict Production Quality","url":"https://www.devopsness.com/blog/llm-evals-that-predict-prod-quality","description":"Most LLM eval suites correlate poorly with what real users experience. The eval patterns we run that move with prod metrics — and the ones that lied to us.","publishedAt":"2026-05-29T00:00:00.000Z","updatedAt":"2026-06-06T10:00:57.569Z","category":"AI"},{"title":"Burn-Rate Alerting — The SLO Discipline That Prevents Alert Fatigue","url":"https://www.devopsness.com/blog/burn-rate-alerting-slo-discipline","description":"Static thresholds on error rate produce noisy alerts. 
Burn-rate alerting flips the question to \"are we burning the error budget faster than we can sustain?\" — and pages only on real problems.","publishedAt":"2026-05-28T00:00:00.000Z","updatedAt":"2026-06-04T20:11:46.952Z","category":"DevOps"},{"title":"Container Resource Limits — What They Actually Do at the Kernel Level","url":"https://www.devopsness.com/blog/container-resource-limits-what-they-actually-do","description":"cpu.shares vs cpu.cfs_quota_us vs memory.max — the cgroup mechanics behind Kubernetes resource limits, and the surprises that explain the weird symptoms you've seen.","publishedAt":"2026-05-27T00:00:00.000Z","updatedAt":"2026-06-07T09:06:49.426Z","category":"Linux"},{"title":"Kubernetes Resource Requests — Right-Sizing Without Guessing","url":"https://www.devopsness.com/blog/kubernetes-resource-requests-rightsizing","description":"Bad resource requests waste money or trigger OOMs. The methodology we use to right-size requests based on actual usage, and the gotchas the autoscalers don't fix.","publishedAt":"2026-05-26T00:00:00.000Z","updatedAt":"2026-06-03T01:40:48.475Z","category":"Cloud"},{"title":"Supply Chain Security — SBOMs, Attestation, and What to Actually Verify","url":"https://www.devopsness.com/blog/supply-chain-security-sbom-attestation","description":"SBOMs and signed attestations sound like checkboxes until you need to answer \"did this artifact come from our pipeline?\" The minimum viable supply-chain story we run.","publishedAt":"2026-05-25T00:00:00.000Z","updatedAt":"2026-06-06T03:51:37.304Z","category":"DevOps"},{"title":"Edge Databases for Low-Latency Apps — D1, Turso, Neon Serverless","url":"https://www.devopsness.com/blog/edge-databases-for-low-latency-apps","description":"Edge compute is useless without an edge data layer. 
Three serverless databases that put data within ms of your edge functions, with the tradeoffs that aren't on the marketing pages.","publishedAt":"2026-05-24T00:00:00.000Z","updatedAt":"2026-06-11T07:06:57.999Z","category":"Cloud"},{"title":"Multi-Provider LLM Routing — Failover, Cost Routing, and Load Balancing","url":"https://www.devopsness.com/blog/multi-provider-llm-routing-failover","description":"Single-provider LLM apps fail when the provider does. Multi-provider routing isn't just resilience — it's also a cost lever. The patterns we run.","publishedAt":"2026-05-23T00:00:00.000Z","updatedAt":"2026-06-06T19:05:02.324Z","category":"AI"},{"title":"Postgres Query Plans — Reading Them and the Indexes We Wish We'd Added Sooner","url":"https://www.devopsness.com/blog/postgres-query-plans-and-indexes","description":"EXPLAIN ANALYZE output is dense and intimidating. Once you can read it, most slow-query investigations finish in minutes. The patterns we keep seeing.","publishedAt":"2026-05-22T00:00:00.000Z","updatedAt":"2026-06-06T18:16:09.419Z","category":"Infrastructure"},{"title":"Argo Rollouts — Progressive Delivery Beyond Argo CD","url":"https://www.devopsness.com/blog/argo-rollouts-progressive-delivery","description":"Argo CD ships your manifests; Argo Rollouts ships them gradually with automated quality gates. The setup, the analysis templates that earn their place, and what we measure.","publishedAt":"2026-05-21T00:00:00.000Z","updatedAt":"2026-06-09T02:03:58.715Z","category":"DevOps"},{"title":"eBPF Tools for Everyday Ops — bpftrace Patterns We Use","url":"https://www.devopsness.com/blog/ebpf-tools-for-everyday-ops-bpftrace-patterns","description":"bpftrace one-liners replace strace, perf top, and a half-dozen ad-hoc debugging scripts. 
The patterns that actually earn their place when you're troubleshooting at 2 AM.","publishedAt":"2026-05-20T00:00:00.000Z","updatedAt":"2026-06-11T10:12:35.510Z","category":"Linux"},{"title":"SLI Design — Picking Metrics That Actually Correlate With User Experience","url":"https://www.devopsness.com/blog/sli-design-metrics-that-correlate-with-user-experience","description":"Wrong SLI metrics mean green dashboards while users churn. The discipline of picking signals that move with what users actually feel, and the ones that look reliable but lie.","publishedAt":"2026-05-19T00:00:00.000Z","updatedAt":"2026-06-06T16:09:37.134Z","category":"DevOps"},{"title":"Cross-Cloud Identity Federation — Patterns That Replaced Our Long-Lived Keys","url":"https://www.devopsness.com/blog/cross-cloud-identity-federation-patterns","description":"OIDC federation between AWS, GCP, and CI providers let us delete every long-lived cloud credential we had. The setup, the gotchas, and the trust-relationship discipline.","publishedAt":"2026-05-18T00:00:00.000Z","updatedAt":"2026-06-03T01:40:41.709Z","category":"Cloud"},{"title":"Hybrid Search — Combining BM25 and Embeddings for Better RAG","url":"https://www.devopsness.com/blog/hybrid-search-bm25-embeddings-rag","description":"Pure vector search misses exact-keyword queries. Pure BM25 misses semantic ones. Combining them with reciprocal rank fusion is the simplest large win in RAG retrieval.","publishedAt":"2026-05-17T00:00:00.000Z","updatedAt":"2026-06-05T06:18:06.952Z","category":"AI"},{"title":"Handling Vulnerabilities in Production — What We Actually Do","url":"https://www.devopsness.com/blog/handling-vulnerabilities-in-production","description":"You always have known vulnerabilities. The question is how you triage, patch, and respond. 
The discipline we run after a few real incidents and a lot of routine work.","publishedAt":"2026-05-16T00:00:00.000Z","updatedAt":"2026-06-09T01:50:32.407Z","category":"DevOps"},{"title":"Proxy vs Reverse Proxy vs Load Balancer — What's Actually Different","url":"https://www.devopsness.com/blog/proxy-vs-reverse-proxy-vs-load-balancer","description":"Three terms that get mixed up constantly. The actual differences, where each one sits in the request path, when you reach for which, and where the same tool plays all three roles.","publishedAt":"2026-05-15T00:00:00.000Z","updatedAt":"2026-06-07T22:44:00.882Z","category":"DevOps"},{"title":"Database Backups — Testing Restores, Not Just Taking Them","url":"https://www.devopsness.com/blog/database-backup-restoration-testing-restores","description":"Backups are easy. Restores are hard. The quarterly drill we run, what's failed during it, and the discipline that makes \"we have backups\" actually mean something.","publishedAt":"2026-05-14T00:00:00.000Z","updatedAt":"2026-06-11T02:58:27.360Z","category":"Infrastructure"},{"title":"Helm Chart Anti-Patterns We've Stopped Using","url":"https://www.devopsness.com/blog/helm-chart-anti-patterns","description":"Helm gives you a lot of rope. The patterns we used that backfired, the ones we replaced them with, and what to skip if you're starting today.","publishedAt":"2026-05-13T00:00:00.000Z","updatedAt":"2026-06-10T18:05:57.818Z","category":"DevOps"},{"title":"CDN Cache Invalidation — Strategies That Don't Break in Production","url":"https://www.devopsness.com/blog/cdn-cache-invalidation-strategies","description":"\"There are two hard problems in computer science.\" We've worked on the cache-invalidation one for a while. 
The patterns that hold up at scale and the ones that look clean and aren't.","publishedAt":"2026-05-12T00:00:00.000Z","updatedAt":"2026-06-08T07:51:34.203Z","category":"Cloud"},{"title":"Embeddings Drift Detection — When \"Similar Enough\" Stops Being Similar","url":"https://www.devopsness.com/blog/embeddings-drift-detection-when-similar-stops","description":"Embedding indexes degrade silently. The signals that catch drift, how often to re-embed, and the operational patterns we built after one quiet quality regression.","publishedAt":"2026-05-11T00:00:00.000Z","updatedAt":"2026-06-11T13:08:24.657Z","category":"AI"},{"title":"Job Queues — Sidekiq, Celery, BullMQ Patterns That Hold Up","url":"https://www.devopsness.com/blog/job-queues-sidekiq-celery-bullmq-patterns","description":"We run three different job queue systems across our services. The patterns that work across all of them, the differences that matter, and the operational gotchas.","publishedAt":"2026-05-10T00:00:00.000Z","updatedAt":"2026-06-10T20:41:18.031Z","category":"DevOps"},{"title":"systemd Timers vs Cron — What We Learned Switching","url":"https://www.devopsness.com/blog/systemd-timers-vs-cron-what-we-learned","description":"We migrated most scheduled jobs from cron to systemd timers. The wins, the gotchas, and the cases we kept on cron anyway.","publishedAt":"2026-05-09T00:00:00.000Z","updatedAt":"2026-06-09T14:32:26.923Z","category":"Linux"},{"title":"AWS Step Functions for Workflow Orchestration","url":"https://www.devopsness.com/blog/aws-step-functions-workflow-orchestration","description":"We use Step Functions for batch processing, document ingestion, and a few agentic workflows. 
The patterns that work, the limits we hit, and where we'd reach for something else.","publishedAt":"2026-05-08T00:00:00.000Z","updatedAt":"2026-06-11T08:26:27.777Z","category":"Cloud"},{"title":"LLM Streaming UX — Backpressure, Cancellation, Partial Results","url":"https://www.devopsness.com/blog/llm-streaming-ux-backpressure-cancellation","description":"Streaming LLM responses is easy until the client disconnects, the model stalls, or the user cancels. The patterns that keep streaming responsive without leaking spend.","publishedAt":"2026-05-07T00:00:00.000Z","updatedAt":"2026-06-01T11:33:10.920Z","category":"AI"},{"title":"Internal Developer Platforms — Backstage in Practice","url":"https://www.devopsness.com/blog/internal-developer-platforms-backstage-in-practice","description":"We adopted Backstage for service catalogs and templates. What works, what was over-engineered for our size, and what we'd do differently.","publishedAt":"2026-05-06T00:00:00.000Z","updatedAt":"2026-06-05T22:56:01.228Z","category":"DevOps"},{"title":"Postgres Replication Lag — Monitoring and Failover Practice","url":"https://www.devopsness.com/blog/postgres-replication-lag-failover-practice","description":"Replication is the foundation of database HA. 
What we monitor, how we practice failover, and the gotchas that show up only when you actually fail over.","publishedAt":"2026-05-05T00:00:00.000Z","updatedAt":"2026-06-08T01:56:07.220Z","category":"Infrastructure"},{"title":"Bash One-Liners We Actually Use","url":"https://www.devopsness.com/blog/bash-one-liners-we-actually-use","description":"A curated list of shell one-liners that earn their place in real ops work — the ones I reach for weekly, not the trick-shot variety.","publishedAt":"2026-05-04T00:00:00.000Z","updatedAt":"2026-06-11T09:52:45.695Z","category":"Linux"},{"title":"Karpenter — Node Provisioning Patterns at Scale","url":"https://www.devopsness.com/blog/karpenter-node-provisioning-patterns-at-scale","description":"After two years of running Karpenter on production EKS clusters, the NodePool patterns that survived, the ones we replaced, and the tuning that matters.","publishedAt":"2026-05-03T00:00:00.000Z","updatedAt":"2026-06-06T18:31:27.122Z","category":"Cloud"},{"title":"AI Agent Tool Design — Boundaries and Confirmations","url":"https://www.devopsness.com/blog/ai-agent-tool-design-boundaries-confirmations","description":"When LLMs can call tools that change real state, the design decisions that matter most are about what's gated, what's automatic, and what triggers a human checkpoint.","publishedAt":"2026-05-02T00:00:00.000Z","updatedAt":"2026-05-27T12:03:13.608Z","category":"AI"},{"title":"Chaos Engineering — What We Actually Run as Game Days","url":"https://www.devopsness.com/blog/chaos-engineering-game-days-platform-teams","description":"We run a chaos game day each quarter. 
The scenarios that surfaced real problems, the ones that didn't, and the operational discipline that makes the practice pay back.","publishedAt":"2026-05-01T00:00:00.000Z","updatedAt":"2026-06-07T23:36:09.431Z","category":"DevOps"},{"title":"Postgres Connection Pooling — PgBouncer in Front of RDS","url":"https://www.devopsness.com/blog/postgres-connection-pooling-pgbouncer","description":"Why Postgres connection limits bite at unexpected times, the pooling layer we put in front, and the pool-mode tradeoffs we learned the hard way.","publishedAt":"2026-04-30T00:00:00.000Z","updatedAt":"2026-06-05T09:51:05.064Z","category":"Infrastructure"},{"title":"What Are Embeddings? A Beginner's Guide with Code","url":"https://www.devopsness.com/blog/what-are-embeddings-beginner-guide","description":"Embeddings turn text into numbers a computer can compare. Here's the working mental model, a runnable Python example, and where embeddings fit in real apps.","publishedAt":"2026-04-29T18:48:06.923Z","updatedAt":"2026-06-08T16:08:00.748Z","category":"AI"},{"title":"Terraform Tutorial — Your First Infrastructure-as-Code Project","url":"https://www.devopsness.com/blog/terraform-tutorial-first-iac-project","description":"Provision real cloud resources with Terraform — a VPC, an S3 bucket, and an EC2 instance — using the standard init/plan/apply workflow.","publishedAt":"2026-04-29T18:48:02.259Z","updatedAt":"2026-06-09T07:31:03.904Z","category":"Infrastructure"},{"title":"SSH Tutorial — Keys, Config, and Working Remotely","url":"https://www.devopsness.com/blog/ssh-tutorial-keys-config-remote-work","description":"Generate an SSH key, set up passwordless login, and configure aliases for the servers you use daily — all without copy-pasting yet another long command.","publishedAt":"2026-04-29T18:47:57.183Z","updatedAt":"2026-06-07T20:49:58.288Z","category":"Linux"},{"title":"Prompt Engineering Basics — From \"Help Me\" to Working 
Prompts","url":"https://www.devopsness.com/blog/prompt-engineering-basics-tutorial","description":"A hands-on intro to prompt engineering. Learn the four levers (role, format, examples, constraints) and watch a vague prompt turn into a reliable one.","publishedAt":"2026-04-29T18:47:52.577Z","updatedAt":"2026-05-31T09:18:46.406Z","category":"AI"},{"title":"Linux File Permissions — Read, Write, Execute Without Tears","url":"https://www.devopsness.com/blog/linux-file-permissions-explained","description":"A clear walkthrough of Linux file permissions. Read the funny rwx- letters, change them safely with chmod, fix \"permission denied\" errors with confidence.","publishedAt":"2026-04-29T18:47:47.952Z","updatedAt":"2026-06-09T00:10:45.500Z","category":"Linux"},{"title":"Kubernetes 101 — Pods, Deployments, and Services Explained","url":"https://www.devopsness.com/blog/kubernetes-101-pods-deployments-services","description":"Run your first three Kubernetes objects — Pod, Deployment, Service — on a local cluster, then understand why each one exists and how they fit together.","publishedAt":"2026-04-29T18:47:42.868Z","updatedAt":"2026-06-09T09:19:19.373Z","category":"DevOps"},{"title":"GitOps Explained — What It Is and Why Teams Adopt It","url":"https://www.devopsness.com/blog/gitops-explained-introduction","description":"GitOps in plain words — what it actually is, the workflow it enables, and a hands-on demo using Argo CD on a local Kubernetes cluster.","publishedAt":"2026-04-29T18:47:38.356Z","updatedAt":"2026-06-08T04:29:25.408Z","category":"Infrastructure"},{"title":"Your First CI/CD Pipeline with GitHub Actions","url":"https://www.devopsness.com/blog/first-cicd-pipeline-github-actions-tutorial","description":"Walk through a working GitHub Actions workflow — install, test, build, deploy — for a tiny Node app. 
Every line explained.","publishedAt":"2026-04-29T18:47:32.442Z","updatedAt":"2026-06-10T14:31:38.179Z","category":"DevOps"},{"title":"Docker for Beginners — Build, Run, and Ship Your First Container","url":"https://www.devopsness.com/blog/docker-beginners-tutorial-first-container","description":"Walk through your first Dockerfile, container run, and image push in 30 minutes. No theory dumps — just the commands and what each one is doing.","publishedAt":"2026-04-29T18:47:28.104Z","updatedAt":"2026-06-10T12:08:16.041Z","category":"DevOps"},{"title":"Build Your First RAG App in 100 Lines of Python","url":"https://www.devopsness.com/blog/build-first-rag-app-python-tutorial","description":"A working retrieval-augmented generation app you can run today. Markdown ingestion, embeddings, semantic search, and an LLM answer — start to finish in one afternoon.","publishedAt":"2026-04-29T18:47:22.404Z","updatedAt":"2026-06-10T14:18:26.770Z","category":"AI"},{"title":"Bash Scripting Tutorial — Write Your First Useful Script","url":"https://www.devopsness.com/blog/bash-scripting-tutorial-first-script","description":"Build a real disk-cleanup script step by step. 
Learn variables, conditionals, loops, error handling, and the safety preamble that prevents foot-guns.","publishedAt":"2026-04-29T18:47:15.710Z","updatedAt":"2026-05-31T00:28:51.270Z","category":"Linux"},{"title":"AWS VPC Explained — Subnets, Route Tables, and the Internet Gateway","url":"https://www.devopsness.com/blog/aws-vpc-explained-beginner-guide","description":"A working mental model for AWS VPCs — what each piece does, how they connect, and why \"VPC\" is the wrong mental model if you came from physical networks.","publishedAt":"2026-04-29T18:47:09.519Z","updatedAt":"2026-06-01T09:08:52.392Z","category":"Cloud"},{"title":"AWS S3 Tutorial — Buckets, Permissions, and Common Pitfalls","url":"https://www.devopsness.com/blog/aws-s3-tutorial-buckets-permissions","description":"Create your first S3 bucket, upload and download files, and set up the right access controls — without accidentally making everything public.","publishedAt":"2026-04-29T18:47:04.307Z","updatedAt":"2026-06-10T13:01:36.259Z","category":"Cloud"},{"title":"AWS Lambda — Deploy Your First Serverless Function","url":"https://www.devopsness.com/blog/aws-lambda-deploy-first-serverless-function","description":"Write, package, and deploy a Lambda function using only the AWS CLI. Trigger it via a public URL. Understand what serverless actually means.","publishedAt":"2026-04-29T18:46:58.792Z","updatedAt":"2026-06-05T08:26:22.514Z","category":"Cloud"},{"title":"Ansible Tutorial — Configure a Server in 30 Minutes","url":"https://www.devopsness.com/blog/ansible-tutorial-configure-server","description":"Install Ansible, write your first playbook, and configure a remote server (nginx + a deploy user) without touching it manually. 
The basics that scale up.","publishedAt":"2026-04-29T18:46:52.831Z","updatedAt":"2026-06-11T07:48:42.451Z","category":"Infrastructure"},{"title":"Feature Flags in Production — Provider Choice and Operational Reality","url":"https://www.devopsness.com/blog/feature-flags-in-production-operational-reality","description":"We use feature flags on roughly every customer-facing change. The provider tradeoff, the patterns that hold up, and the failure modes that show up only after a couple of years.","publishedAt":"2026-04-28T00:00:00.000Z","updatedAt":"2026-06-11T09:28:44.806Z","category":"DevOps"},{"title":"Distributed Tracing with OpenTelemetry — What We Ship, What We Skip","url":"https://www.devopsness.com/blog/distributed-tracing-opentelemetry-what-we-ship","description":"How we run OpenTelemetry across ~40 services. The instrumentation that earns its place, the patterns we abandoned, and what tracing actually catches that metrics don't.","publishedAt":"2026-04-27T00:00:00.000Z","updatedAt":"2026-05-22T10:07:16.662Z","category":"DevOps"},{"title":"Postgres Autovacuum — Tuning From Production Stalls","url":"https://www.devopsness.com/blog/postgres-autovacuum-tuning-from-production-stalls","description":"A 2 AM incident, the autovacuum settings that caused it, and the parameter changes that prevented the next one. The discipline that took our biggest Postgres host from periodic stalls to steady.","publishedAt":"2026-04-26T00:00:00.000Z","updatedAt":"2026-05-21T08:21:38.950Z","category":"Infrastructure"},{"title":"Fine-Tuning vs RAG vs Long-Context: A Decision Framework With Numbers","url":"https://www.devopsness.com/blog/fine-tuning-vs-rag-vs-long-context-a-decision-framework-with-numbers-2026-04-25","description":"We've shipped all three patterns to production. They're not interchangeable. 
Here's the framework we now use to decide which approach fits a given task.","publishedAt":"2026-04-25T12:00:00.000Z","updatedAt":"2026-06-07T20:23:52.158Z","category":"AI"},{"title":"Database Connection Pooling at Scale: PgBouncer, RDS Proxy, Application Pool","url":"https://www.devopsness.com/blog/database-connection-pooling-at-scale-pgbouncer-rds-proxy-application-pool-2026-04-24","description":"Three layers of pooling, three different jobs. We learned the hard way which to use when. Real numbers from an 8k-connection workload.","publishedAt":"2026-04-24T12:00:00.000Z","updatedAt":"2026-06-11T10:28:53.959Z","category":"DevOps"},{"title":"Backstage Adoption: From Demo to 80% Service Coverage in 6 Months","url":"https://www.devopsness.com/blog/backstage-adoption-from-demo-to-80-service-coverage-in-6-months-2026-04-23","description":"We launched Backstage in October. Six months in, 80% of services are catalogued, on-boarding takes a third of the time, and we mostly know who owns what.","publishedAt":"2026-04-23T12:00:00.000Z","updatedAt":"2026-06-08T06:23:52.688Z","category":"Infrastructure"},{"title":"Cloudflare Workers vs Vercel Edge: A Latency-Cost Comparison","url":"https://www.devopsness.com/blog/cloudflare-workers-vs-vercel-edge-a-latency-cost-comparison-2026-04-22","description":"We deployed the same edge function on both platforms and measured for a quarter. Where each wins, where each loses, and the surprises along the way.","publishedAt":"2026-04-22T12:00:00.000Z","updatedAt":"2026-06-10T07:10:15.207Z","category":"Cloud"},{"title":"eBPF for SREs: Three Real Diagnoses That Saved Hours","url":"https://www.devopsness.com/blog/ebpf-for-sres-three-real-diagnoses-that-saved-hours-2026-04-21","description":"We started using eBPF tooling for ad-hoc production debugging six months ago. 
Three real incidents where it cut investigation time from hours to minutes.","publishedAt":"2026-04-21T12:00:00.000Z","updatedAt":"2026-05-30T21:36:53.993Z","category":"Linux"},{"title":"LLM Output Validation: Schema-First Prompt Engineering Patterns","url":"https://www.devopsness.com/blog/llm-output-validation-schema-first-prompt-engineering-patterns-2026-04-20","description":"We invalidate ~6% of LLM outputs before they reach a downstream system. Here's how we structure prompts and validators to catch malformed responses early.","publishedAt":"2026-04-20T12:00:00.000Z","updatedAt":"2026-06-08T13:20:02.577Z","category":"AI"},{"title":"Argo Rollouts: Canary Deployments That Caught a $40k Bug","url":"https://www.devopsness.com/blog/argo-rollouts-canary-deployments-that-caught-a-40k-bug-2026-04-19","description":"A two-line config change to an Argo Rollouts analysis template caught a regression that would have cost ~$40k in API spend before we noticed. Here's the pattern.","publishedAt":"2026-04-19T12:00:00.000Z","updatedAt":"2026-05-18T17:21:14.960Z","category":"DevOps"},{"title":"Pulumi vs Terraform: What 18 Months of Production Taught Us","url":"https://www.devopsness.com/blog/pulumi-vs-terraform-what-18-months-of-production-taught-us-2026-04-18","description":"We ran Pulumi in TypeScript and Terraform in HCL side by side across 60+ services. Each won different categories of work. Here's the breakdown.","publishedAt":"2026-04-18T12:00:00.000Z","updatedAt":"2026-05-23T00:16:56.451Z","category":"Infrastructure"},{"title":"GCP Workload Identity Federation: Replacing Service Account Keys","url":"https://www.devopsness.com/blog/gcp-workload-identity-federation-replacing-service-account-keys-2026-04-17","description":"We deleted every static GCP service account key in our org over six weeks. 
Here's the migration plan, the gotchas, and the policies we now enforce.","publishedAt":"2026-04-17T12:00:00.000Z","updatedAt":"2026-06-06T19:29:21.834Z","category":"Cloud"},{"title":"Linux Memory Management: When OOM Killer Strikes Your K8s Pods","url":"https://www.devopsness.com/blog/linux-memory-management-when-oom-killer-strikes-your-k8s-pods-2026-04-16","description":"Three production OOM incidents that taught us how kubelet, containerd, and the kernel actually decide which process dies. With debugging commands you'll wish you had earlier.","publishedAt":"2026-04-16T12:00:00.000Z","updatedAt":"2026-06-11T10:12:28.152Z","category":"Linux"},{"title":"GitHub Actions Self-Hosted Runners: Why We Switched and What Broke","url":"https://www.devopsness.com/blog/github-actions-self-hosted-runners-why-we-switched-and-what-broke-2026-04-15","description":"Bills hit $3,400/mo for runner minutes. We moved to self-hosted on EKS spot. The savings were real; the surprises were too.","publishedAt":"2026-04-15T12:00:00.000Z","updatedAt":"2026-05-18T17:21:13.851Z","category":"DevOps"},{"title":"Vector Database Selection: Pinecone, pgvector, Qdrant After 6 Months in Production","url":"https://www.devopsness.com/blog/vector-database-selection-pinecone-pgvector-qdrant-after-6-months-in-production-2026-04-14","description":"We ran the same RAG workload across three vector stores for a quarter each. Here's what we learned about latency, cost, and operational overhead.","publishedAt":"2026-04-14T12:00:00.000Z","updatedAt":"2026-05-28T09:57:19.583Z","category":"AI"},{"title":"Pre-Commit Hooks That Saved Our Repo: 7 Real Examples","url":"https://www.devopsness.com/blog/pre-commit-hooks-that-saved-our-repo-7-real-examples-2026-04-13","description":"Every hook on this list caught a bug or a security issue in the last twelve months. The configs are short. 
The savings have been considerable.","publishedAt":"2026-04-13T12:00:00.000Z","updatedAt":"2026-05-18T17:21:13.394Z","category":"DevOps"},{"title":"EKS Auto Mode: What Worked, What Broke in Our Migration","url":"https://www.devopsness.com/blog/eks-auto-mode-what-worked-what-broke-in-our-migration-2026-04-12","description":"We moved a 60-node production EKS cluster to Auto Mode. Some pain points evaporated, others got harder. The cost picture is more nuanced than the marketing suggests.","publishedAt":"2026-04-12T12:00:00.000Z","updatedAt":"2026-06-09T07:20:16.741Z","category":"Cloud"},{"title":"Self-Hosted LLMs vs OpenAI API: A Cost-vs-Latency Analysis After 6 Months","url":"https://www.devopsness.com/blog/self-hosted-llms-vs-openai-api-a-cost-vs-latency-analysis-after-6-months-2026-04-11","description":"We ran the same workload on both for half a year. The break-even point isn't where most blog posts say it is — and the latency story has more nuance than throughput-per-dollar charts admit.","publishedAt":"2026-04-11T12:00:00.000Z","updatedAt":"2026-06-07T19:49:36.004Z","category":"AI"},{"title":"OpenTelemetry Collector Pipelines: Real Configs That Survived Production","url":"https://www.devopsness.com/blog/opentelemetry-collector-pipelines-real-configs-that-survived-production-2026-04-10","description":"We've been running the OTel Collector at the edge of every cluster for 18 months. The config patterns that lasted, the ones we ripped out, and a few processors that quietly saved us money.","publishedAt":"2026-04-10T12:00:00.000Z","updatedAt":"2026-05-18T17:21:12.580Z","category":"DevOps"},{"title":"Blue/Green Deploys for Stateful Services: A Postgres Cutover Story","url":"https://www.devopsness.com/blog/blue-green-deploys-for-stateful-services-a-postgres-cutover-story-2026-04-09","description":"Blue/green is easy for stateless services. We did it for our primary Postgres cluster with 3.2TB of data and ~8k connections. 
Here's exactly how — and what almost went wrong.","publishedAt":"2026-04-09T12:00:00.000Z","updatedAt":"2026-06-07T23:59:49.752Z","category":"DevOps"},{"title":"systemd Timers vs Cron: When We Switched and What We Learned","url":"https://www.devopsness.com/blog/systemd-timers-vs-cron-when-we-switched-and-what-we-learned-2026-04-08","description":"We migrated 47 cron jobs to systemd timers across our fleet. The mechanical conversion was easy. The interesting parts were the bugs we found that cron had been hiding.","publishedAt":"2026-04-08T12:00:00.000Z","updatedAt":"2026-05-26T21:01:40.817Z","category":"Linux"},{"title":"Zero Trust on AWS: Lessons From Implementing IAM Identity Center","url":"https://www.devopsness.com/blog/zero-trust-on-aws-lessons-from-implementing-iam-identity-center-2026-04-07","description":"We replaced 14 long-lived IAM users with SSO + temporary credentials. The migration plan, the gotchas, and the policies we now enforce.","publishedAt":"2026-04-07T12:00:00.000Z","updatedAt":"2026-05-27T13:10:29.780Z","category":"Cloud"},{"title":"Embedding Quality in RAG: How We Cut Hallucinations by 60%","url":"https://www.devopsness.com/blog/embedding-quality-in-rag-how-we-cut-hallucinations-by-60-2026-04-06","description":"Six months running RAG in production taught us that the retrieval step matters far more than the model. Concrete techniques that moved the needle, with before/after numbers.","publishedAt":"2026-04-06T12:00:00.000Z","updatedAt":"2026-06-11T04:45:12.329Z","category":"AI"},{"title":"Database Migrations Without Downtime: Patterns From Three Real Cutovers","url":"https://www.devopsness.com/blog/database-migrations-without-downtime-patterns-from-three-real-cutovers-2026-04-05","description":"How we shipped three schema migrations with zero customer impact. 
Expand-then-contract, dual-writes, and the rollback plan we never had to use — but tested anyway.","publishedAt":"2026-04-05T12:00:00.000Z","updatedAt":"2026-06-08T21:16:59.117Z","category":"Infrastructure"},{"title":"Monitoring That Actually Helps On-Call: Alerts, Dashboards, and Runbooks","url":"https://www.devopsness.com/blog/monitoring-that-actually-helps-on-call-alerts-dashboards-and-runbooks","description":"We were drowning in 200 alerts a week. Most got ignored. After a quarter of triage and rework, we're at about 15 — and on-call actually responds to them.","publishedAt":"2026-04-04T12:00:00.000Z","updatedAt":"2026-06-08T08:10:15.549Z","category":"Infrastructure"},{"title":"Secrets Management in Practice: From .env Files to Vault","url":"https://www.devopsness.com/blog/secrets-management-in-practice-from-env-files-to-vault","description":"We had .env files in three repos, AWS keys in Slack DMs, and a postgres password etched into a Confluence page. Cleaning it up took a sprint and changed how we think about secrets.","publishedAt":"2026-04-03T12:00:00.000Z","updatedAt":"2026-05-23T00:04:54.505Z","category":"Cloud"},{"title":"Incident Postmortems That Actually Prevent Repeat Failures","url":"https://www.devopsness.com/blog/incident-postmortems-that-actually-prevent-repeat-failures","description":"We wrote pretty postmortems for two years and kept hitting the same incidents. 
Here's what changed when we started writing ugly ones.","publishedAt":"2026-04-02T12:00:00.000Z","updatedAt":"2026-05-18T17:21:10.556Z","category":"DevOps"},{"title":"Terraform Modules Done Right: Lessons from Managing 50+ Services","url":"https://www.devopsness.com/blog/terraform-modules-done-right-lessons-from-managing-50-services","description":"Practical patterns for Terraform modules at scale: versioning, composition, testing, and avoiding the monolith trap.","publishedAt":"2026-04-01T12:00:00.000Z","updatedAt":"2026-06-01T05:11:28.916Z","category":"Infrastructure"},{"title":"Linux Performance Troubleshooting: A Real Incident Walkthrough","url":"https://www.devopsness.com/blog/linux-performance-troubleshooting-a-real-incident-walkthrough","description":"Step-by-step debugging of a production Linux server hitting 100% CPU. From top to perf to the actual fix.","publishedAt":"2026-03-31T12:00:00.000Z","updatedAt":"2026-06-11T03:42:08.843Z","category":"Linux"},{"title":"Prompt Engineering Patterns That Actually Work in Production","url":"https://www.devopsness.com/blog/prompt-engineering-patterns-that-actually-work-in-production","description":"Battle-tested prompt patterns from running LLM features in production: structured output, chain-of-thought, and graceful failure handling.","publishedAt":"2026-03-30T12:00:00.000Z","updatedAt":"2026-05-26T10:04:25.468Z","category":"AI"},{"title":"AWS Cost Audit: 7 Things We Found Wasting Money Every Month","url":"https://www.devopsness.com/blog/aws-cost-audit-7-things-we-found-wasting-money-every-month","description":"A real cost audit uncovered idle load balancers, oversized RDS instances, and forgotten snapshots. 
Here's what we found and how we fixed each one.","publishedAt":"2026-03-29T12:00:00.000Z","updatedAt":"2026-05-28T05:39:39.421Z","category":"Cloud"},{"title":"How We Cut Our Docker Image Size by 80% and Why It Matters","url":"https://www.devopsness.com/blog/how-we-cut-our-docker-image-size-by-80-and-why-it-matters","description":"A real walkthrough of shrinking bloated Docker images from 1.2GB to 240MB using multi-stage builds, Alpine, and dependency auditing.","publishedAt":"2026-03-28T12:00:00.000Z","updatedAt":"2026-05-18T17:21:09.467Z","category":"DevOps"},{"title":"Model Fallback Policies for Customer-Facing AI: The Routing Rules That Kept SLA Intact","url":"https://www.devopsness.com/blog/model-fallback-policies-for-customer-facing-ai-the-routing-rules-that-kept-sla-intact-2026-03-27","description":"A real-world model fallback guide for customer-facing AI systems, covering how one team preserved response quality and support SLAs during a partial provider degradation.","publishedAt":"2026-03-27T12:00:00.000Z","updatedAt":"2026-06-08T11:55:15.157Z","category":"AI"},{"title":"Artifact Promotion Instead of Rebuilds: The Release Control Pattern That Stopped Drift","url":"https://www.devopsness.com/blog/artifact-promotion-instead-of-rebuilds-the-release-control-pattern-that-stopped-drift-2026-03-26","description":"A practical artifact promotion guide for CI/CD teams that were tired of hearing 'it passed in staging' after production behaved differently because the release was rebuilt.","publishedAt":"2026-03-26T12:00:00.000Z","updatedAt":"2026-06-09T08:16:29.997Z","category":"DevOps"},{"title":"RDS Restore Drills for Busy Teams: The Recovery Workflow That Surfaced Real Gaps","url":"https://www.devopsness.com/blog/rds-restore-drills-for-busy-teams-the-recovery-workflow-that-surfaced-real-gaps-2026-03-25","description":"A hands-on RDS restore drill guide for small cloud teams that thought backups were covered until a timed restore test exposed missing steps, DNS 
confusion, and stale credentials.","publishedAt":"2026-03-25T12:00:00.000Z","updatedAt":"2026-06-01T07:52:06.730Z","category":"Cloud"},{"title":"Systemd Drop-In Overrides for Vendor Services: The Supportable Linux Ops Pattern","url":"https://www.devopsness.com/blog/systemd-drop-in-overrides-for-vendor-services-the-supportable-linux-ops-pattern-2026-03-24","description":"A practical systemd drop-in guide built from a real operations problem: vendor unit files kept changing, but the team still needed consistent restart, environment, and logging behavior.","publishedAt":"2026-03-24T12:00:00.000Z","updatedAt":"2026-05-30T16:08:44.838Z","category":"Linux"},{"title":"Terraform Module Version Pinning: How One Platform Team Stopped Surprise Breakage","url":"https://www.devopsness.com/blog/terraform-module-version-pinning-how-one-platform-team-stopped-surprise-breakage-2026-03-23","description":"A real-world Terraform module version pinning guide for platform teams that want safer upgrades, clearer ownership, and fewer broken pipelines after shared module releases.","publishedAt":"2026-03-23T12:00:00.000Z","updatedAt":"2026-05-20T12:57:06.387Z","category":"Infrastructure"},{"title":"Embedding Model Upgrades Without Search Chaos: A Safer RAG Rollout Pattern","url":"https://www.devopsness.com/blog/embedding-model-upgrades-without-search-chaos-a-safer-rag-rollout-pattern-2026-03-22","description":"A practical embedding model upgrade guide for RAG systems, built from a real support-search migration that initially reduced answer quality instead of improving it.","publishedAt":"2026-03-22T12:00:00.000Z","updatedAt":"2026-06-06T06:18:58.832Z","category":"AI"},{"title":"Multi-Cluster Traffic Routing Strategies: A Pragmatic Rollout Pattern for Growing SaaS Teams","url":"https://www.devopsness.com/blog/multi-cluster-traffic-routing-strategies-a-pragmatic-rollout-pattern-for-growing-saas-teams-2026-03-21","description":"A real-world multi-cluster traffic routing guide for SaaS teams 
that have outgrown a single Kubernetes cluster and need safer rollout control without a service-mesh science project.","publishedAt":"2026-03-21T12:00:00.000Z","updatedAt":"2026-06-09T12:39:08.752Z","category":"Cloud"},{"title":"Terraform State Isolation by Environment: How We Stopped One Change from Hitting Prod","url":"https://www.devopsness.com/blog/terraform-state-isolation-by-environment-how-we-stopped-one-change-from-hitting-prod-2026-03-20","description":"A practical Terraform state isolation guide built from a real environment-mixing incident, with patterns for safer backends, clearer ownership, and lower blast radius.","publishedAt":"2026-03-20T12:00:00.000Z","updatedAt":"2026-06-06T18:33:24.578Z","category":"Infrastructure"},{"title":"Prompt Versioning and Regression Testing: How Teams Avoid Silent AI Regressions","url":"https://www.devopsness.com/blog/prompt-versioning-and-regression-testing-how-teams-avoid-silent-ai-regressions-2026-03-19","description":"A real-world guide to prompt versioning and regression testing for production AI features, focused on preventing the subtle changes that hurt quality long before anyone notices.","publishedAt":"2026-03-19T12:00:00.000Z","updatedAt":"2026-05-18T17:21:07.534Z","category":"AI"},{"title":"Systemd Service Reliability Patterns: What We Changed After Repeated Restart Loops","url":"https://www.devopsness.com/blog/systemd-service-reliability-patterns-what-we-changed-after-repeated-restart-loops-2026-03-18","description":"A practical systemd reliability guide for Linux services, built around repeated restart-loop incidents and the unit-file patterns that finally made those services boring.","publishedAt":"2026-03-18T12:00:00.000Z","updatedAt":"2026-05-18T17:21:07.318Z","category":"Linux"},{"title":"Blue-Green Deployment Guardrails in Kubernetes: Lessons from a Failed Friday 
Rollout","url":"https://www.devopsness.com/blog/blue-green-deployment-guardrails-in-kubernetes-lessons-from-a-failed-friday-rollout-2026-03-17","description":"A Kubernetes blue-green deployment guide built around a real rollout failure, showing the guardrails that matter when traffic shifting, health checks, and rollback timing all interact.","publishedAt":"2026-03-17T12:00:00.000Z","updatedAt":"2026-05-26T13:31:04.445Z","category":"DevOps"},{"title":"Cloud Disaster Recovery Runbook Design: How Small Teams Rehearse Multi-Region Failover","url":"https://www.devopsness.com/blog/cloud-disaster-recovery-runbook-design-how-small-teams-rehearse-multi-region-failover-2026-03-16","description":"A practical disaster recovery runbook guide for small cloud teams that need realistic failover steps, clear ownership, and repeatable rehearsals instead of shelfware documents.","publishedAt":"2026-03-16T12:00:00.000Z","updatedAt":"2026-06-11T09:36:26.318Z","category":"Cloud"},{"title":"RAG Retrieval Quality Evaluation: The Checks We Added After Bad Answers Reached Production","url":"https://www.devopsness.com/blog/rag-retrieval-quality-evaluation-the-checks-we-added-after-bad-answers-reached-production-2026-03-15","description":"A search-friendly guide to RAG retrieval quality evaluation, based on the moment one production assistant started citing stale documents and the team had to prove what 'good retrieval' meant.","publishedAt":"2026-03-15T12:00:00.000Z","updatedAt":"2026-05-18T17:21:06.699Z","category":"AI"},{"title":"Infrastructure Documentation as Code: How One Platform Team Reduced Audit Fire Drills","url":"https://www.devopsness.com/blog/infrastructure-documentation-as-code-how-one-platform-team-reduced-audit-fire-drills-2026-03-14","description":"This infrastructure documentation as code guide shows how a platform team moved runbooks, ownership maps, and architecture decisions into versioned workflows that people actually 
trusted.","publishedAt":"2026-03-14T12:00:00.000Z","updatedAt":"2026-06-01T08:50:54.919Z","category":"Infrastructure"},{"title":"Linux Patch Management for Production Fleets: A Real-World Maintenance Workflow","url":"https://www.devopsness.com/blog/linux-patch-management-for-production-fleets-a-real-world-maintenance-workflow-2026-03-13","description":"A production-tested Linux patch management workflow for teams that need security fixes without turning every maintenance window into a gamble.","publishedAt":"2026-03-13T12:00:00.000Z","updatedAt":"2026-05-18T17:21:05.950Z","category":"Linux"},{"title":"AWS Cost Allocation Tags for Shared Platforms: What Finally Worked","url":"https://www.devopsness.com/blog/aws-cost-allocation-tags-for-shared-platforms-what-finally-worked-2026-03-12","description":"A hands-on guide to AWS cost allocation tags for shared environments, built from a real platform-team problem: everyone used the cluster, but nobody trusted the bill.","publishedAt":"2026-03-12T12:00:00.000Z","updatedAt":"2026-06-08T11:11:44.560Z","category":"Cloud"},{"title":"GitHub Actions Monorepo CI: How We Cut Build Times Without Breaking Main","url":"https://www.devopsness.com/blog/github-actions-monorepo-ci-how-we-cut-build-times-without-breaking-main-2026-03-11","description":"A practical GitHub Actions monorepo CI guide built around a real scaling problem: long queues, noisy failures, and developers waiting 40 minutes for feedback.","publishedAt":"2026-03-11T12:00:00.000Z","updatedAt":"2026-05-21T22:05:56.334Z","category":"DevOps"},{"title":"Real-World RAG Incidents: Lessons from a Production Rollout","url":"https://www.devopsness.com/blog/real-world-rag-incidents-lessons-from-a-production-rollout-46","description":"A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.","publishedAt":"2026-03-10T12:00:00.000Z","updatedAt":"2026-05-30T17:15:00.553Z","category":"AI"},{"title":"How 
We Stopped Terraform Drift from Surprising On-Call","url":"https://www.devopsness.com/blog/how-we-stopped-terraform-drift-from-surprising-on-call-45","description":"A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.","publishedAt":"2026-03-09T12:00:00.000Z","updatedAt":"2026-05-18T17:21:05.104Z","category":"Infrastructure"},{"title":"Systemd Tricks We Use to Keep Services Boring","url":"https://www.devopsness.com/blog/systemd-tricks-we-use-to-keep-services-boring-45","description":"Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.","publishedAt":"2026-03-08T12:00:00.000Z","updatedAt":"2026-05-27T15:14:46.149Z","category":"Linux"},{"title":"A Pragmatic Multi-Region Strategy for Small Teams","url":"https://www.devopsness.com/blog/a-pragmatic-multi-region-strategy-for-small-teams-45","description":"How a small team moved from single-region risk to a simple active/passive multi-region setup without doubling complexity.","publishedAt":"2026-03-07T12:00:00.000Z","updatedAt":"2026-06-07T16:50:21.784Z","category":"Cloud"},{"title":"What We Learned Running Weekly Game Days on Our CI/CD Pipeline","url":"https://www.devopsness.com/blog/what-we-learned-running-weekly-game-days-on-our-ci-cd-pipeline-45","description":"Practical game day scenarios for CI/CD: broken rollbacks, permission issues, and slow feedback loops—and how we fixed them.","publishedAt":"2026-03-06T12:00:00.000Z","updatedAt":"2026-05-18T17:21:04.426Z","category":"DevOps"},{"title":"Ansible and Infrastructure as Code: Idempotency and Best Practices","url":"https://www.devopsness.com/blog/ansible-and-infrastructure-as-code-idempotency-and-best-practices","description":"Write Ansible playbooks that are idempotent, readable, and maintainable for config management.","publishedAt":"2026-03-05T21:11:57.455Z","updatedAt":"2026-05-18T17:20:13.241Z","category":"Infrastructure"},{"title":"Real-World RAG 
Incidents: Lessons from a Production Rollout","url":"https://www.devopsness.com/blog/real-world-rag-incidents-lessons-from-a-production-rollout-45","description":"A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.","publishedAt":"2026-03-04T12:00:00.000Z","updatedAt":"2026-05-18T17:21:04.225Z","category":"AI"},{"title":"How We Stopped Terraform Drift from Surprising On-Call","url":"https://www.devopsness.com/blog/how-we-stopped-terraform-drift-from-surprising-on-call-44","description":"A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.","publishedAt":"2026-03-03T12:00:00.000Z","updatedAt":"2026-05-18T17:21:04.004Z","category":"Infrastructure"},{"title":"Systemd Tricks We Use to Keep Services Boring","url":"https://www.devopsness.com/blog/systemd-tricks-we-use-to-keep-services-boring-44","description":"Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.","publishedAt":"2026-03-02T12:00:00.000Z","updatedAt":"2026-05-18T17:21:03.786Z","category":"Linux"},{"title":"A Pragmatic Multi-Region Strategy for Small Teams","url":"https://www.devopsness.com/blog/a-pragmatic-multi-region-strategy-for-small-teams-44","description":"How a small team moved from single-region risk to a simple active/passive multi-region setup without doubling complexity.","publishedAt":"2026-03-01T12:00:00.000Z","updatedAt":"2026-05-18T17:21:03.600Z","category":"Cloud"},{"title":"End-of-Week Engineering: Why Smart Tech Teams Don’t Ship Major Changes on Friday","url":"https://www.devopsness.com/blog/end-of-week-engineering-no-friday-deployments-2026-02-28","description":"A practical risk-management framework for release timing, Friday deployment policies, progressive delivery, and how elite teams protect reliability and 
people.","publishedAt":"2026-02-28T12:00:00.000Z","updatedAt":"2026-06-10T11:16:35.677Z","category":"DevOps"},{"title":"Kubernetes Cost Optimization for Teams: FinOps Tactics That Actually Work","url":"https://www.devopsness.com/blog/kubernetes-finops-cost-optimization-2026-02-27","description":"Cut Kubernetes spend without hurting reliability using a practical FinOps playbook for rightsizing, autoscaling guardrails, showback, and weekly waste cleanup.","publishedAt":"2026-02-27T10:00:00.000Z","updatedAt":"2026-05-18T17:20:08.900Z","category":"Cloud"},{"title":"SRE Error Budgets in Practice: Shipping Fast Without Burning Reliability","url":"https://www.devopsness.com/blog/sre-error-budgets-practical-guide-2026-02-26","description":"A practical way to define SLOs and error budgets, connect them to release decisions, and avoid reliability debates without data.","publishedAt":"2026-02-26T10:00:00.000Z","updatedAt":"2026-06-02T14:09:36.465Z","category":"DevOps"},{"title":"Platform Engineering with Backstage: Build a Useful Developer Portal","url":"https://www.devopsness.com/blog/platform-engineering-backstage-developer-portal-2026-02-25","description":"How to implement Backstage with real templates, scorecards, and golden paths so internal platform work reduces delivery friction.","publishedAt":"2026-02-25T10:00:00.000Z","updatedAt":"2026-06-01T03:09:53.209Z","category":"Infrastructure"},{"title":"GitHub Actions for Monorepos: Fast CI Without Pipeline Chaos","url":"https://www.devopsness.com/blog/github-actions-monorepo-fast-ci-2026-02-24","description":"A practical pattern for monorepo CI with path filters, matrix builds, caching, and deployment guards that keep feedback fast as teams scale.","publishedAt":"2026-02-24T10:00:00.000Z","updatedAt":"2026-05-31T12:50:22.610Z","category":"DevOps"},{"title":"Azure DevOps Best Practices in 2026: Build Pipelines You Can Trust","url":"https://www.devopsness.com/blog/azure-devops-best-practices-2026-02-23","description":"A 
production-focused guide to Azure DevOps: standardized YAML templates, secure service connections, rollout safety, and measurable delivery reliability.","publishedAt":"2026-02-23T10:00:00.000Z","updatedAt":"2026-06-11T01:34:05.696Z","category":"DevOps"},{"title":"AI Best Practices in 2026: Shipping Reliable Systems, Not Demo Magic","url":"https://www.devopsness.com/blog/ai-best-practices-2026-02-22-reliable-production-systems","description":"A practical production playbook for AI systems: evaluation gates, guardrails, observability, cost control, and reliable release management.","publishedAt":"2026-02-22T09:30:00.000Z","updatedAt":"2026-05-18T17:20:07.531Z","category":"AI"},{"title":"AI Best Practices for Engineering Teams: From Prompt Experiments to Platform Discipline","url":"https://www.devopsness.com/blog/ai-best-practices-2026-02-21-platform-discipline","description":"A practical field manual for engineering teams who want AI features that survive real users, incidents, and budgets — not just demo day.","publishedAt":"2026-02-21T09:30:00.000Z","updatedAt":"2026-06-10T10:16:28.244Z","category":"AI"},{"title":"What We Learned Running Weekly Game Days on Our CI/CD Pipeline","url":"https://www.devopsness.com/blog/what-we-learned-running-weekly-game-days-on-our-ci-cd-pipeline-44","description":"Practical game day scenarios for CI/CD: broken rollbacks, permission issues, and slow feedback loops—and how we fixed them.","publishedAt":"2026-02-19T12:00:00.000Z","updatedAt":"2026-05-18T17:21:03.403Z","category":"DevOps"},{"title":"Real-World RAG Incidents: Lessons from a Production Rollout","url":"https://www.devopsness.com/blog/real-world-rag-incidents-lessons-from-a-production-rollout-44","description":"A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.","publishedAt":"2026-02-18T12:00:00.000Z","updatedAt":"2026-05-18T17:21:03.163Z","category":"AI"},{"title":"How We Stopped 
Terraform Drift from Surprising On-Call","url":"https://www.devopsness.com/blog/how-we-stopped-terraform-drift-from-surprising-on-call-43","description":"A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.","publishedAt":"2026-02-17T12:00:00.000Z","updatedAt":"2026-05-18T17:21:02.962Z","category":"Infrastructure"},{"title":"Systemd Tricks We Use to Keep Services Boring","url":"https://www.devopsness.com/blog/systemd-tricks-we-use-to-keep-services-boring-43","description":"Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.","publishedAt":"2026-02-15T12:00:00.000Z","updatedAt":"2026-05-18T17:21:02.756Z","category":"Linux"},{"title":"A Pragmatic Multi-Region Strategy for Small Teams","url":"https://www.devopsness.com/blog/a-pragmatic-multi-region-strategy-for-small-teams-43","description":"How a small team moved from single-region risk to a simple active/passive multi-region setup without doubling complexity.","publishedAt":"2026-02-14T12:00:00.000Z","updatedAt":"2026-05-18T17:21:02.497Z","category":"Cloud"},{"title":"Kubernetes Networking: Services, Ingress, and Network Policies","url":"https://www.devopsness.com/blog/kubernetes-networking-services-ingress-and-network-policies","description":"Understand Kubernetes networking: ClusterIP, NodePort, LoadBalancer, Ingress, and policy.","publishedAt":"2026-02-13T07:21:17.596Z","updatedAt":"2026-05-18T17:20:12.945Z","category":"DevOps"},{"title":"What We Learned Running Weekly Game Days on Our CI/CD Pipeline","url":"https://www.devopsness.com/blog/what-we-learned-running-weekly-game-days-on-our-ci-cd-pipeline-43","description":"Practical game day scenarios for CI/CD: broken rollbacks, permission issues, and slow feedback loops—and how we fixed them.","publishedAt":"2026-02-11T12:00:00.000Z","updatedAt":"2026-05-18T17:21:02.291Z","category":"DevOps"},{"title":"Real-World RAG Incidents: Lessons from a 
Production Rollout","url":"https://www.devopsness.com/blog/real-world-rag-incidents-lessons-from-a-production-rollout-43","description":"A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.","publishedAt":"2026-02-10T12:00:00.000Z","updatedAt":"2026-05-18T17:21:01.927Z","category":"AI"},{"title":"How We Stopped Terraform Drift from Surprising On-Call","url":"https://www.devopsness.com/blog/how-we-stopped-terraform-drift-from-surprising-on-call-42","description":"A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.","publishedAt":"2026-02-09T12:00:00.000Z","updatedAt":"2026-05-18T17:21:01.723Z","category":"Infrastructure"},{"title":"Systemd Tricks We Use to Keep Services Boring","url":"https://www.devopsness.com/blog/systemd-tricks-we-use-to-keep-services-boring-42","description":"Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.","publishedAt":"2026-02-07T12:00:00.000Z","updatedAt":"2026-06-01T08:29:45.358Z","category":"Linux"},{"title":"A Pragmatic Multi-Region Strategy for Small Teams","url":"https://www.devopsness.com/blog/a-pragmatic-multi-region-strategy-for-small-teams-42","description":"How a small team moved from single-region risk to a simple active/passive multi-region setup without doubling complexity.","publishedAt":"2026-02-06T12:00:00.000Z","updatedAt":"2026-05-18T17:21:01.303Z","category":"Cloud"},{"title":"Infrastructure Cost Optimization: Reducing Cloud Spending","url":"https://www.devopsness.com/blog/infrastructure-cost-optimization-reducing-cloud-spending","description":"We cut our AWS bill by 38% in a quarter. 
The specific changes that moved the bill, ranked by impact, with what we'd do first.","publishedAt":"2026-02-05T16:17:55.440Z","updatedAt":"2026-06-06T20:04:20.868Z","category":"Infrastructure"},{"title":"What We Learned Running Weekly Game Days on Our CI/CD Pipeline","url":"https://www.devopsness.com/blog/what-we-learned-running-weekly-game-days-on-our-ci-cd-pipeline-42","description":"Practical game day scenarios for CI/CD: broken rollbacks, permission issues, and slow feedback loops—and how we fixed them.","publishedAt":"2026-02-03T12:00:00.000Z","updatedAt":"2026-05-18T17:21:01.105Z","category":"DevOps"},{"title":"Real-World RAG Incidents: Lessons from a Production Rollout","url":"https://www.devopsness.com/blog/real-world-rag-incidents-lessons-from-a-production-rollout-42","description":"A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.","publishedAt":"2026-02-02T12:00:00.000Z","updatedAt":"2026-05-18T17:21:00.860Z","category":"AI"},{"title":"Multi-Cloud Infrastructure: Managing Resources Across Providers","url":"https://www.devopsness.com/blog/multi-cloud-infrastructure-managing-resources-across-providers","description":"We run mostly on AWS but use GCP for specific workloads. 
The honest cost-benefit analysis of multi-cloud, plus the patterns that make it not awful.","publishedAt":"2026-02-01T16:17:55.440Z","updatedAt":"2026-05-19T01:10:59.136Z","category":"Infrastructure"},{"title":"How We Stopped Terraform Drift from Surprising On-Call","url":"https://www.devopsness.com/blog/how-we-stopped-terraform-drift-from-surprising-on-call-41","description":"A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.","publishedAt":"2026-01-31T12:00:00.000Z","updatedAt":"2026-05-18T17:21:00.661Z","category":"Infrastructure"},{"title":"Systemd Tricks We Use to Keep Services Boring","url":"https://www.devopsness.com/blog/systemd-tricks-we-use-to-keep-services-boring-41","description":"Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.","publishedAt":"2026-01-30T12:00:00.000Z","updatedAt":"2026-05-18T17:21:00.459Z","category":"Linux"},{"title":"Disaster Recovery Planning: Building Resilient Infrastructure","url":"https://www.devopsness.com/blog/disaster-recovery-planning-building-resilient-infrastructure","description":"A different angle on DR: the planning process — RTO/RPO conversations, dependency mapping, and what we learned about prioritizing what to recover.","publishedAt":"2026-01-29T16:17:55.440Z","updatedAt":"2026-05-18T17:19:59.224Z","category":"Infrastructure"},{"title":"A Pragmatic Multi-Region Strategy for Small Teams","url":"https://www.devopsness.com/blog/a-pragmatic-multi-region-strategy-for-small-teams-41","description":"How a small team moved from single-region risk to a simple active/passive multi-region setup without doubling complexity.","publishedAt":"2026-01-27T12:00:00.000Z","updatedAt":"2026-05-18T17:21:00.259Z","category":"Cloud"},{"title":"What We Learned Running Weekly Game Days on Our CI/CD 
Pipeline","url":"https://www.devopsness.com/blog/what-we-learned-running-weekly-game-days-on-our-ci-cd-pipeline-41","description":"Practical game day scenarios for CI/CD: broken rollbacks, permission issues, and slow feedback loops—and how we fixed them.","publishedAt":"2026-01-26T12:00:00.000Z","updatedAt":"2026-05-18T17:21:00.056Z","category":"DevOps"},{"title":"Infrastructure Monitoring: Observability for IaC","url":"https://www.devopsness.com/blog/infrastructure-monitoring-observability-iac","description":"Defining monitoring as code: dashboards, alerts, and SLOs in Git. The patterns that survived the migration from clicked-together monitoring.","publishedAt":"2026-01-25T16:17:55.440Z","updatedAt":"2026-05-18T17:19:58.968Z","category":"Infrastructure"},{"title":"FinOps and Cloud Cost Management for Engineering Teams","url":"https://www.devopsness.com/blog/finops-and-cloud-cost-management-for-engineering-teams","description":"Embed cost ownership in engineering: tags, budgets, and showback.","publishedAt":"2026-01-23T17:30:37.737Z","updatedAt":"2026-05-18T17:20:12.752Z","category":"Cloud"},{"title":"Ansible Playbook Optimization: Writing Efficient Playbooks","url":"https://www.devopsness.com/blog/ansible-playbook-optimization-writing-efficient-playbooks","description":"We cut our largest playbook's runtime from 14 minutes to 4 minutes. 
The specific changes that mattered, plus the ones that didn't.","publishedAt":"2026-01-22T16:17:55.440Z","updatedAt":"2026-05-22T09:31:43.476Z","category":"Infrastructure"},{"title":"Real-World RAG Incidents: Lessons from a Production Rollout","url":"https://www.devopsness.com/blog/real-world-rag-incidents-lessons-from-a-production-rollout-41","description":"A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.","publishedAt":"2026-01-21T12:00:00.000Z","updatedAt":"2026-05-18T17:20:59.868Z","category":"AI"},{"title":"How We Stopped Terraform Drift from Surprising On-Call","url":"https://www.devopsness.com/blog/how-we-stopped-terraform-drift-from-surprising-on-call-40","description":"A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.","publishedAt":"2026-01-19T12:00:00.000Z","updatedAt":"2026-05-18T17:20:59.667Z","category":"Infrastructure"},{"title":"Pulumi vs Terraform Deep Dive: Choosing the Right IaC Tool","url":"https://www.devopsness.com/blog/pulumi-vs-terraform-deep-dive-choosing-right-iac-tool","description":"We tried Pulumi for a quarter and went back to Terraform. Both are real options. 
Why we picked one and what would change our mind.","publishedAt":"2026-01-18T16:17:55.440Z","updatedAt":"2026-05-18T17:19:58.358Z","category":"Infrastructure"},{"title":"Systemd Tricks We Use to Keep Services Boring","url":"https://www.devopsness.com/blog/systemd-tricks-we-use-to-keep-services-boring-40","description":"Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.","publishedAt":"2026-01-17T12:00:00.000Z","updatedAt":"2026-05-18T17:20:59.475Z","category":"Linux"},{"title":"A Pragmatic Multi-Region Strategy for Small Teams","url":"https://www.devopsness.com/blog/a-pragmatic-multi-region-strategy-for-small-teams-40","description":"How a small team moved from single-region risk to a simple active/passive multi-region setup without doubling complexity.","publishedAt":"2026-01-16T12:00:00.000Z","updatedAt":"2026-05-18T17:20:59.288Z","category":"Cloud"},{"title":"Operational Checklist: Kubernetes Secrets and External Vault Integration","url":"https://www.devopsness.com/blog/operational-checklist-kubernetes-secrets-and-external-vault-integration","description":"K8s Secrets are barely encrypted. We moved every secret to Vault with the Vault Agent injector and never went back. The setup checklist.","publishedAt":"2026-01-15T15:10:00.000Z","updatedAt":"2026-06-08T18:55:29.118Z","category":"DevOps"},{"title":"Infrastructure Testing Strategies: Validating Your IaC","url":"https://www.devopsness.com/blog/infrastructure-testing-strategies-validating-iac","description":"We test infrastructure code with three layers: validation, plan review, and integration tests. 
The setup that catches real bugs without slowing down PRs.","publishedAt":"2026-01-14T16:17:55.440Z","updatedAt":"2026-05-26T14:31:13.346Z","category":"Infrastructure"},{"title":"What We Learned Running Weekly Game Days on Our CI/CD Pipeline","url":"https://www.devopsness.com/blog/what-we-learned-running-weekly-game-days-on-our-ci-cd-pipeline-40","description":"Practical game day scenarios for CI/CD: broken rollbacks, permission issues, and slow feedback loops—and how we fixed them.","publishedAt":"2026-01-13T12:00:00.000Z","updatedAt":"2026-05-18T17:20:59.098Z","category":"DevOps"},{"title":"Terraform Modules Best Practices: Building Reusable Infrastructure","url":"https://www.devopsness.com/blog/terraform-modules-best-practices-building-reusable-infrastructure","description":"We have a private module registry with ~25 modules used across 12 accounts. Versioning, interface design, and the over-modularization mistake we keep making.","publishedAt":"2026-01-11T16:17:55.440Z","updatedAt":"2026-06-08T09:10:18.179Z","category":"Infrastructure"},{"title":"Real-World RAG Incidents: Lessons from a Production Rollout","url":"https://www.devopsness.com/blog/real-world-rag-incidents-lessons-from-a-production-rollout-40","description":"A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.","publishedAt":"2026-01-10T12:00:00.000Z","updatedAt":"2026-05-27T20:41:59.643Z","category":"AI"},{"title":"How We Stopped Terraform Drift from Surprising On-Call","url":"https://www.devopsness.com/blog/how-we-stopped-terraform-drift-from-surprising-on-call-39","description":"A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.","publishedAt":"2026-01-09T12:00:00.000Z","updatedAt":"2026-05-18T17:20:58.697Z","category":"Infrastructure"},{"title":"Linux Container Internals: Understanding How Containers 
Work","url":"https://www.devopsness.com/blog/linux-container-internals-understanding-how-containers-work","description":"A container is a process with extra kernel features applied. Walking through namespaces, cgroups, and the actual mechanics — the level of detail that makes \"container weirdness\" debuggable.","publishedAt":"2026-01-07T16:17:55.440Z","updatedAt":"2026-05-28T09:44:36.443Z","category":"Linux"},{"title":"Systemd Tricks We Use to Keep Services Boring","url":"https://www.devopsness.com/blog/systemd-tricks-we-use-to-keep-services-boring-39","description":"Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.","publishedAt":"2026-01-06T12:00:00.000Z","updatedAt":"2026-05-18T17:20:58.471Z","category":"Linux"},{"title":"A Pragmatic Multi-Region Strategy for Small Teams","url":"https://www.devopsness.com/blog/a-pragmatic-multi-region-strategy-for-small-teams-39","description":"How a small team moved from single-region risk to a simple active/passive multi-region setup without doubling complexity.","publishedAt":"2026-01-05T12:00:00.000Z","updatedAt":"2026-06-07T23:10:57.088Z","category":"Cloud"},{"title":"Shell Scripting Best Practices: Writing Maintainable Scripts","url":"https://www.devopsness.com/blog/shell-scripting-best-practices-writing-maintainable-scripts","description":"We have a few hundred shell scripts in production. 
The patterns that make them survive contact with reality, and the ones we've stopped writing.","publishedAt":"2026-01-04T16:17:55.440Z","updatedAt":"2026-05-20T07:27:25.886Z","category":"Linux"},{"title":"Prompt Engineering for DevOps: Consistency and Safety","url":"https://www.devopsness.com/blog/prompt-engineering-for-devops-consistency-and-safety","description":"Use prompts to get reliable, safe outputs from LLMs for runbooks, code, and ops tasks.","publishedAt":"2026-01-03T03:39:57.879Z","updatedAt":"2026-06-07T21:21:57.511Z","category":"AI"},{"title":"What We Learned Running Weekly Game Days on Our CI/CD Pipeline","url":"https://www.devopsness.com/blog/what-we-learned-running-weekly-game-days-on-our-ci-cd-pipeline-39","description":"Practical game day scenarios for CI/CD: broken rollbacks, permission issues, and slow feedback loops—and how we fixed them.","publishedAt":"2026-01-02T12:00:00.000Z","updatedAt":"2026-05-19T12:21:49.156Z","category":"DevOps"},{"title":"Real-World RAG Incidents: Lessons from a Production Rollout","url":"https://www.devopsness.com/blog/real-world-rag-incidents-lessons-from-a-production-rollout-39","description":"A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.","publishedAt":"2026-01-01T12:00:00.000Z","updatedAt":"2026-05-18T17:20:57.709Z","category":"AI"},{"title":"File System Optimization: Improving Disk Performance","url":"https://www.devopsness.com/blog/file-system-optimization-improving-disk-performance","description":"Filesystem choice, mount options, IO schedulers — the per-host tweaks that actually moved disk performance for our database and storage workloads.","publishedAt":"2025-12-31T16:17:55.440Z","updatedAt":"2026-05-20T04:30:46.448Z","category":"Linux"},{"title":"How We Stopped Terraform Drift from Surprising On-Call","url":"https://www.devopsness.com/blog/how-we-stopped-terraform-drift-from-surprising-on-call-38","description":"A 
real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.","publishedAt":"2025-12-30T12:00:00.000Z","updatedAt":"2026-05-18T17:20:57.492Z","category":"Infrastructure"},{"title":"Systemd Tricks We Use to Keep Services Boring","url":"https://www.devopsness.com/blog/systemd-tricks-we-use-to-keep-services-boring-38","description":"Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.","publishedAt":"2025-12-29T12:00:00.000Z","updatedAt":"2026-05-31T05:16:05.692Z","category":"Linux"},{"title":"A Pragmatic Multi-Region Strategy for Small Teams","url":"https://www.devopsness.com/blog/a-pragmatic-multi-region-strategy-for-small-teams-38","description":"How a small team moved from single-region risk to a simple active/passive multi-region setup without doubling complexity.","publishedAt":"2025-12-28T12:00:00.000Z","updatedAt":"2026-06-10T17:27:12.241Z","category":"Cloud"},{"title":"Process Management and Monitoring in Linux","url":"https://www.devopsness.com/blog/process-management-monitoring-linux","description":"How processes actually live and die on Linux, the tools that show what's happening, and the patterns we use for monitoring service health.","publishedAt":"2025-12-27T16:17:55.440Z","updatedAt":"2026-06-06T10:29:12.093Z","category":"Linux"},{"title":"What We Learned Running Weekly Game Days on Our CI/CD Pipeline","url":"https://www.devopsness.com/blog/what-we-learned-running-weekly-game-days-on-our-ci-cd-pipeline-38","description":"Practical game day scenarios for CI/CD: broken rollbacks, permission issues, and slow feedback loops—and how we fixed them.","publishedAt":"2025-12-26T12:00:00.000Z","updatedAt":"2026-06-01T06:56:53.304Z","category":"DevOps"},{"title":"Real-World RAG Incidents: Lessons from a Production Rollout","url":"https://www.devopsness.com/blog/real-world-rag-incidents-lessons-from-a-production-rollout-38","description":"A field report from 
rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.","publishedAt":"2025-12-25T12:00:00.000Z","updatedAt":"2026-05-18T17:20:56.655Z","category":"AI"},{"title":"Linux Security Hardening: Protecting Your System","url":"https://www.devopsness.com/blog/linux-security-hardening-protecting-system","description":"A practical Linux hardening checklist for production hosts. The settings that earn their place via real production reasons, not the cargo-cult version.","publishedAt":"2025-12-24T16:17:55.440Z","updatedAt":"2026-06-11T08:06:19.701Z","category":"Linux"},{"title":"How We Stopped Terraform Drift from Surprising On-Call","url":"https://www.devopsness.com/blog/how-we-stopped-terraform-drift-from-surprising-on-call-37","description":"A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.","publishedAt":"2025-12-23T12:00:00.000Z","updatedAt":"2026-06-01T05:21:13.509Z","category":"Infrastructure"},{"title":"Systemd Tricks We Use to Keep Services Boring","url":"https://www.devopsness.com/blog/systemd-tricks-we-use-to-keep-services-boring-37","description":"Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.","publishedAt":"2025-12-22T12:00:00.000Z","updatedAt":"2026-05-30T15:47:57.763Z","category":"Linux"},{"title":"A Pragmatic Multi-Region Strategy for Small Teams","url":"https://www.devopsness.com/blog/a-pragmatic-multi-region-strategy-for-small-teams-37","description":"How a small team moved from single-region risk to a simple active/passive multi-region setup without doubling complexity.","publishedAt":"2025-12-21T12:00:00.000Z","updatedAt":"2026-05-18T17:20:55.650Z","category":"Cloud"},{"title":"Operational Checklist: Systemd Service Reliability Patterns","url":"https://www.devopsness.com/blog/operational-checklist-systemd-service-reliability-patterns","description":"A condensed checklist 
of the systemd unit-file patterns we now use everywhere, with the production reasons each one matters.","publishedAt":"2025-12-20T16:21:00.000Z","updatedAt":"2026-06-11T11:00:29.547Z","category":"Linux"},{"title":"Network Configuration and Troubleshooting in Linux","url":"https://www.devopsness.com/blog/network-configuration-troubleshooting-linux","description":"A systematic approach to debugging Linux network issues. The tools that earn their place and the order I use them in.","publishedAt":"2025-12-20T16:17:55.440Z","updatedAt":"2026-05-27T16:37:21.074Z","category":"Linux"},{"title":"What We Learned Running Weekly Game Days on Our CI/CD Pipeline","url":"https://www.devopsness.com/blog/what-we-learned-running-weekly-game-days-on-our-ci-cd-pipeline-37","description":"Practical game day scenarios for CI/CD: broken rollbacks, permission issues, and slow feedback loops—and how we fixed them.","publishedAt":"2025-12-19T12:00:00.000Z","updatedAt":"2026-05-18T17:20:55.447Z","category":"DevOps"},{"title":"Real-World RAG Incidents: Lessons from a Production Rollout","url":"https://www.devopsness.com/blog/real-world-rag-incidents-lessons-from-a-production-rollout-37","description":"A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.","publishedAt":"2025-12-18T12:00:00.000Z","updatedAt":"2026-05-18T17:20:55.237Z","category":"AI"},{"title":"Linux Performance Tuning: Optimizing System Performance","url":"https://www.devopsness.com/blog/linux-performance-tuning-optimizing-system-performance","description":"A practical Linux performance tuning playbook for production servers. 
The kernel parameters, disk and network tweaks that earn their place, and the ones that turned out to be folklore.","publishedAt":"2025-12-17T16:17:55.440Z","updatedAt":"2026-06-03T22:10:17.020Z","category":"Linux"},{"title":"How We Stopped Terraform Drift from Surprising On-Call","url":"https://www.devopsness.com/blog/how-we-stopped-terraform-drift-from-surprising-on-call-36","description":"A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.","publishedAt":"2025-12-15T12:00:00.000Z","updatedAt":"2026-05-18T17:20:55.027Z","category":"Infrastructure"},{"title":"Systemd Tricks We Use to Keep Services Boring","url":"https://www.devopsness.com/blog/systemd-tricks-we-use-to-keep-services-boring-36","description":"Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.","publishedAt":"2025-12-14T12:00:00.000Z","updatedAt":"2026-05-18T17:20:54.830Z","category":"Linux"},{"title":"Systemd Service Management: Creating and Managing Services","url":"https://www.devopsness.com/blog/systemd-service-management-creating-managing-services","description":"A practical guide to writing and managing systemd services for production. 
The unit file features that earn their place, plus the operational workflows.","publishedAt":"2025-12-13T16:17:55.440Z","updatedAt":"2026-05-26T21:35:01.542Z","category":"Linux"},{"title":"Systemd and Modern Linux Service Management","url":"https://www.devopsness.com/blog/systemd-and-modern-linux-service-management","description":"Run services reliably with systemd: units, dependencies, and resource limits.","publishedAt":"2025-12-13T13:49:18.020Z","updatedAt":"2026-06-03T09:59:34.919Z","category":"Linux"},{"title":"A Pragmatic Multi-Region Strategy for Small Teams","url":"https://www.devopsness.com/blog/a-pragmatic-multi-region-strategy-for-small-teams-36","description":"How a small team moved from single-region risk to a simple active/passive multi-region setup without doubling complexity.","publishedAt":"2025-12-12T12:00:00.000Z","updatedAt":"2026-05-31T07:20:36.898Z","category":"Cloud"},{"title":"What We Learned Running Weekly Game Days on Our CI/CD Pipeline","url":"https://www.devopsness.com/blog/what-we-learned-running-weekly-game-days-on-our-ci-cd-pipeline-36","description":"Practical game day scenarios for CI/CD: broken rollbacks, permission issues, and slow feedback loops—and how we fixed them.","publishedAt":"2025-12-10T12:00:00.000Z","updatedAt":"2026-05-28T18:01:32.115Z","category":"DevOps"},{"title":"Edge Computing with AWS: CloudFront and Lambda@Edge","url":"https://www.devopsness.com/blog/edge-computing-aws-cloudfront-lambda-edge","description":"We use CloudFront + Lambda@Edge for specific patterns. 
The wins, the production gotchas, and where we hit Lambda@Edge's limits.","publishedAt":"2025-12-09T16:17:55.440Z","updatedAt":"2026-06-08T04:51:28.822Z","category":"Cloud"},{"title":"Real-World RAG Incidents: Lessons from a Production Rollout","url":"https://www.devopsness.com/blog/real-world-rag-incidents-lessons-from-a-production-rollout-36","description":"A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.","publishedAt":"2025-12-08T12:00:00.000Z","updatedAt":"2026-05-18T17:20:54.209Z","category":"AI"},{"title":"Cloud-Native Databases: Choosing the Right Database for Your Workload","url":"https://www.devopsness.com/blog/cloud-native-databases-choosing-right-database-workload","description":"Postgres, DynamoDB, Redis, Elasticsearch, Snowflake. We use all five for different workloads. The decision criteria, not the marketing comparison.","publishedAt":"2025-12-06T16:17:55.440Z","updatedAt":"2026-05-27T14:37:31.033Z","category":"Cloud"},{"title":"How We Stopped Terraform Drift from Surprising On-Call","url":"https://www.devopsness.com/blog/how-we-stopped-terraform-drift-from-surprising-on-call-35","description":"A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.","publishedAt":"2025-12-05T12:00:00.000Z","updatedAt":"2026-05-28T04:15:41.265Z","category":"Infrastructure"},{"title":"Systemd Tricks We Use to Keep Services Boring","url":"https://www.devopsness.com/blog/systemd-tricks-we-use-to-keep-services-boring-35","description":"Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.","publishedAt":"2025-12-03T12:00:00.000Z","updatedAt":"2026-05-18T17:20:53.811Z","category":"Linux"},{"title":"Disaster Recovery in the Cloud: Backup and Recovery Strategies","url":"https://www.devopsness.com/blog/disaster-recovery-cloud-backup-recovery-strategies","description":"We've 
executed real disaster recoveries twice. The plan that survived contact with reality, and what was wrong about the plans we had before that.","publishedAt":"2025-12-02T16:17:55.440Z","updatedAt":"2026-05-18T17:19:55.087Z","category":"Cloud"},{"title":"A Pragmatic Multi-Region Strategy for Small Teams","url":"https://www.devopsness.com/blog/a-pragmatic-multi-region-strategy-for-small-teams-35","description":"How a small team moved from single-region risk to a simple active/passive multi-region setup without doubling complexity.","publishedAt":"2025-12-01T12:00:00.000Z","updatedAt":"2026-05-18T17:20:53.596Z","category":"Cloud"},{"title":"Cloud Networking Fundamentals: VPCs, Subnets, and Routing","url":"https://www.devopsness.com/blog/cloud-networking-fundamentals-vpcs-subnets-routing","description":"VPCs, subnets, route tables, gateways. The mental model that finally made cloud networking click after I stopped trying to map it 1:1 to physical networks.","publishedAt":"2025-11-29T16:17:55.440Z","updatedAt":"2026-05-18T17:19:54.831Z","category":"Cloud"},{"title":"What We Learned Running Weekly Game Days on Our CI/CD Pipeline","url":"https://www.devopsness.com/blog/what-we-learned-running-weekly-game-days-on-our-ci-cd-pipeline-35","description":"Practical game day scenarios for CI/CD: broken rollbacks, permission issues, and slow feedback loops—and how we fixed them.","publishedAt":"2025-11-28T12:00:00.000Z","updatedAt":"2026-05-27T10:17:36.211Z","category":"DevOps"},{"title":"Real-World RAG Incidents: Lessons from a Production Rollout","url":"https://www.devopsness.com/blog/real-world-rag-incidents-lessons-from-a-production-rollout-35","description":"A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.","publishedAt":"2025-11-27T12:00:00.000Z","updatedAt":"2026-05-18T17:20:53.193Z","category":"AI"},{"title":"AWS ECS vs EKS: Choosing the Right Container 
Platform","url":"https://www.devopsness.com/blog/aws-ecs-vs-eks-choosing-right-container-platform","description":"We run both ECS and EKS in production. Which we use for what, and the actual decision criteria — not the marketing comparison.","publishedAt":"2025-11-25T16:17:55.440Z","updatedAt":"2026-05-28T12:12:33.497Z","category":"Cloud"},{"title":"How We Stopped Terraform Drift from Surprising On-Call","url":"https://www.devopsness.com/blog/how-we-stopped-terraform-drift-from-surprising-on-call-34","description":"A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.","publishedAt":"2025-11-24T12:00:00.000Z","updatedAt":"2026-05-18T17:20:52.976Z","category":"Infrastructure"},{"title":"Systemd Tricks We Use to Keep Services Boring","url":"https://www.devopsness.com/blog/systemd-tricks-we-use-to-keep-services-boring-34","description":"Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.","publishedAt":"2025-11-23T12:00:00.000Z","updatedAt":"2026-05-18T17:20:52.772Z","category":"Linux"},{"title":"Container Image Scanning in CI and at Runtime","url":"https://www.devopsness.com/blog/container-image-scanning-in-ci-and-at-runtime","description":"Shift-left security with image scanning. 
Trivy, policy gates, and runtime integration.","publishedAt":"2025-11-22T23:58:38.161Z","updatedAt":"2026-06-08T03:59:35.921Z","category":"DevOps"},{"title":"Cloud Security Best Practices: Securing Your AWS Infrastructure","url":"https://www.devopsness.com/blog/cloud-security-best-practices-securing-aws-infrastructure","description":"A working AWS security baseline, derived from the actual incidents we've had and the audit findings we've cleared.","publishedAt":"2025-11-21T16:17:55.440Z","updatedAt":"2026-06-08T02:57:25.408Z","category":"Cloud"},{"title":"A Pragmatic Multi-Region Strategy for Small Teams","url":"https://www.devopsness.com/blog/a-pragmatic-multi-region-strategy-for-small-teams-34","description":"How a small team moved from single-region risk to a simple active/passive multi-region setup without doubling complexity.","publishedAt":"2025-11-20T12:00:00.000Z","updatedAt":"2026-06-06T20:19:10.225Z","category":"Cloud"}]}