Practical articles on AI, DevOps, Cloud, Linux, and infrastructure engineering.
Learn how to fine-tune LLMs like Llama 2, Mistral, and GPT models for your specific use case. Includes LoRA, QLoRA, and full fine-tuning techniques.
Compare Terraform, Pulumi, and Ansible for Infrastructure as Code. Learn when to use each tool and how they complement each other in modern DevOps workflows.
Set up comprehensive Linux system monitoring using Prometheus and Grafana. Monitor CPU, memory, disk, network, and application metrics with beautiful dashboards.
Discover proven strategies to reduce AWS costs by up to 50%. Learn about Reserved Instances, Spot Instances, right-sizing, and automated cost management.
We deploy LangChain apps in Docker on Kubernetes. The patterns that work, the LangChain-specific gotchas, and what we'd build differently next time.
When everything seems "slow," a baseline gives you something to measure against. The capture-and-compare workflow we use on every Linux host.
HPA, VPA, and Cluster Autoscaler / Karpenter solve overlapping problems, and they solve them badly when you don't understand which one owns what. The mental model that keeps them from fighting.
We run a fleet of LLM agents on Kubernetes. They're stateful, bursty, and expensive, and K8s defaults handle none of those well. Here's what we changed.
We replaced three kernel-level monitoring tools with a small set of eBPF programs. What it bought us, what it cost, and where we still use the old stuff.
We removed the corporate VPN, set up workload identity everywhere, and made every service prove who it is on every call. The actual implementation, with what worked and what we abandoned.
How we organize Terraform state across 12 AWS accounts and 40+ services. Backends, locking, partitioning, and the migration we got wrong twice.
We see ~600 GitHub Actions workflow runs per day across 80 repos. The patterns that scale and the ones that hit limits we didn't expect.