Practical articles on AI, DevOps, Cloud, Linux, and infrastructure engineering.
A field report from rolling out retrieval-augmented generation in production, including cache bugs, bad embeddings, and how we fixed them.
How AI agents are moving from read-only copilots to autonomous automation with guardrails. Best practices for approval gates and rollback.
I spent 3 weeks chasing an answer-quality regression that turned out to be a tokenizer mismatch in a library upgrade. Here's what I learned about evaluating RAG.
We changed a system prompt for what we thought was a tone improvement and broke a customer-critical extraction overnight. The version control and regression tests we built next.
Learn how to fine-tune LLMs like Llama 2, Mistral, and GPT models for your specific use case. Includes LoRA, QLoRA, and full fine-tuning techniques.
We run a fleet of LLM agents on Kubernetes. They're stateful, bursty, and expensive — none of which K8s defaults are good at. Here's what we changed.
I fine-tuned Llama 3 8B on a single 4090 over a weekend for a side project. Here's what worked, what cost more than expected, and what I'd do differently.