Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.
After a few painful outages caused by homemade init scripts, we moved everything to systemd and wrote down the patterns that worked.
We had a service that occasionally failed to bind its port on boot.
```ini [Unit] Description=API service After=network-online.target Wants=network-online.target
[Service] ExecStart=/usr/local/bin/api Restart=on-failure RestartSec=5
[Install] WantedBy=multi-user.target ```
We saw file descriptor exhaustion during load tests.
When something goes wrong, we start with:
Systemd didn’t fix our code, but it made failures predictable and repeatable.
Get the latest tutorials, guides, and insights on AI, DevOps, Cloud, and Infrastructure delivered directly to your inbox.
Tune the host OS for container workloads: kernel params, I/O, and cgroups.
A flat VPC is fine until you need to prove who can reach what. Five segmentation patterns that work in AWS without requiring a service mesh.
Explore more articles in this category
We started using eBPF tooling for ad-hoc production debugging six months ago. Three real incidents where it cut investigation time from hours to minutes.
Three production OOM incidents that taught us how kubelet, containerd, and the kernel actually decide which process dies. With debugging commands you'll wish you had earlier.
We migrated 47 cron jobs to systemd timers across our fleet. The mechanical conversion was easy. The interesting parts were the bugs we found that cron had been hiding.