Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.
After a few painful outages caused by homemade init scripts, we moved everything to systemd and wrote down the patterns that worked.
We had a service that occasionally failed to bind its port on boot.
```ini [Unit] Description=API service After=network-online.target Wants=network-online.target
[Service] ExecStart=/usr/local/bin/api Restart=on-failure RestartSec=5
[Install] WantedBy=multi-user.target ```
We saw file descriptor exhaustion during load tests.
When something goes wrong, we start with:
Systemd didn’t fix our code, but it made failures predictable and repeatable.
Get the latest tutorials, guides, and insights on AI, DevOps, Cloud, and Infrastructure delivered directly to your inbox.
A different angle on DR: the planning process — RTO/RPO conversations, dependency mapping, and what we learned about prioritizing what to recover.
A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.
Explore more articles in this category
io_uring replaces epoll for new high-throughput services. The patterns that earn their place, the gotchas in older kernels, and where we'd still pick epoll.
cpu.shares vs cpu.cfs_quota_us vs memory.max — the cgroup mechanics behind Kubernetes resource limits, and the surprises that explain the weird symptoms you've seen.
bpftrace one-liners replace strace, perf top, and a half-dozen ad-hoc debugging scripts. The patterns that actually earn their place when you're troubleshooting at 2 AM.