Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.
After a few painful outages caused by homemade init scripts, we moved everything to systemd and wrote down the patterns that worked.
We had a service that occasionally failed to bind its port on boot.
```ini [Unit] Description=API service After=network-online.target Wants=network-online.target
[Service] ExecStart=/usr/local/bin/api Restart=on-failure RestartSec=5
[Install] WantedBy=multi-user.target ```
We saw file descriptor exhaustion during load tests.
When something goes wrong, we start with:
Systemd didn’t fix our code, but it made failures predictable and repeatable.
Get the latest tutorials, guides, and insights on AI, DevOps, Cloud, and Infrastructure delivered directly to your inbox.
How a small team moved from single-region risk to a simple active/passive multi-region setup without doubling complexity.
We tried Pulumi for a quarter and went back to Terraform. Both are real options. Why we picked one and what would change our mind.
Explore more articles in this category
We started using eBPF tooling for ad-hoc production debugging six months ago. Three real incidents where it cut investigation time from hours to minutes.
Three production OOM incidents that taught us how kubelet, containerd, and the kernel actually decide which process dies. With debugging commands you'll wish you had earlier.
We migrated 47 cron jobs to systemd timers across our fleet. The mechanical conversion was easy. The interesting parts were the bugs we found that cron had been hiding.