Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.
After a few painful outages caused by homemade init scripts, we moved everything to systemd and wrote down the patterns that worked.
We had a service that occasionally failed to bind its port on boot.
```ini [Unit] Description=API service After=network-online.target Wants=network-online.target
[Service] ExecStart=/usr/local/bin/api Restart=on-failure RestartSec=5
[Install] WantedBy=multi-user.target ```
We saw file descriptor exhaustion during load tests.
When something goes wrong, we start with:
Systemd didn’t fix our code, but it made failures predictable and repeatable.
Get the latest tutorials, guides, and insights on AI, DevOps, Cloud, and Infrastructure delivered directly to your inbox.
A different angle on DR: the planning process — RTO/RPO conversations, dependency mapping, and what we learned about prioritizing what to recover.
A real story of removing console-only changes, adding drift detection, and getting Terraform back in charge.
Explore more articles in this category
We migrated most scheduled jobs from cron to systemd timers. The wins, the gotchas, and the cases we kept on cron anyway.
A curated list of shell one-liners that earn their place in real ops work — the ones I reach for weekly, not the trick-shot variety.
Generate an SSH key, set up passwordless login, and configure aliases for the servers you use daily — all without copy-pasting yet another long command.