Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.

Systemd Tricks We Use to Keep Services Boring

After a few painful outages caused by homemade init scripts, we moved everything to systemd and wrote down the patterns that worked.

Pattern: Restart with Backoff

We had a service that occasionally failed to bind its port on boot.

```ini
[Unit]
Description=API service
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/api
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

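Restart=on-failure combined with RestartSec=5 restarts the process five seconds after an unclean exit, which gave it room to recover without flapping. Note that RestartSec is a fixed delay between attempts, not an increasing one.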
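If a growing delay is wanted instead of a fixed one, newer systemd releases can do that natively. The following is a minimal sketch, not the configuration from the original setup. It assumes the unit is named api.service (inferred from the journalctl commands on this page), a hypothetical drop-in path, and systemd 254 or newer for RestartSteps= and RestartMaxDelaySec=. All numbers are illustrative.

```ini
# /etc/systemd/system/api.service.d/backoff.conf  (hypothetical drop-in path)
[Unit]
# Allow at most 10 starts per 10 minutes; further starts are refused
# and the unit is left in a failed state (illustrative values).
StartLimitIntervalSec=600
StartLimitBurst=10

[Service]
Restart=on-failure
# First retry after 5 seconds, matching the unit above.
RestartSec=5
# Assumes systemd >= 254: grow the delay from RestartSec toward
# RestartMaxDelaySec over 5 consecutive restarts.
RestartSteps=5
RestartMaxDelaySec=60
```

After placing a drop-in like this, `systemctl daemon-reload` followed by `systemctl restart api` picks up the change. On older systemd versions the two backoff settings are not recognized, so the fixed RestartSec delay plus the start limit is the fallback.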

Pattern: Non-Root with Limits

We saw file descriptor exhaustion during load tests.

We added User=api so the service no longer runs as root, and LimitNOFILE=65536 to raise the limit on open file descriptors.
We used Ansible to roll the unit file change across the fleet.
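The two settings above could be expressed as a drop-in roughly like this. It is a sketch under assumptions: the unit name api.service and the drop-in path are hypothetical, and the directives could just as well go straight into the [Service] section of the main unit file.

```ini
# /etc/systemd/system/api.service.d/limits.conf  (hypothetical drop-in path)
[Service]
# Run as the unprivileged "api" account; it must already exist on the host.
User=api
# Raise the per-process limit on open file descriptors.
LimitNOFILE=65536
```

Once the unit is reloaded, `systemctl show api -p User -p LimitNOFILE` should report the values systemd actually applied, which is a quick way to confirm a rollout reached a given host.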

Pattern: Journald as a Timeline

When something goes wrong, we start with:

`journalctl -u api -b`
`journalctl -u api --since "-15min"`

Systemd didn’t fix our code, but it made failures predictable and repeatable.

About Kiril urbonas
