Concrete systemd unit patterns that reduced flakiness: restart policies, resource limits, and structured logs.

Systemd Tricks We Use to Keep Services Boring

After a few painful outages caused by homemade init scripts, we moved everything to systemd and wrote down the patterns that worked.

Pattern: Restart with Backoff

We had a service that occasionally failed to bind its port on boot.

```ini
[Unit]
Description=API service
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/api
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

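`Restart=on-failure` restarts the service only after an unclean exit (non-zero exit code, fatal signal, or timeout), and `RestartSec=5` waits five seconds before each attempt. Strictly speaking this is a fixed delay rather than a growing backoff, but it gave the process room to recover without flapping.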
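One thing a fixed restart delay leaves open is what happens when the service never recovers. Here is a minimal sketch of a drop-in that bounds the retries with systemd's start rate limiting. It assumes the unit is installed as `api.service`; the drop-in path and the limit values are illustrative, not something we measured.

```ini
# /etc/systemd/system/api.service.d/restart.conf (hypothetical drop-in path)
[Unit]
# Allow at most 5 start attempts within any 60-second window.
# Once the limit is hit, systemd stops retrying and marks the unit as failed.
StartLimitIntervalSec=60
StartLimitBurst=5

[Service]
# Same restart policy as the main unit, repeated here for context.
Restart=on-failure
RestartSec=5
```

Run `systemctl daemon-reload` after adding the file. Newer systemd releases (254 and later) also offer `RestartSteps=` and `RestartMaxDelaySec=` to grow the delay between attempts; check `systemd.service(5)` for the version you run before relying on them.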

Pattern: Non-Root with Limits

We saw file descriptor exhaustion during load tests.

We added `User=api` so the service no longer runs as root, and `LimitNOFILE=65536` to raise its open file descriptor limit.
We used Ansible to roll the unit file change across the fleet.
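As a rough sketch, the two settings could live in a drop-in like the one below rather than in the main unit file. The drop-in path is an assumption for illustration, and the `api` account has to exist on the host already.

```ini
# /etc/systemd/system/api.service.d/limits.conf (hypothetical drop-in path)
[Service]
# Run as the unprivileged "api" account instead of root.
User=api
# Raise the open file descriptor limit; a single value sets both soft and hard limits.
LimitNOFILE=65536
```

After `systemctl daemon-reload` and a restart, `systemctl show api -p User -p LimitNOFILE` should report the values systemd applied.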

Pattern: Journald as a Timeline

When something goes wrong, we start with:

`journalctl -u api -b`
`journalctl -u api --since "-15min"`

Systemd didn’t fix our code, but it made failures predictable and repeatable.

About Kiril Urbonas
