A practical guide to writing and managing systemd services for production. The unit file features that earn their place, plus the operational workflows.
systemd is the init system on most modern Linux distributions. It manages services as units — text files that describe what to run, when, and how. After years of writing and operating systemd units in production, this is the working version: the unit file features that matter, the operational workflows, and the patterns that keep services reliable.
A systemd service unit file:
[Unit]
Description=My Application
After=network-online.target
Wants=network-online.target
[Service]
Type=exec
User=myapp
Group=myapp
WorkingDirectory=/opt/myapp
EnvironmentFile=/etc/myapp/myapp.env
ExecStart=/opt/myapp/bin/myapp
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
Three sections:
[Unit]: metadata and dependencies on other units
[Service]: how to run the service
[Install]: how to enable it at boot
systemctl enable myapp reads the [Install] section and creates the symlink that starts the service at boot. systemctl start myapp runs it now. systemctl status myapp shows the current state.
Type= matters more than people realize
Service types tell systemd how to detect "the service has started":
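Type=simple (the default when ExecStart= is set and Type= is omitted): systemd considers the service started as soon as the process is forked, before the binary has even been executed. Bad: if the binary is missing or the configured user doesn't exist, systemctl start still reports success.
Type=exec (added in systemd 240): waits until execve() of the service binary has succeeded. Better. Catches failures to launch, though not a crash that happens a moment after launch.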
Type=notify: the service explicitly notifies systemd when ready (via sd_notify()). Best for services that want to delay "ready" until they've completed their startup (e.g., loaded data, connected to dependencies). Requires the service to support sd_notify.
Type=forking: legacy. The service forks; systemd considers it started when the parent exits. Less common in modern systems.
For new services, prefer exec or notify. Avoid simple, even though it is still what you get when Type= is omitted.
# Start limits belong in [Unit]; restart policy belongs in [Service]
[Unit]
StartLimitIntervalSec=10min
StartLimitBurst=3
[Service]
Restart=on-failure
RestartSec=5
What this does:
Restart=on-failure: restart if the service exits with non-zero status (or is killed). Don't restart on clean exit.
RestartSec=5: wait 5 seconds between restarts.
StartLimitIntervalSec=10min + StartLimitBurst=3: if the service fails 3 times in 10 minutes, give up. systemd marks it failed and stops trying.
This bounding is critical. Without it, a misconfigured service crash-loops forever, hammering whatever it depends on (database, downstream APIs) and generating mountains of logs.
We use Restart=on-failure with bounded retries on every service. The bounds catch "this is genuinely broken" vs "transient hiccup."
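To see why these bounds separate a crash loop from a transient hiccup, here is a small Python sketch of a burst limiter. It is a simplified model for illustration only, not systemd's exact accounting, and the function name starts_allowed is made up for this example.

```python
def starts_allowed(start_times, interval_sec=600, burst=3):
    """Return the start attempts (in seconds) a burst limiter would allow.

    Simplified model: a start is refused once `burst` starts have already
    happened inside the trailing `interval_sec` window.
    """
    allowed = []
    for t in start_times:
        recent = [s for s in allowed if t - s < interval_sec]
        if len(recent) >= burst:
            break  # the unit would enter the failed state here
        allowed.append(t)
    return allowed


# A service that crashes immediately and is retried every RestartSec=5 seconds.
attempts = [0, 5, 10, 15, 20]
print(starts_allowed(attempts))  # [0, 5, 10]
```

In this model a service that crashes instantly is stopped after its third start, about ten seconds in, while a service that fails once every fifteen minutes never trips the limit.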
LimitNOFILE=65536
LimitNPROC=4096
MemoryMax=2G
CPUQuota=200%
TasksMax=512
Each is a real production safeguard:
LimitNOFILE: file descriptors. Default 1024 is too low for any service handling many connections.
MemoryMax: hard memory limit. OOM-killed if exceeded. One service can't take down the host.
CPUQuota=200%: up to 2 CPUs of work. Useful on multi-tenant nodes.
TasksMax: hard cap on threads/processes. A thread leak gets caught.
Set these explicitly. Defaults are wrong for most production services.
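One way to confirm that a limit actually reached the process is to have the service check it at startup. The sketch below uses Python's standard resource module; the helper names fd_limits and check_fd_limit are illustrative, and the threshold you pass is whatever you set in the unit.

```python
import resource


def fd_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def check_fd_limit(minimum: int) -> bool:
    """True when the soft limit is at least `minimum` (or unlimited)."""
    soft, _hard = fd_limits()
    return soft == resource.RLIM_INFINITY or soft >= minimum
```

A service could log a warning or refuse to start when check_fd_limit(65536) returns False, which makes a forgotten LimitNOFILE= visible on the first deploy rather than under load.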
systemd has built-in process isolation features:
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ReadWritePaths=/var/lib/myapp /var/log/myapp
ProtectHome=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictRealtime=true
RestrictNamespaces=true
RestrictSUIDSGID=true
LockPersonality=true
What each does:
NoNewPrivileges: process can't escalate via setuid binaries.
PrivateTmp: each service gets its own /tmp.
ProtectSystem=strict: most of the filesystem is read-only for this process.
ReadWritePaths: the specific dirs the service can write to (alongside ProtectSystem=strict).
ProtectHome: /home is invisible.
ProtectKernel*: can't fiddle with kernel.
These add up to meaningful defense-in-depth. A compromised service has much less reach than without them.
The cost: setting them up requires knowing what the service needs to write. The first time you add ProtectSystem=strict, you'll find paths the service was writing to that you didn't know about (caches, logs, temp files). Worth knowing.
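A quick way to find out which paths are still writable under the sandbox is to probe them from inside the service. This is a minimal Python sketch; can_write and audit are hypothetical helper names, and the paths in the usage note are the ones from the unit above.

```python
import tempfile


def can_write(directory: str) -> bool:
    """Try to create and delete a temporary file in `directory`."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers read-only filesystems, missing paths, and permission errors.
        return False


def audit(paths):
    """Map each path to whether the current process can write there."""
    return {p: can_write(p) for p in paths}
```

Calling audit(["/var/lib/myapp", "/var/log/myapp", "/opt/myapp"]) early in startup and logging the result should show the first two as writable and the third as read-only once ProtectSystem=strict is in effect.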
For services that integrate with sd_notify:
Type=notify
WatchdogSec=30
NotifyAccess=main
The service sends sd_notify("WATCHDOG=1") periodically (we send it every 10 seconds in our Go services). If 30 seconds elapse without a watchdog ping, systemd considers the service hung and kills it; with Restart=on-failure set, it is then restarted.
This catches a deadlocked service that the kernel sees as "alive" but isn't actually doing work.
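The notification protocol is small enough to sketch without a library: the service writes a datagram to the Unix socket named in NOTIFY_SOCKET. The version below is in Python for brevity (the services described here are written in Go, where the same protocol applies); the function names are illustrative, and pinging at half the timeout is a common convention rather than a requirement.

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send a state string such as "READY=1" or "WATCHDOG=1" to systemd.

    Returns False when NOTIFY_SOCKET is unset (not running under systemd).
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    # A leading "@" denotes a Linux abstract-namespace socket.
    if path.startswith("@"):
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), path)
    return True


def watchdog_interval(default: float = 10.0) -> float:
    """Seconds between pings: half of WATCHDOG_USEC when systemd sets it."""
    usec = os.environ.get("WATCHDOG_USEC")
    return int(usec) / 2_000_000 if usec else default
```

A service would call sd_notify("READY=1") once startup is complete, then call sd_notify("WATCHDOG=1") from its main work loop every watchdog_interval() seconds. Sending the ping from the loop that does the real work, rather than from a separate timer thread, is what lets the watchdog notice a deadlock.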
EnvironmentFile=/etc/myapp/myapp.env
The file:
DATABASE_URL=postgres://...
LOG_LEVEL=INFO
API_KEY=xxxxxxx
The service starts with these as environment variables. We use this for non-secret config plus secrets pulled from a secrets manager at deploy time.
What we don't do: hardcode environment variables in the unit file via Environment=.... Better to load from a file so the unit file is environment-agnostic.
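On the service side, it helps to read these variables in one place and fail fast when one is missing. The Python sketch below assumes the three variable names from the example file; which ones are required and the LOG_LEVEL default are assumptions made for illustration.

```python
import os


class ConfigError(Exception):
    """Raised when a required environment variable is missing."""


def load_config(environ=os.environ):
    """Read settings that EnvironmentFile= placed in the process environment."""
    required = ("DATABASE_URL", "API_KEY")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise ConfigError("missing environment variables: " + ", ".join(missing))
    return {
        "database_url": environ["DATABASE_URL"],
        "api_key": environ["API_KEY"],
        "log_level": environ.get("LOG_LEVEL", "INFO"),
    }
```

Exiting non-zero on a ConfigError pairs well with bounded restarts: a broken env file produces a few clear log lines and a failed unit instead of a silent misconfiguration.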
StandardOutput=journal
StandardError=journal
SyslogIdentifier=myapp
Standard pattern. journalctl -u myapp reads logs back. Production logs typically ship to a central aggregator (Fluent Bit reads from journal); journald is the local cache.
For containerized workloads, we don't use journald — we read logs via the container runtime's interface.
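When output goes to the journal, a line that starts with a syslog priority in angle brackets, such as <3>, is stored with that priority. The Python sketch below builds a formatter on that convention; the class name, the level mapping, and the message format are choices made for this example.

```python
import logging
import sys

# syslog priorities: 2=crit, 3=err, 4=warning, 6=info, 7=debug
_PRIORITY = {
    logging.CRITICAL: 2,
    logging.ERROR: 3,
    logging.WARNING: 4,
    logging.INFO: 6,
    logging.DEBUG: 7,
}


class JournalPrefixFormatter(logging.Formatter):
    """Prefix each record with "<N>" so journald records its priority."""

    def format(self, record):
        priority = _PRIORITY.get(record.levelno, 6)
        # Keep one record on one line; journald treats each line as an entry.
        message = super().format(record).replace("\n", "\\n")
        return f"<{priority}>{message}"


def setup_logging(level=logging.INFO):
    """Route all logging to stderr with journald priority prefixes."""
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(JournalPrefixFormatter("%(name)s: %(message)s"))
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(level)
```

With this in place, journalctl -u myapp -p err shows only errors and worse, which is much faster than grepping message text during an incident.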
For scheduled jobs, systemd timers replace cron:
# /etc/systemd/system/backup.service
[Unit]
Description=Daily backup
[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup.sh
# /etc/systemd/system/backup.timer
[Unit]
Description=Run backup daily
[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true
[Install]
WantedBy=timers.target
systemctl enable --now backup.timer enables the timer and starts it immediately; without --now it only becomes active at the next boot. The timer triggers backup.service at 2 AM daily.
Why timers over cron:
We use timers for new scheduled jobs. Cron stays for legacy.
A few patterns we use day-to-day:
Reload config without restart:
systemctl reload myapp (works if ExecReload= is set)systemctl restart myapp (full restart)Check status:
systemctl status myapp — current statejournalctl -u myapp -f — follow logssystemctl show myapp — full configuration including computed valuesDebug start failures:
systemctl status myapp — shows recent log lines and exit reasonjournalctl -u myapp --since "5 min ago" — recent logssystemd-analyze verify /etc/systemd/system/myapp.service — syntax checkFind what depends on what:
systemctl list-dependencies myapp — what myapp depends onsystemctl list-dependencies --reverse myapp — what depends on myappOverride behaviors without editing the original:
systemctl edit myapp — creates an override file in /etc/systemd/system/myapp.service.d/override.conf. Original unit untouched; override layered on top. Cleaner than editing the original.Things that bite people:
Forgetting daemon-reload after editing. systemd caches unit files. After editing, run systemctl daemon-reload for changes to take effect.
Using Type=simple for services that fail to launch. The failure isn't visible to systemctl start because systemd already considers them started. Switch to exec, or to notify if the service can report its own readiness.
Restart=always causing crash loops. Don't use always; use on-failure with bounded retries.
Not setting resource limits. Defaults are too permissive for some, too restrictive for others. Set explicitly.
Inheriting environment. Some services rely on environment variables set in user sessions. systemd starts services with a minimal environment. Use EnvironmentFile.
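ExecStart with shell syntax. systemd doesn't run a shell. It does its own basic $VAR and ${VAR} substitution from the unit's environment, but pipes, redirection, globbing, and command substitution are not interpreted. ExecStart=/usr/bin/echo hello > /tmp/out won't redirect anything; the > and the path are passed to echo as arguments. Use ExecStart=/bin/bash -c 'echo hello > /tmp/out' when you need shell features, or set the variable explicitly with Environment= or EnvironmentFile=.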
Debugging a misbehaving service:
systemctl status shows recent state and brief log
journalctl -u service --since "1 hour ago" for full logs
systemd-analyze verify for unit file syntax
systemctl show service for the full computed unit
systemctl cat service for the actual unit file (including overrides)
systemctl list-dependencies
For services that don't behave under systemd but work fine manually, it's usually one of:
The working directory (set WorkingDirectory=)
Use Type=exec or Type=notify, not simple. Better failure detection.
Bound your restarts. Restart=on-failure + start limit prevents crash loops.
Set resource limits explicitly. Defaults are wrong for production.
Use the hardening directives. Cheap defense-in-depth.
Use timers, not cron. Better integration with systemd ecosystem.
systemctl edit, not direct file editing. Overrides are cleaner.
Read journalctl output. The errors are usually clear; the discipline is in actually reading them rather than guessing.
systemd has a reputation for complexity; in practice, the day-to-day surface area is small. A few directives, a few commands, and you can write robust services that survive contact with production. The patterns above are the ones we keep coming back to. Most service issues come back to one of them — which makes them worth knowing well.