Linux System Monitoring: Prometheus and Grafana Setup Guide 2024 | DevOpsNess

Linux System Monitoring with Prometheus and Grafana

Comprehensive system monitoring is essential for maintaining healthy Linux servers. This guide shows you how to set up Prometheus and Grafana for production-grade monitoring.

Why Prometheus and Grafana? #

Prometheus: Time-series database optimized for metrics
Grafana: Powerful visualization and alerting
Open Source: Free and community-driven
Scalable: A single Prometheus server can handle millions of time series

Architecture Overview #

code

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Node      │────▶│  Prometheus  │────▶│   Grafana   │
│  Exporter   │     │   (Metrics)  │     │ (Dashboards)│
└─────────────┘     └──────────────┘     └─────────────┘

Installation #

1. Install Prometheus #

bash.bash

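# Download and unpack Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.48.0/prometheus-2.48.0.linux-amd64.tar.gz
tar xvfz prometheus-*.tar.gz
# The trailing slash matches only the extracted directory, not the tarball
cd prometheus-*/

# Create the service user and the directories the unit file expects
sudo useradd --system --no-create-home --shell /usr/sbin/nologin prometheus
sudo mkdir -p /etc/prometheus /var/lib/prometheus

# Install the binaries, the default config, and the console files
sudo cp prometheus promtool /usr/local/bin/
sudo cp -r prometheus.yml consoles console_libraries /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus

# Create systemd service (the quoted 'EOF' keeps the backslashes literal)
sudo tee /etc/systemd/system/prometheus.service >/dev/null <<'EOF'
[Unit]
Description=Prometheus
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/ \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus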
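
The download step pins one version and does not verify the archive. The sketch below builds the release URL from a version number and checks the tarball against a checksum file. It assumes the release page publishes a sha256sums.txt asset next to the tarball, which you should confirm for the version you install. The helper names prom_release_url and fetch_prometheus are illustrative, not part of Prometheus.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

prom_release_url() {
  # $1 = version without the leading "v" (e.g. 2.48.0), $2 = architecture (default amd64)
  local version="$1" arch="${2:-amd64}"
  printf 'https://github.com/prometheus/prometheus/releases/download/v%s/prometheus-%s.linux-%s.tar.gz\n' \
    "$version" "$version" "$arch"
}

fetch_prometheus() {
  local version="$1" arch="${2:-amd64}" url
  url="$(prom_release_url "$version" "$arch")"
  wget -q "$url" || return 1
  # Assumption: sha256sums.txt is published alongside the tarball.
  wget -q "${url%/*}/sha256sums.txt" || return 1
  # --ignore-missing skips checksum entries for files that were not downloaded.
  sha256sum --check --ignore-missing sha256sums.txt
}
```

Running fetch_prometheus 2.48.0 in an empty directory downloads the same tarball as above and returns non-zero if the checksum does not match.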

2. Install Node Exporter #

bash.bash

# Download Node Exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-*.tar.gz
sudo cp node_exporter-*/node_exporter /usr/local/bin/

# Create the unprivileged user the service runs as
sudo useradd --system --no-create-home --shell /usr/sbin/nologin node_exporter

# Create systemd service
sudo tee /etc/systemd/system/node_exporter.service >/dev/null <<'EOF'
[Unit]
Description=Node Exporter
After=network.target

[Service]
Type=simple
User=node_exporter
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
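
After starting a service it helps to confirm that its HTTP endpoint actually answers. The following is a minimal sketch of a polling helper. The name wait_for_http and its defaults are assumptions made for illustration. It relies only on curl.

```bash
#!/usr/bin/env bash
# Sketch only: wait_for_http is a hypothetical helper.

wait_for_http() {
  # $1 = URL, $2 = max attempts (default 10), $3 = seconds between attempts (default 1)
  local url="$1" attempts="${2:-10}" delay="${3:-1}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors; the response body is discarded.
    if curl -fsS -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, wait_for_http http://localhost:9100/metrics checks Node Exporter, and wait_for_http http://localhost:9090/-/ready checks Prometheus.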

3. Configure Prometheus #

yaml.yaml

# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
        labels:
          instance: 'server-01'
          environment: 'production'
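
A syntax error in prometheus.yml prevents Prometheus from starting, so it is worth validating the file before restarting the service. This sketch assumes promtool (shipped in the Prometheus tarball) is on the PATH and that the service is managed by systemd as above. The function name apply_prometheus_config is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: validate the config, and restart the service only if it parses.

apply_prometheus_config() {
  # $1 = config path (defaults to the path used in this guide)
  local config="${1:-/etc/prometheus/prometheus.yml}"
  # "promtool check config" parses the file and any rule files it references.
  if ! promtool check config "$config"; then
    echo "config check failed; service left untouched" >&2
    return 1
  fi
  sudo systemctl restart prometheus
}
```

The restart is skipped when the check fails, so a bad edit does not take down a running instance.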

4. Install Grafana #

bash.bash

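# Add the Grafana signing key and APT repository (apt-key is deprecated)
sudo apt-get install -y apt-transport-https software-properties-common wget gnupg
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list

# Install Grafana
sudo apt-get update
sudo apt-get install -y grafana

# Start Grafana (the web UI listens on port 3000 by default)
sudo systemctl daemon-reload
sudo systemctl enable grafana-server
sudo systemctl start grafana-server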

Key Metrics to Monitor #

System Metrics #

CPU Usage: node_cpu_seconds_total
Memory: node_memory_MemTotal_bytes, node_memory_MemAvailable_bytes
Disk I/O: node_disk_io_time_seconds_total
Network: node_network_receive_bytes_total, node_network_transmit_bytes_total
Load Average: node_load1, node_load5, node_load15
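
These metrics are served as plain text at the exporter's /metrics endpoint, so they can be inspected without Prometheus. As a rough illustration, the sketch below reads that text format on standard input and prints used memory as a percentage. The function name mem_used_pct is an assumption for this example.

```bash
#!/usr/bin/env bash
# Sketch only: parse Node Exporter's text output with awk.

mem_used_pct() {
  # Reads the Prometheus text exposition format on stdin.
  awk '
    $1 == "node_memory_MemTotal_bytes"     { total = $2 }
    $1 == "node_memory_MemAvailable_bytes" { avail = $2 }
    END {
      if (total <= 0) { exit 1 }   # metric missing: signal failure
      printf "%.1f\n", (total - avail) / total * 100
    }
  '
}
```

Typical use is curl -s http://localhost:9100/metrics | mem_used_pct. Comment lines starting with # HELP or # TYPE are ignored because their first field is "#".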

Application Metrics #

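Node Exporter does not emit the metrics below. They are common naming conventions and exist only if your application exposes them, for example through a Prometheus client library.

Request Rate: http_requests_total
Error Rate: http_requests_total{status=~"5.."}
Response Time: http_request_duration_seconds
Active Connections: http_connections_active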

PromQL Queries #

CPU Usage #

code

100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
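
The query works because node_cpu_seconds_total{mode="idle"} is a counter of seconds spent idle, so its per-second rate is the fraction of time a core was idle. The sketch below reproduces that arithmetic for a single core from two counter samples. It is a simplification: the real rate() function also handles counter resets and extrapolates to the window edges. The name cpu_busy_pct is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: the arithmetic behind the CPU usage query, for one core.

cpu_busy_pct() {
  # $1 = idle counter at window start (seconds), $2 = idle counter at window end,
  # $3 = window length in seconds
  awk -v a="$1" -v b="$2" -v w="$3" 'BEGIN {
    idle_rate = (b - a) / w          # fraction of the window spent idle
    printf "%.1f\n", 100 - idle_rate * 100
  }'
}

cpu_busy_pct 1000 1240 300   # 240 idle seconds in a 300 s window: prints 20.0
```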

Memory Usage #

code

(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100

Disk Usage #

code

100 - ((node_filesystem_avail_bytes{mountpoint="/"} * 100) / node_filesystem_size_bytes{mountpoint="/"})
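
Any of these queries can also be run from the command line through the Prometheus HTTP API at /api/v1/query, which is handy for scripts and quick checks. This is a minimal sketch. The function name promql and the PROM_URL variable are assumptions for this example.

```bash
#!/usr/bin/env bash
# Sketch only: run an instant PromQL query through the HTTP API.

promql() {
  # $1 = PromQL expression; PROM_URL defaults to the address used in this guide.
  local base="${PROM_URL:-http://localhost:9090}"
  # -G sends the data as a GET query string; --data-urlencode escapes braces and quotes.
  curl -fsS -G "$base/api/v1/query" --data-urlencode "query=$1"
}
```

For example, promql 'node_load1' returns a JSON document with the current one-minute load average for each scraped instance.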

Grafana Dashboard Setup #

1. Add Prometheus Data Source #

Go to Configuration → Data Sources
Click "Add data source"
Select Prometheus
Enter URL: http://localhost:9090
Click "Save & Test"
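
The same data source can be defined in a file instead of through the UI, using Grafana's provisioning mechanism, which makes the setup repeatable. The sketch below writes such a file. The function name write_prometheus_datasource and the file name prometheus.yaml are choices made for this example, and the default directory assumes a package install of Grafana.

```bash
#!/usr/bin/env bash
# Sketch only: write a Grafana data source provisioning file.

write_prometheus_datasource() {
  # $1 = provisioning directory, $2 = Prometheus URL
  local dir="${1:-/etc/grafana/provisioning/datasources}" url="${2:-http://localhost:9090}"
  mkdir -p "$dir" || return 1
  cat > "$dir/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $url
    isDefault: true
EOF
}
```

Writing to the default directory needs root. Grafana reads provisioning files at startup, so restart grafana-server afterwards.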

2. Import Dashboard #

Go to Dashboards → Import
Enter dashboard ID: 1860 (Node Exporter Full)
Select Prometheus data source
Click "Import"

Alerting Rules #

yaml.yaml

# /etc/prometheus/alerts.yml
groups:
  - name: system_alerts
    interval: 30s
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 80% for 5 minutes"

      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is above 90%"

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Free disk space on / is below 10%"
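
Prometheus only evaluates these rules if prometheus.yml lists the file under rule_files, which the configuration in step 3 does not do. The sketch below appends that section when it is missing. The function name enable_alert_rules is hypothetical, and the default paths are the ones used in this guide.

```bash
#!/usr/bin/env bash
# Sketch only: reference the alert rules file from the main config.

enable_alert_rules() {
  # $1 = Prometheus config, $2 = rule file path
  local config="${1:-/etc/prometheus/prometheus.yml}" rules="${2:-/etc/prometheus/alerts.yml}"
  # If a rule_files section already exists, leave it alone; edit it by hand instead.
  if grep -q '^rule_files:' "$config"; then
    return 0
  fi
  printf '\nrule_files:\n  - %s\n' "$rules" >> "$config"
}
```

Before restarting Prometheus, promtool check rules /etc/prometheus/alerts.yml validates the rule syntax and expressions.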

Docker Compose Setup #

yaml.yaml

version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      # Default password for local testing only; change it before exposing Grafana
      - GF_SECURITY_ADMIN_PASSWORD=admin
    depends_on:
      - prometheus

  node-exporter:
    image: prom/node-exporter:latest
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      # Point the filesystem collector at the host root mounted above
      - '--path.rootfs=/rootfs'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'

volumes:
  prometheus-data:
  grafana-data:
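
Inside the Compose network, localhost refers to the Prometheus container itself, so the scrape target for Node Exporter must use the service name. The sketch below writes a prometheus.yml suited to this Compose file. The function name write_compose_prometheus_config is an assumption for this example.

```bash
#!/usr/bin/env bash
# Sketch only: write the prometheus.yml that the Compose file mounts.

write_compose_prometheus_config() {
  # $1 = output path (default ./prometheus.yml, next to the Compose file)
  local out="${1:-./prometheus.yml}"
  cat > "$out" <<'EOF'
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      # Compose service name, resolved by Docker's internal DNS
      - targets: ['node-exporter:9100']
EOF
}
```

After writing the file, docker compose up -d starts the stack. For the same reason, the Grafana data source URL in this setup is http://prometheus:9090 rather than http://localhost:9090.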

Best Practices #

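Retention Policy: Keep data only as long as you need to query it; the --storage.tsdb.retention.time flag controls this and defaults to 15d
Labeling: Use the same label names (such as environment and instance) across jobs so dashboards and alerts can filter on them
Cardinality: Avoid labels with unbounded values such as user IDs or request IDs; every distinct label combination creates a new time series
Alerts: Alert on sustained, actionable conditions and use the for clause to ignore short spikes
Dashboards: Create focused dashboards per team or service
Backup: Regularly back up Grafana dashboards, for example by exporting their JSON into version control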
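
To choose a retention period, it helps to estimate the disk space it implies. The Prometheus storage documentation gives a rough formula: retention time in seconds, times ingested samples per second, times bytes per sample (about 1 to 2 bytes on average). The sketch below applies it. The function name tsdb_disk_bytes and the default of 2 bytes per sample are assumptions for this example.

```bash
#!/usr/bin/env bash
# Sketch only: rough TSDB disk estimate from retention and ingestion rate.

tsdb_disk_bytes() {
  # $1 = retention in days, $2 = ingested samples per second,
  # $3 = bytes per sample (default 2, the conservative end of the 1 to 2 byte range)
  awk -v d="$1" -v s="$2" -v b="${3:-2}" 'BEGIN { printf "%.0f\n", d * 86400 * s * b }'
}

tsdb_disk_bytes 15 1000   # 15 days at 1000 samples/s: prints 2592000000 (about 2.6 GB)
```

The ingestion rate of a running server can be read with the query rate(prometheus_tsdb_head_samples_appended_total[5m]).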

Conclusion #

Prometheus and Grafana provide a powerful, open-source solution for Linux system monitoring. With proper configuration, you can gain deep insights into your infrastructure and applications, enabling proactive problem resolution.

Production Notes 1 #

For Linux System Monitoring with Prometheus and Grafana, define pre-deploy checks, rollout gates, and rollback triggers before release. Track p95 latency, error rate, and cost per request for at least 24 hours after deployment. If the trend regresses from baseline, revert quickly and document the decision in the runbook.

Keep the operating model simple under pressure: one owner per change, one decision channel, and clear stop conditions. Review alert quality regularly to remove noise and ensure on-call engineers can distinguish urgent failures from routine variance.

Repeatability is the goal. Convert successful interventions into standard operating procedures and version them in the repository so future responders can execute the same flow without ambiguity.

About Kiril Urbonas