Server Monitoring with Grafana and Prometheus: Metrics, Dashboards, and Alerting

Published on Nov 28, 2025 | approx. 5 min read

devops monitoring server

A server without monitoring is a blind spot. You only learn about problems when users complain, or when the server has already crashed. Prometheus collects the metrics; Grafana visualises them and sends alerts. Together they form the de facto standard monitoring stack for servers and Kubernetes.

Stack Architecture

Server              Prometheus          Grafana
[Node Exporter] ──→ [Scraping]  ──→  [Dashboards]
[nginx Exporter]    [Storage]         [Alerting]
[PHP-FPM Exporter]  [Query (PromQL)]  [Notifications]
[Blackbox Exporter]                   └── Email/Slack

Prometheus scrapes metrics from exporters at regular intervals (15 seconds in the configuration below; the built-in default is 1 minute) and stores them in its time-series database. Grafana connects to Prometheus as a data source and renders dashboards.
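
To make the scraping step more concrete: an exporter simply serves plain text over HTTP, one sample per line. The following is a minimal sketch of how such a response could be parsed, not how Prometheus itself does it. The function name parse_metrics and the EXAMPLE payload are made up for illustration, and the parser ignores timestamps, escaping rules, and other details of the real exposition format.

```python
import re

# Matches e.g.: node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # this simplified parser skips lines it does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical excerpt of what Node Exporter might return on /metrics
EXAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6
"""

samples = parse_metrics(EXAMPLE)
for name, labels, value in samples:
    print(name, labels, value)
```

Each scrape appends one timestamped value per series to the time-series database, which is what the PromQL queries later in this article operate on.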

Setup with Docker Compose

# docker-compose.yml
version: '3.8'

services:
    prometheus:
        image: prom/prometheus:v2.50.0
        container_name: prometheus
        volumes:
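            - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
            # Alert rules referenced by rule_files in prometheus.yml
            - ./prometheus/rules:/etc/prometheus/rules
            - prometheus_data:/prometheus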
        command:
            - '--config.file=/etc/prometheus/prometheus.yml'
            - '--storage.tsdb.retention.time=30d'
        ports:
            - "9090:9090"
        restart: unless-stopped

    grafana:
        image: grafana/grafana-oss:10.3.0
        container_name: grafana
        environment:
            GF_SECURITY_ADMIN_USER: admin
            GF_SECURITY_ADMIN_PASSWORD: "${GRAFANA_PASSWORD}"
            GF_SERVER_ROOT_URL: https://monitoring.example.com
        volumes:
            - grafana_data:/var/lib/grafana
            - ./grafana/provisioning:/etc/grafana/provisioning
        ports:
            - "3000:3000"
        depends_on:
            - prometheus
        restart: unless-stopped

    node-exporter:
        image: prom/node-exporter:v1.7.0
        container_name: node-exporter
        pid: host
        volumes:
            - /proc:/host/proc:ro
            - /sys:/host/sys:ro
            - /:/rootfs:ro
        command:
            - '--path.procfs=/host/proc'
            - '--path.sysfs=/host/sys'
            - '--path.rootfs=/rootfs'
        ports:
            - "9100:9100"
        restart: unless-stopped

volumes:
    prometheus_data:
    grafana_data:

Prometheus Configuration

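# prometheus/prometheus.yml
# Note: alertmanager, nginx-exporter, php-fpm-exporter and blackbox-exporter are
# not defined in the Compose file above; run them as additional services on the
# same Docker network so that these hostnames resolve.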
global:
    scrape_interval: 15s
    evaluation_interval: 15s

alerting:
    alertmanagers:
        - static_configs:
              - targets: ['alertmanager:9093']

rule_files:
    - /etc/prometheus/rules/*.yml

scrape_configs:
    - job_name: 'prometheus'
      static_configs:
          - targets: ['localhost:9090']

    - job_name: 'node'
      static_configs:
          - targets: ['node-exporter:9100']
      scrape_interval: 30s

    - job_name: 'nginx'
      static_configs:
          - targets: ['nginx-exporter:9113']

    - job_name: 'php-fpm'
      static_configs:
          - targets: ['php-fpm-exporter:9253']

    # Blackbox for HTTP monitoring
    - job_name: 'blackbox-http'
      metrics_path: /probe
      params:
          module: [http_2xx]
      static_configs:
          - targets:
                - https://www.wunner-software.de
                - https://firma-neu.ddev.site
      relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: blackbox-exporter:9115
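
The relabel_configs of the blackbox job are the least obvious part of this file. As a rough sketch, the three steps can be replayed in Python on a plain dictionary of labels. The function names relabel_blackbox_target and scrape_url are hypothetical, and the resulting URL only approximates what Prometheus requests (the real parameter order may differ).

```python
from urllib.parse import urlencode


def relabel_blackbox_target(labels, exporter_address="blackbox-exporter:9115"):
    """Replay the three relabel steps of the blackbox-http job on one target."""
    labels = dict(labels)
    # Step 1: the configured target URL becomes the ?target= query parameter
    labels["__param_target"] = labels["__address__"]
    # Step 2: the probed URL is kept as the instance label on all metrics
    labels["instance"] = labels["__param_target"]
    # Step 3: the scrape itself is sent to the Blackbox Exporter instead
    labels["__address__"] = exporter_address
    return labels


def scrape_url(labels, metrics_path="/probe", module="http_2xx"):
    """Approximate the URL Prometheus ends up requesting for this target."""
    query = urlencode({"module": module, "target": labels["__param_target"]})
    return "http://" + labels["__address__"] + metrics_path + "?" + query


relabeled = relabel_blackbox_target({"__address__": "https://www.wunner-software.de"})
url = scrape_url(relabeled)
print(relabeled["instance"])
print(url)
```

In other words, Prometheus never scrapes the website directly: it asks the Blackbox Exporter to probe it, while the instance label still shows the probed URL in dashboards and alerts.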

Important Metrics and PromQL

PromQL (Prometheus Query Language) is the query language for metrics.
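
Besides the web UI and Grafana, PromQL expressions can also be evaluated through the Prometheus HTTP API at /api/v1/query. The sketch below shows one way to do this with the Python standard library. The helper names and the SAMPLE_RESPONSE payload are illustrative assumptions, and the base URL assumes Prometheus is reachable on localhost:9090 as in the Compose file.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build the URL for an instant query against /api/v1/query."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def parse_instant_vector(payload):
    """Map each series' sorted label tuple to its float value."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %s" % payload.get("error"))
    results = {}
    for series in payload["data"]["result"]:
        _timestamp, value = series["value"]  # the value arrives as a string
        labels = tuple(sorted(series["metric"].items()))
        results[labels] = float(value)
    return results


def query(base_url, promql, timeout=5):
    """Run an instant query over HTTP (needs a reachable Prometheus)."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_instant_vector(json.load(response))


# Hypothetical response for the query "node_load1"
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "node_load1", "instance": "node-exporter:9100", "job": "node"},
                "value": [1700000000.0, "0.42"],
            }
        ],
    },
}

values = parse_instant_vector(SAMPLE_RESPONSE)
print(build_query_url("http://localhost:9090", "node_load1"))
print(values)
```

Against a running stack, query("http://localhost:9090", "node_load1") would return the same structure with live values.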

CPU Utilisation

# CPU usage in percent (excluding idle)
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Load Average (1 minute)
node_load1
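
The CPU query relies on the fact that node_cpu_seconds_total is a counter: its per-second increase in mode "idle" is the fraction of time the CPU was idle. The following simplified sketch shows the arithmetic for a single core and contrasts irate() with rate(). It deliberately ignores counter resets and the extrapolation that Prometheus applies, and the sample values are invented.

```python
def rate(samples):
    """Per-second increase between the first and last sample in the window."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def irate(samples):
    """Per-second increase between the last two samples only."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)


def cpu_usage_percent(idle_rate):
    """Same arithmetic as the PromQL expression: 100 - idle fraction * 100."""
    return 100 - idle_rate * 100


# Invented (timestamp, idle seconds) samples, scraped every 15 seconds.
# The core idles 90% of the time until a burst in the last interval.
idle_samples = [(0, 1000.0), (15, 1013.5), (30, 1027.0), (45, 1040.5), (60, 1045.0)]

usage_rate = cpu_usage_percent(rate(idle_samples))    # averaged over the window
usage_irate = cpu_usage_percent(irate(idle_samples))  # last interval only
print(round(usage_rate, 1), round(usage_irate, 1))
```

irate() reacts to the burst in the final interval, while rate() averages over the whole window. That makes irate() a good fit for graphs of fast-moving counters and rate() the steadier choice for alert expressions.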

Memory

# Available memory in percent
(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100

# Swap usage
(1 - (node_memory_SwapFree_bytes / node_memory_SwapTotal_bytes)) * 100

Disk Space

# Free disk space in percent
(node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100

# Disk I/O utilisation (fraction of time the device was busy, 1.0 = 100%)
rate(node_disk_io_time_seconds_total[5m])
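
For disk space, the trend is often more useful than the current value. PromQL offers predict_linear() for this, for example predict_linear(node_filesystem_avail_bytes{mountpoint="/"}[6h], 24 * 3600) < 0 to ask whether the disk would be full within a day. The sketch below illustrates the underlying idea with a plain least-squares fit. It is a simplification: the function predicts relative to the last sample rather than the evaluation time, and the sample data is invented.

```python
def predict_linear(samples, seconds_ahead):
    """Fit a least-squares line through (timestamp, value) samples and
    evaluate it seconds_ahead after the last sample."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = covariance / variance  # change in value per second
    intercept = mean_v - slope * mean_t
    return intercept + slope * (samples[-1][0] + seconds_ahead)


# Invented samples: free bytes on / shrinking by 1 GB per hour
free_bytes = [(0, 50e9), (3600, 49e9), (7200, 48e9), (10800, 47e9)]

in_24h = predict_linear(free_bytes, 24 * 3600)
in_48h = predict_linear(free_bytes, 48 * 3600)
print(in_24h / 1e9, in_48h / 1e9)
```

A negative prediction means the filesystem is expected to run out of space within the chosen horizon if the current trend continues.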

nginx Metrics

# Active connections
nginx_connections_active

# Requests per second
rate(nginx_http_requests_total[5m])

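# 4xx and 5xx error rate in percent
# (assumes the exporter exposes a status label; the stub_status-based
# nginx exporter does not, so this needs a log- or VTS-based exporter)
sum by (instance) (rate(nginx_http_requests_total{status=~"[45].."}[5m]))
    / sum by (instance) (rate(nginx_http_requests_total[5m])) * 100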

HTTP Availability (Blackbox Exporter)

# 1 = Up, 0 = Down
probe_success

# SSL certificate expires in X days
(probe_ssl_earliest_cert_expiry - time()) / 86400
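
The certificate query is plain arithmetic on Unix timestamps: probe_ssl_earliest_cert_expiry is the expiry time in seconds since the epoch, time() is the current time, and 86400 is the number of seconds per day. A small worked example, with made-up timestamps and a hypothetical helper name:

```python
import time

SECONDS_PER_DAY = 86400


def days_until_expiry(expiry_epoch, now_epoch=None):
    """Days between now and the certificate expiry (negative once expired)."""
    if now_epoch is None:
        now_epoch = time.time()
    return (expiry_epoch - now_epoch) / SECONDS_PER_DAY


# Made-up timestamps: the certificate expires 1,209,600 seconds after "now"
now = 1_700_000_000
expiry = now + 1_209_600

remaining = days_until_expiry(expiry, now)
print(remaining)
```

With these numbers the result is exactly 14 days, which is the threshold used by the SSLCertificateExpiringSoon alert further down.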

Alerting Rules

# prometheus/rules/alerts.yml
groups:
    - name: server
      rules:
          - alert: HighCPUUsage
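            # rate() instead of irate(): irate() only looks at the last two
            # samples, so a single quiet scrape would reset the "for" timer
            expr: >
                100 - (avg by (instance)
                (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85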
            for: 5m
            labels:
                severity: warning
            annotations:
                summary: "High CPU usage on {{ $labels.instance }}"
                description: "CPU usage {{ $value | printf \"%.1f\" }}% > 85% for 5 minutes"

          - alert: LowDiskSpace
            expr: >
                (node_filesystem_avail_bytes{mountpoint="/"}
                / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
            for: 1m
            labels:
                severity: critical
            annotations:
                summary: "Low disk space on {{ $labels.instance }}"
                description: "Only {{ $value | printf \"%.1f\" }}% free space remaining on /"

          - alert: HighMemoryUsage
            expr: >
                (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
            for: 5m
            labels:
                severity: critical
            annotations:
                summary: "Very high memory usage on {{ $labels.instance }}"

          - alert: SiteDown
            expr: probe_success == 0
            for: 1m
            labels:
                severity: critical
            annotations:
                summary: "Website unreachable: {{ $labels.instance }}"

          - alert: SSLCertificateExpiringSoon
            expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 14
            for: 1h
            labels:
                severity: warning
            annotations:
                summary: "SSL certificate expiring soon: {{ $labels.instance }}"
                description: "{{ $value | printf \"%.0f\" }} days until expiry"
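
The "for" field in these rules means the expression must hold at every evaluation for that long before the alert fires; until then the alert is only pending. The following sketch models that behaviour as a small state machine. It is a simplified illustration with invented inputs, not Prometheus code, and the function name alert_states is hypothetical.

```python
def alert_states(condition_by_eval, for_seconds, eval_interval=15):
    """Return the alert state after each rule evaluation.

    condition_by_eval holds one boolean per evaluation: whether the alert
    expression returned a result at that point in time.
    """
    states = []
    active_since = None  # evaluation time at which the condition first held
    for index, condition in enumerate(condition_by_eval):
        now = index * eval_interval
        if not condition:
            active_since = None  # any gap resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        held_for = now - active_since
        states.append("firing" if held_for >= for_seconds else "pending")
    return states


# Invented example: "for: 1m" with the 15s evaluation_interval from above.
# The condition drops out once, which restarts the one-minute wait.
observed = [True, True, False, True, True, True, True, True]
states = alert_states(observed, for_seconds=60)
print(states)
```

This is why short spikes do not trigger notifications: only a condition that persists for the full duration moves from pending to firing.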

Grafana Dashboard Provisioning

# grafana/provisioning/dashboards/dashboards.yml
apiVersion: 1

providers:
    - name: 'default'
      orgId: 1
      folder: ''
      type: file
      options:
          path: /etc/grafana/provisioning/dashboards

For server monitoring, the ready-made "Node Exporter Full" dashboard (ID 1860) from grafana.com is recommended. To import it, open Dashboards, choose Import, and enter the ID 1860.

Notifications: Email and Slack

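In Grafana, contact points are configured under Alerting, then Contact Points. They deliver notifications for Grafana-managed alert rules; alerts raised by the Prometheus rule files above are routed through Alertmanager instead. An email contact point: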

{
    "name": "email-alert",
    "type": "email",
    "settings": {
        "addresses": "thomas@wunner-software.de",
        "subject": "[{{ .Status | toUpper }}] {{ .CommonAnnotations.summary }}"
    }
}

For Slack:

{
    "name": "slack-alert",
    "type": "slack",
    "settings": {
        "url": "https://hooks.slack.com/services/xxx/yyy/zzz",
        "channel": "#monitoring",
        "title": "{{ .CommonAnnotations.summary }}",
        "text": "{{ .CommonAnnotations.description }}"
    }
}

Conclusion

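Prometheus and Grafana are the de facto standard for open-source monitoring for good reason: they are flexible and scalable, and they come with a huge ecosystem of exporters. For a typical web server (nginx, PHP-FPM, MariaDB), Node Exporter, nginx Exporter, and Blackbox Exporter cover host, web server, and availability metrics. PHP-FPM and MariaDB need their own exporters (such as the PHP-FPM exporter scraped above, or mysqld_exporter) for deeper insight.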

The most important metrics to keep an eye on:

CPU and memory utilisation
Disk space (with trend)
HTTP availability and response times
SSL certificate expiry date
Error rate in nginx logs

Thomas Wunner

Certified IT specialist for application development with an instructor qualification and over 14 years of experience building scalable web applications with Symfony and Shopware. When not coding, Thomas volunteers as a lifeguard with the Wasserwacht, performs as a DJ, and explores the countryside on his motorbike.
