
Server Monitoring with Grafana and Prometheus: Metrics, Dashboards, and Alerting

Published on Nov 28, 2025 | approx. 5 min read

A server without monitoring is a blind spot: you only learn about problems when users complain, or when the server has already crashed. Prometheus collects metrics and evaluates alert rules; Grafana visualises the data and can send alerts of its own. Together they form the de facto standard monitoring stack for servers and Kubernetes.

Stack Architecture

Server              Prometheus          Grafana
[Node Exporter] ──→ [Scraping]  ──→  [Dashboards]
[nginx Exporter]    [Storage]         [Alerting]
[PHP-FPM Exporter]  [Query (PromQL)]  [Notifications]
[Blackbox Exporter]                   └── Email/Slack

Prometheus scrapes metrics from the exporters at a fixed interval (15 seconds in this setup; Prometheus's built-in default is 1 minute) and stores them in its local time-series database. Grafana connects to Prometheus as a data source and renders the dashboards.
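Each exporter serves its current values as plain text over HTTP (for Node Exporter, http://node-exporter:9100/metrics in this setup), one sample per line. A simplified Python sketch of how such a line splits into metric name, labels and value; parse_exposition is a hypothetical helper, and for real use the prometheus_client package ships a complete parser (prometheus_client.parser.text_string_to_metric_families).

```python
import re

# Simplified: covers "name{labels} value [timestamp]" lines, not every edge case
# of the format (e.g. a "}" inside a label value is not handled).
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Return (metric_name, labels, value) tuples for each sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


scrape = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6
"""
for sample in parse_exposition(scrape):
    print(sample)
```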

Setup with Docker Compose

# docker-compose.yml
version: '3.8'

services:
    prometheus:
        image: prom/prometheus:v2.50.0
        container_name: prometheus
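        volumes:
            - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
            # Mount the alert rules loaded via rule_files in prometheus.yml
            - ./prometheus/rules:/etc/prometheus/rules:ro
            - prometheus_data:/prometheus
        command:
            - '--config.file=/etc/prometheus/prometheus.yml'
            - '--storage.tsdb.path=/prometheus'
            - '--storage.tsdb.retention.time=30d'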
        ports:
            - "9090:9090"
        restart: unless-stopped

    grafana:
        image: grafana/grafana-oss:10.3.0
        container_name: grafana
        environment:
            GF_SECURITY_ADMIN_USER: admin
            GF_SECURITY_ADMIN_PASSWORD: "${GRAFANA_PASSWORD}"
            GF_SERVER_ROOT_URL: https://monitoring.example.com
        volumes:
            - grafana_data:/var/lib/grafana
            - ./grafana/provisioning:/etc/grafana/provisioning
        ports:
            - "3000:3000"
        depends_on:
            - prometheus
        restart: unless-stopped

    node-exporter:
        image: prom/node-exporter:v1.7.0
        container_name: node-exporter
        pid: host
        volumes:
            - /proc:/host/proc:ro
            - /sys:/host/sys:ro
            - /:/rootfs:ro
        command:
            - '--path.procfs=/host/proc'
            - '--path.sysfs=/host/sys'
            - '--path.rootfs=/rootfs'
        ports:
            - "9100:9100"
        restart: unless-stopped

volumes:
    prometheus_data:
    grafana_data:

Prometheus Configuration

# prometheus/prometheus.yml
global:
    scrape_interval: 15s
    evaluation_interval: 15s

alerting:
    alertmanagers:
        # Alertmanager is not part of docker-compose.yml above; add it as a service
        - static_configs:
              - targets: ['alertmanager:9093']

rule_files:
    - /etc/prometheus/rules/*.yml

scrape_configs:
    - job_name: 'prometheus'
      static_configs:
          - targets: ['localhost:9090']

    - job_name: 'node'
      static_configs:
          - targets: ['node-exporter:9100']
      scrape_interval: 30s

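    # nginx-exporter, php-fpm-exporter and blackbox-exporter are not defined in
    # docker-compose.yml above; add them as services on the same network
    - job_name: 'nginx'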
      static_configs:
          - targets: ['nginx-exporter:9113']

    - job_name: 'php-fpm'
      static_configs:
          - targets: ['php-fpm-exporter:9253']

    # Blackbox for HTTP monitoring
    - job_name: 'blackbox-http'
      metrics_path: /probe
      params:
          module: [http_2xx]
      static_configs:
          - targets:
                - https://www.wunner-software.de
                - https://firma-neu.ddev.site
      relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: blackbox-exporter:9115
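The three relabel rules are easy to misread. Step by step, they turn a website URL into a scrape of the Blackbox Exporter; the Python sketch below replays them on a plain dict (relabel_blackbox_target and probe_url are hypothetical names, not Prometheus code):

```python
from urllib.parse import urlencode


def relabel_blackbox_target(target, exporter="blackbox-exporter:9115"):
    """Apply the three relabel rules of the blackbox-http job to one target."""
    labels = {"__address__": target}
    # 1. __address__ -> __param_target (becomes the ?target= URL parameter)
    labels["__param_target"] = labels["__address__"]
    # 2. __param_target -> instance (readable label in Grafana and alerts)
    labels["instance"] = labels["__param_target"]
    # 3. __address__ replaced: Prometheus scrapes the exporter, not the website
    labels["__address__"] = exporter
    return labels


def probe_url(labels, metrics_path="/probe", module="http_2xx"):
    """URL Prometheus would request for this target (parameter order may differ)."""
    query = urlencode({"module": module, "target": labels["__param_target"]})
    return f"http://{labels['__address__']}{metrics_path}?{query}"


labels = relabel_blackbox_target("https://www.wunner-software.de")
print(labels["instance"])  # https://www.wunner-software.de
print(probe_url(labels))
```

The instance label keeps the website URL, so alerts such as "Website unreachable: {{ $labels.instance }}" name the site rather than the exporter.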

Important Metrics and PromQL

PromQL (Prometheus Query Language) selects, filters and aggregates these time series. The same expressions are used in Grafana panels and in alert rules.

CPU Utilisation

# CPU usage in percent (excluding idle)
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Load Average (1 minute)
node_load1
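The CPU formula works on the per-core counter node_cpu_seconds_total, which only ever grows. A sketch of the same arithmetic with two scrapes, assuming hypothetical counter values for a two-core machine and ignoring counter resets:

```python
def cpu_usage_percent(idle_before, idle_after, interval_seconds):
    """CPU usage of one instance from two scrapes of the idle counters.

    idle_before / idle_after map each CPU core to its
    node_cpu_seconds_total{mode="idle"} value; interval_seconds is the time
    between the two scrapes (two samples, like irate()).
    """
    idle_fractions = [
        (idle_after[cpu] - idle_before[cpu]) / interval_seconds
        for cpu in idle_before
    ]
    # avg by (instance): mean idle fraction across all cores
    avg_idle = sum(idle_fractions) / len(idle_fractions)
    return 100 - avg_idle * 100


# Two cores, scraped 10 s apart: core 0 idle for 9 s, core 1 idle for 5 s
before = {"0": 1000.0, "1": 2000.0}
after = {"0": 1009.0, "1": 2005.0}
print(round(cpu_usage_percent(before, after, 10), 1))  # 30.0
```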

Memory

# Available memory in percent
(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100

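# Swap usage in percent (the filter skips hosts without swap, where 0/0 gives NaN)
(1 - (node_memory_SwapFree_bytes / node_memory_SwapTotal_bytes)) * 100
    and node_memory_SwapTotal_bytes > 0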

Disk Space

# Free disk space in percent
(node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100

# Disk I/O utilisation per device (fraction of time busy, 0 to 1)
rate(node_disk_io_time_seconds_total[5m])
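rate() turns an ever-growing counter into a per-second value and treats a drop as a counter reset. For node_disk_io_time_seconds_total, the result is the share of time the disk was busy. A simplified sketch with made-up samples; Prometheus additionally extrapolates to the edges of the [5m] window, which this version skips:

```python
def counter_increase(samples):
    """Total increase of a counter, treating any drop as a reset to zero."""
    total = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        # A lower value means the counter restarted (e.g. after a reboot),
        # so everything counted since the reset is the increase.
        total += current - previous if current >= previous else current
    return total


def simple_rate(samples):
    """Average per-second increase between the first and last sample.

    samples: (unix_timestamp, counter_value) pairs, oldest first.
    Unlike Prometheus's rate(), this does not extrapolate to the window edges.
    """
    duration = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / duration


# node_disk_io_time_seconds_total scraped every 15 s, with a reset after 30 s
io_time = [(0, 100.0), (15, 103.0), (30, 106.0), (45, 1.5), (60, 4.5)]
print(simple_rate(io_time))  # 0.175, i.e. the device was busy 17.5 % of the time
```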

nginx Metrics

# Active connections
nginx_connections_active

# Requests per second
rate(nginx_http_requests_total[5m])

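# 4xx and 5xx error rate in percent
# Needs a status label per request; the stub_status-based nginx exporter on port 9113 does not provide one
sum by (instance) (rate(nginx_http_requests_total{status=~"[45].."}[5m]))
    / sum by (instance) (rate(nginx_http_requests_total[5m])) * 100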
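An error ratio needs both sides aggregated, for example with sum by (instance). Without it, one-to-one vector matching pairs every 4xx/5xx series with the total series carrying the same status label, so each ratio comes out as 100 %. A small Python illustration with hypothetical per-status request rates:

```python
def error_percent(rates_by_status):
    """sum(4xx/5xx rates) / sum(all rates) * 100 for one instance.

    rates_by_status maps a status code string such as "404" to requests/s.
    """
    errors = sum(r for status, r in rates_by_status.items() if status[0] in "45")
    total = sum(rates_by_status.values())
    return errors / total * 100 if total else float("nan")  # PromQL: 0/0 = NaN


def per_series_percent(rates_by_status):
    """What one-to-one matching without sum() computes: each error series
    divided by the total series with the same status label, i.e. by itself."""
    return {
        status: r / rates_by_status[status] * 100
        for status, r in rates_by_status.items()
        if status[0] in "45"
    }


rates = {"200": 95.0, "404": 4.0, "502": 1.0}
print(error_percent(rates))       # 5.0
print(per_series_percent(rates))  # {'404': 100.0, '502': 100.0}
```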

HTTP Availability (Blackbox Exporter)

# 1 = Up, 0 = Down
probe_success

# SSL certificate expires in X days
(probe_ssl_earliest_cert_expiry - time()) / 86400
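The expiry expression subtracts the current Unix time from the certificate's notAfter timestamp and converts seconds to days. The same calculation in Python, plus a hypothetical fetch_not_after helper for a manual spot check of a single host; in the monitoring stack itself the Blackbox Exporter does the fetching:

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Same arithmetic as (probe_ssl_earliest_cert_expiry - time()) / 86400.

    not_after uses the format returned by SSLSocket.getpeercert(),
    e.g. "Jan  1 00:00:00 2030 GMT".
    """
    expiry = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expiry - now) / 86400


def fetch_not_after(host, port=443, timeout=5.0):
    """Hypothetical helper: read notAfter from a live, verified TLS connection."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


# Offline example: certificate valid until 1 Jan 2030, checked on 18 Dec 2029
check_time = ssl.cert_time_to_seconds("Dec 18 00:00:00 2029 GMT")
print(days_until_expiry("Jan  1 00:00:00 2030 GMT", now=check_time))  # 14.0
```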

Alerting Rules

# prometheus/rules/alerts.yml
groups:
    - name: server
      rules:
          - alert: HighCPUUsage
            # rate() instead of irate(): averages over the 5m window, better suited for alerting
            expr: >
                100 - (avg by (instance)
                (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
            for: 5m
            labels:
                severity: warning
            annotations:
                summary: "High CPU usage on {{ $labels.instance }}"
                description: "CPU usage {{ $value | printf \"%.1f\" }}% > 85% for 5 minutes"

          - alert: LowDiskSpace
            expr: >
                (node_filesystem_avail_bytes{mountpoint="/"}
                / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
            for: 1m
            labels:
                severity: critical
            annotations:
                summary: "Low disk space on {{ $labels.instance }}"
                description: "Only {{ $value | printf \"%.1f\" }}% free space remaining on /"

          - alert: HighMemoryUsage
            expr: >
                (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
            for: 5m
            labels:
                severity: critical
            annotations:
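                summary: "Very high memory usage on {{ $labels.instance }}"
                description: "Memory usage {{ $value | printf \"%.1f\" }}% > 90% for 5 minutes"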

          - alert: SiteDown
            expr: probe_success == 0
            for: 1m
            labels:
                severity: critical
            annotations:
                summary: "Website unreachable: {{ $labels.instance }}"

          - alert: SSLCertificateExpiringSoon
            expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 14
            for: 1h
            labels:
                severity: warning
            annotations:
                summary: "SSL certificate expiring soon: {{ $labels.instance }}"
                description: "{{ $value | printf \"%.0f\" }} days until expiry"
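The for: field means a rule whose expression returns a result first becomes pending and only fires once the condition has held for the whole duration; a single evaluation without a result resets it. A simplified model of that state machine in Python (alert_states is a hypothetical name, and details such as notification timing are left out):

```python
def alert_states(condition_results, for_evaluations):
    """State of one alert after each rule evaluation.

    condition_results: True when the expr returned a result at that evaluation.
    for_evaluations: 'for' duration divided by evaluation_interval
                     (for: 5m at 15s -> 20 evaluations).
    """
    states = []
    active_since = None
    for step, is_true in enumerate(condition_results):
        if not is_true:
            active_since = None  # condition cleared: back to inactive
            states.append("inactive")
            continue
        if active_since is None:
            active_since = step  # first evaluation where the expr matched
        held_for = step - active_since
        states.append("firing" if held_for >= for_evaluations else "pending")
    return states


# CPU above 85 % for three evaluations, with for_evaluations=2
print(alert_states([False, True, True, True, False], for_evaluations=2))
# ['inactive', 'pending', 'pending', 'firing', 'inactive']
```

This is why a short CPU spike does not trigger HighCPUUsage, while LowDiskSpace with for: 1m reacts much faster.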

Grafana Dashboard Provisioning

# grafana/provisioning/dashboards/dashboards.yml
apiVersion: 1

providers:
    - name: 'default'
      orgId: 1
      folder: ''
      type: file
      options:
          path: /etc/grafana/provisioning/dashboards
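The dashboard provider above does not create a data source; Grafana reads data source definitions from the provisioning/datasources directory as well. A sketch that writes one for Prometheus, assuming the Compose service name prometheus and the file name prometheus.yml (both are choices, not requirements):

```python
from pathlib import Path

# Assumption: Grafana reaches Prometheus by its Compose service name.
DATASOURCE_YAML = """\
apiVersion: 1

datasources:
    - name: Prometheus
      type: prometheus
      access: proxy
      url: http://prometheus:9090
      isDefault: true
"""


def write_datasource_file(provisioning_dir="grafana/provisioning"):
    """Write datasources/prometheus.yml next to the dashboards provider."""
    path = Path(provisioning_dir) / "datasources" / "prometheus.yml"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(DATASOURCE_YAML)
    return path
```

Run write_datasource_file() once from the directory containing docker-compose.yml, then restart the grafana container so the file is picked up.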

For server monitoring, the ready-made "Node Exporter Full" dashboard (ID 1860) from grafana.com is a good starting point. Import it via Dashboards, then Import, and enter the ID 1860.

Notifications: Email and Slack

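Contact points are configured in Grafana under Alerting, then Contact points. They deliver notifications for Grafana-managed alert rules; the Prometheus rules above are sent to the Alertmanager from prometheus.yml instead, which has its own receiver configuration. An email contact point looks like this: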

{
    "name": "email-alert",
    "type": "email",
    "settings": {
        "addresses": "thomas@wunner-software.de",
        "subject": "[{{ .Status | toUpper }}] {{ .CommonAnnotations.summary }}"
    }
}

For Slack:

{
    "name": "slack-alert",
    "type": "slack",
    "settings": {
        "url": "https://hooks.slack.com/services/xxx/yyy/zzz",
        "channel": "#monitoring",
        "title": "{{ .CommonAnnotations.summary }}",
        "text": "{{ .CommonAnnotations.description }}"
    }
}
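Before wiring the webhook into Grafana, it can help to test it by hand. Slack incoming webhooks accept a JSON POST with a text field; a minimal Python sketch that builds such a request with the standard library (build_slack_request is a hypothetical helper, and the URL is the placeholder from above):

```python
import json
import urllib.request


def build_slack_request(webhook_url, summary, description):
    """POST request for a Slack incoming webhook (summary shown in bold)."""
    payload = {"text": f"*{summary}*\n{description}"}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_slack_request(
    "https://hooks.slack.com/services/xxx/yyy/zzz",  # placeholder, not a real hook
    "Website unreachable: https://www.wunner-software.de",
    "probe_success == 0 for 1 minute",
)
# urllib.request.urlopen(request, timeout=10) would actually send the message.
```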

Conclusion

Prometheus and Grafana are the industry standard for open-source monitoring for good reason: they are flexible and scalable, with a huge ecosystem of exporters. For a typical web server (nginx, PHP-FPM, MariaDB), Node Exporter, nginx Exporter, PHP-FPM Exporter and Blackbox Exporter cover the essentials; the database needs its own exporter, such as mysqld_exporter for MariaDB.

The most important metrics to keep an eye on:

  1. CPU and memory utilisation
  2. Disk space (with trend)
  3. HTTP availability and response times
  4. SSL certificate expiry date
  5. Error rate (4xx/5xx responses) reported by nginx
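For disk space with a trend, PromQL offers predict_linear(), which fits a straight line through the samples of a range and extrapolates it; predict_linear(node_filesystem_avail_bytes{mountpoint="/"}[6h], 4 * 3600) < 0 is a common way to flag a root filesystem that would be full within four hours. The underlying least-squares idea as a Python sketch with made-up hourly values (Prometheus extrapolates from the evaluation time; this version uses the newest sample):

```python
def predict_linear(samples, seconds_ahead):
    """Least-squares line through (timestamp, value) samples, extrapolated
    seconds_ahead past the newest sample (the idea behind predict_linear())."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = covariance / variance  # change per second
    newest_t = samples[-1][0]
    return mean_v + slope * (newest_t + seconds_ahead - mean_t)


# Free space on / in GB, sampled hourly: 100, 90, 80
free_gb = [(0, 100.0), (3600, 90.0), (7200, 80.0)]
print(round(predict_linear(free_gb, 4 * 3600), 1))  # 40.0
```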

Thomas Wunner

Certified IT specialist for application development with an instructor qualification and over 14 years of experience building scalable web applications with Symfony and Shopware. When not coding, Thomas volunteers as a lifeguard with the Wasserwacht, performs as a DJ, and explores the countryside on his motorbike.
