A server without monitoring is a blind spot. You only learn about problems when users complain — or when the server has already crashed. Prometheus collects metrics, Grafana visualises them and sends alerts. Together they form the de facto standard monitoring stack for servers and Kubernetes.
Stack Architecture
Server                  Prometheus               Grafana
[Node Exporter]    ──→  [Scraping]          ──→  [Dashboards]
[nginx Exporter]        [Storage]                [Alerting]
[PHP-FPM Exporter]      [Query (PromQL)]         [Notifications]
[Blackbox Exporter]                                └── Email/Slack
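Prometheus scrapes (pulls) metrics from exporters over HTTP at a fixed interval. The configuration below uses 15 seconds; Prometheus's built-in default is 1 minute. The samples are stored in Prometheus's local time-series database (TSDB). Grafana connects to Prometheus as a data source, queries it with PromQL and renders the results as dashboards.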
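To make "scraping" concrete: an exporter serves plain-text lines such as node_load1 0.42 over HTTP, and Prometheus parses them on every scrape. The Python sketch below parses a small, simplified subset of that text exposition format. It assumes no escaped characters, no commas inside label values and no timestamps; it only illustrates the format and is not how Prometheus itself parses it.

```python
# Minimal parser for a simplified subset of the Prometheus text exposition
# format. Assumptions: label values contain no commas, quotes or escapes,
# and sample lines carry no timestamps.

def parse_exposition(text):
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE
            continue
        series, value = line.rsplit(" ", 1)
        labels = {}
        if "{" in series:
            name, label_part = series.split("{", 1)
            for pair in label_part.rstrip("}").split(","):
                if pair:
                    key, val = pair.split("=", 1)
                    labels[key] = val.strip('"')
        else:
            name = series
        samples.append((name, labels, float(value)))
    return samples

example = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
"""

for sample in parse_exposition(example):
    print(sample)
```

Each tuple is one sample: metric name, label set and value. Prometheus attaches the scrape timestamp and the job and instance labels on top of this.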
Setup with Docker Compose
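# docker-compose.yml
# (The top-level "version" key is obsolete in Docker Compose v2 and is omitted.)
services:
  prometheus:
    image: prom/prometheus:v2.50.0
    container_name: prometheus
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      # Alerting rules, loaded via rule_files in prometheus.yml
      - ./prometheus/rules:/etc/prometheus/rules:ro
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
    ports:
      - "9090:9090"
    restart: unless-stopped

  grafana:
    image: grafana/grafana-oss:10.3.0
    container_name: grafana
    environment:
      GF_SECURITY_ADMIN_USER: admin
      # GRAFANA_PASSWORD is read from the shell environment or an .env file
      GF_SECURITY_ADMIN_PASSWORD: "${GRAFANA_PASSWORD}"
      GF_SERVER_ROOT_URL: https://monitoring.example.com
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    ports:
      - "3000:3000"
    depends_on:
      - prometheus
    restart: unless-stopped

  node-exporter:
    image: prom/node-exporter:v1.7.0
    container_name: node-exporter
    pid: host
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
    ports:
      - "9100:9100"
    restart: unless-stopped

  # prometheus.yml also expects alertmanager, nginx-exporter,
  # php-fpm-exporter and blackbox-exporter on this network. Add them as
  # services here or remove the matching entries; otherwise those scrape
  # targets are reported as down and alerts cannot be delivered.

volumes:
  prometheus_data:
  grafana_data: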
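The retention flag determines how large the prometheus_data volume grows. The Prometheus storage documentation gives the rough formula disk = retention in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample. A small Python sketch of that arithmetic; the series count of 2,000 is purely an assumption, and the real figure for your instance is reported by the metric prometheus_tsdb_head_series.

```python
# Rough disk estimate for --storage.tsdb.retention.time, following the
# formula from the Prometheus storage docs:
#   disk = retention_seconds * samples_per_second * bytes_per_sample
# The series count and bytes per sample are assumptions.

def estimate_tsdb_bytes(series, scrape_interval_s, retention_days, bytes_per_sample=2.0):
    samples_per_second = series / scrape_interval_s
    retention_seconds = retention_days * 24 * 3600
    return retention_seconds * samples_per_second * bytes_per_sample

# Hypothetical: 2,000 series scraped every 15 s, kept for 30 days
size = estimate_tsdb_bytes(series=2000, scrape_interval_s=15, retention_days=30)
print(f"{size / 1024**3:.2f} GiB")  # 0.64 GiB
```

Using the upper bound of 2 bytes per sample keeps the estimate on the safe side; the WAL and compaction need some extra headroom on top.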
Prometheus Configuration
# prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

rule_files:
  - /etc/prometheus/rules/*.yml

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']
    scrape_interval: 30s

  - job_name: 'nginx'
    static_configs:
      - targets: ['nginx-exporter:9113']

  - job_name: 'php-fpm'
    static_configs:
      - targets: ['php-fpm-exporter:9253']

  # Blackbox for HTTP monitoring
  - job_name: 'blackbox-http'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://www.wunner-software.de
          - https://firma-neu.ddev.site
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115
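The three relabel_configs rules are easiest to follow as a sequence of label assignments. This Python sketch models them in simplified form (plain copies and a fixed replacement, ignoring regex matching and the other relabel actions) and shows roughly which URL Prometheus ends up requesting from the Blackbox Exporter. The function names are hypothetical, and the order of the query parameters is an assumption.

```python
from urllib.parse import urlencode

# Simplified model of the relabel_configs in the blackbox-http job.

BLACKBOX = "blackbox-exporter:9115"

def relabel_blackbox(target):
    labels = {"__address__": target}                  # from static_configs
    labels["__param_target"] = labels["__address__"]  # rule 1: remember the URL
    labels["instance"] = labels["__param_target"]     # rule 2: readable instance label
    labels["__address__"] = BLACKBOX                  # rule 3: scrape the exporter instead
    return labels

def scrape_url(labels, metrics_path="/probe", module="http_2xx"):
    # module comes from the job's params:, target from __param_target
    query = urlencode({"module": module, "target": labels["__param_target"]})
    return f"http://{labels['__address__']}{metrics_path}?{query}"

labels = relabel_blackbox("https://www.wunner-software.de")
print(labels["instance"])
print(scrape_url(labels))
```

The result is that the time series keep the website as their instance label, while the HTTP request itself goes to the exporter, which performs the actual probe.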
Important Metrics and PromQL
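PromQL (Prometheus Query Language) selects, filters and aggregates time series. The expressions below can be run in the Prometheus web UI on port 9090 or used as queries in Grafana panels.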
CPU Utilisation
# CPU usage in percent (excluding idle)
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Load Average (1 minute)
node_load1
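The CPU expression works because node_cpu_seconds_total is a counter of seconds spent in each mode, so its per-second rate for mode="idle" is the idle fraction of one core. A simplified Python model of the calculation, assuming no counter resets and ignoring the window extrapolation that the real rate() performs; the function names and sample values are hypothetical.

```python
# Simplified rate()/irate() on one counter series of (timestamp, value).
# Assumes no counter resets; real PromQL rate() also extrapolates to the
# edges of the range window.

def simple_rate(samples):
    """Average per-second increase between the first and last sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

def simple_irate(samples):
    """Per-second increase between the last two samples only."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)

def cpu_usage_percent(idle_series_per_cpu, rate_fn=simple_irate):
    """100 - avg(idle rate) * 100, as in the PromQL expression."""
    idle_rates = [rate_fn(s) for s in idle_series_per_cpu]
    return 100 - sum(idle_rates) / len(idle_rates) * 100

# Hypothetical idle counters for two cores, sampled every 15 s
cpu0 = [(0, 1000.0), (15, 1012.0), (30, 1024.0)]  # idle 80 % of the time
cpu1 = [(0, 2000.0), (15, 2006.0), (30, 2012.0)]  # idle 40 % of the time
print(round(cpu_usage_percent([cpu0, cpu1]), 1))  # 40.0
```

Because irate() only looks at the last two samples, it follows short spikes closely, which suits dashboards; rate() averages over the whole window.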
Memory
# Available memory in percent
(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
# Swap usage in percent (NaN when no swap is configured)
(1 - (node_memory_SwapFree_bytes / node_memory_SwapTotal_bytes)) * 100
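The node_memory_*_bytes metrics come from /proc/meminfo: node_exporter's meminfo collector exposes each field as node_memory_<Field>_bytes and converts kB values to bytes. This Python sketch recomputes the two memory expressions from such input; the sample values are hypothetical, and it returns None where PromQL would yield NaN (0/0 on a host without swap).

```python
# Recompute the memory expressions from /proc/meminfo-style text.

def parse_meminfo(text):
    values = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        number = int(parts[0])
        # "kB" fields become bytes, matching the *_bytes metric names
        values[key] = number * 1024 if parts[-1] == "kB" else number
    return values

def memory_percentages(m):
    available_pct = m["MemAvailable"] / m["MemTotal"] * 100
    if m["SwapTotal"] == 0:
        swap_used_pct = None  # PromQL: 0/0 -> NaN
    else:
        swap_used_pct = (1 - m["SwapFree"] / m["SwapTotal"]) * 100
    return available_pct, swap_used_pct

sample = """MemTotal:        8000000 kB
MemAvailable:    2000000 kB
SwapTotal:       1000000 kB
SwapFree:         750000 kB"""

print(memory_percentages(parse_meminfo(sample)))  # (25.0, 25.0)
```

MemAvailable is the better basis than MemFree, since it already accounts for page cache the kernel can reclaim.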
Disk Space
# Free disk space in percent
(node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100
# Disk I/O utilisation: fraction of time each device was busy (0 to 1)
rate(node_disk_io_time_seconds_total[5m])
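Watching disk space "with trend" can be done with PromQL's predict_linear(), for example predict_linear(node_filesystem_avail_bytes{mountpoint="/"}[6h], 24 * 3600) < 0, which flags a filesystem predicted to run full within 24 hours. predict_linear() fits a simple linear regression; the Python sketch below shows that idea with hypothetical samples, taking the last sample time as "now" (Prometheus uses the evaluation timestamp).

```python
# Least-squares line through (timestamp, value) samples, extrapolated
# horizon_s seconds beyond `now`: the idea behind predict_linear().

def predict_linear(samples, horizon_s, now):
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var  # bytes per second, negative while the disk fills up
    intercept = mean_v - slope * mean_t
    return intercept + slope * (now + horizon_s)

GiB = 1024 ** 3
# Hypothetical: free space on / shrinks by 1 GiB per hour, sampled hourly
samples = [(h * 3600, (20 - h) * GiB) for h in range(7)]
now = 6 * 3600
forecast = predict_linear(samples, 24 * 3600, now)
print(round(forecast / GiB, 2))  # negative: / would be full within 24 h
```

A forecast alert like this fires hours earlier than a fixed "less than 10 % free" threshold when the disk is filling quickly.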
nginx Metrics
# Active connections
nginx_connections_active
# Requests per second
rate(nginx_http_requests_total[5m])
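# 4xx and 5xx error rate in percent
# (needs an exporter that labels requests by status; the stub_status-based
# nginx-prometheus-exporter only provides a total request counter)
sum(rate(nginx_http_requests_total{status=~"[45].."}[5m]))
  / sum(rate(nginx_http_requests_total[5m])) * 100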
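The error rate has to be aggregated on both sides, because PromQL divides two vectors by matching series with identical label sets. Without sum(), every 4xx/5xx series is divided by itself. This Python sketch illustrates the difference with hypothetical per-status request rates.

```python
# Hypothetical per-status request rates (requests/s)
rates = {"200": 95.0, "404": 4.0, "500": 1.0}

def is_error(status):
    return status[0] in "45"  # same test as status=~"[45].." for 3-digit codes

# Without sum(): each error series is matched with itself in the denominator
naive = {s: r / rates[s] * 100 for s, r in rates.items() if is_error(s)}
print(naive)  # {'404': 100.0, '500': 100.0}

# With sum() on both sides: total errors over total requests
errors = sum(r for s, r in rates.items() if is_error(s))
total = sum(rates.values())
error_pct = errors / total * 100
print(round(error_pct, 2))  # 5.0
```

For several servers, sum by (instance) instead of a plain sum() keeps one error rate per host.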
HTTP Availability (Blackbox Exporter)
# 1 = Up, 0 = Down
probe_success
# SSL certificate expires in X days
(probe_ssl_earliest_cert_expiry - time()) / 86400
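The certificate expression is plain arithmetic: probe_ssl_earliest_cert_expiry is a Unix timestamp, so subtracting the current time and dividing by 86,400 seconds gives days. The same calculation in Python, for cross-checking a value by hand; it assumes the expiry date in the notAfter string format that Python's ssl module returns from getpeercert().

```python
import ssl
import time

# Days until certificate expiry, mirroring
# (probe_ssl_earliest_cert_expiry - time()) / 86400

def days_until_expiry(not_after, now=None):
    expiry = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch (UTC)
    if now is None:
        now = time.time()
    return (expiry - now) / 86400

expiry_ts = ssl.cert_time_to_seconds("Jan  5 09:34:43 2018 GMT")
print(days_until_expiry("Jan  5 09:34:43 2018 GMT", now=expiry_ts - 10 * 86400))  # 10.0
```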
Alerting Rules
# prometheus/rules/alerts.yml
groups:
  - name: server
    rules:
      - alert: HighCPUUsage
        # rate() rather than irate(): irate() follows brief spikes,
        # which can reset the for: timer
        expr: >
          100 - (avg by (instance)
          (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage {{ $value | printf \"%.1f\" }}% > 85% for 5 minutes"

      - alert: LowDiskSpace
        expr: >
          (node_filesystem_avail_bytes{mountpoint="/"}
          / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Only {{ $value | printf \"%.1f\" }}% free space remaining on /"

      - alert: HighMemoryUsage
        expr: >
          (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Very high memory usage on {{ $labels.instance }}"

      - alert: SiteDown
        expr: probe_success == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Website unreachable: {{ $labels.instance }}"

      - alert: SSLCertificateExpiringSoon
        expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 14
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "SSL certificate expiring soon: {{ $labels.instance }}"
          description: "{{ $value | printf \"%.0f\" }} days until expiry"
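The for: field is what keeps these rules from firing on a single bad sample. Prometheus evaluates each rule every evaluation_interval; while the condition holds for less than the for: duration, the alert is "pending", after that it is "firing", and as soon as the condition is false it drops back to inactive. A simplified Python model of that state machine, with a hypothetical for: 1m and hypothetical evaluation results:

```python
# Simplified model of alert states driven by the for: duration.

def alert_states(condition_results, interval_s, for_s):
    states, active_since = [], None
    for i, cond in enumerate(condition_results):
        now = i * interval_s
        if not cond:
            active_since = None  # condition broken: the for: timer restarts
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_s else "pending")
    return states

# Condition true at five consecutive 15 s evaluations, then false (for: 1m)
print(alert_states([True, True, True, True, True, False], 15, 60))
```

This is also why a single dip below the threshold during the for: window resets the timer, and why volatile expressions make for flaky alerts.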
Grafana Dashboard Provisioning
# grafana/provisioning/dashboards/dashboards.yml
apiVersion: 1
providers:
  - name: 'default'
    orgId: 1
    folder: ''
    type: file
    options:
      path: /etc/grafana/provisioning/dashboards
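Dashboard JSON files placed in grafana/provisioning/dashboards/ are then loaded automatically. Grafana also needs Prometheus as a data source (URL http://prometheus:9090 inside the Compose network); add it in the Grafana UI or provision it with a file in grafana/provisioning/datasources/. For server monitoring, the ready-made "Node Exporter Full" dashboard (ID 1860) from grafana.com is a good starting point: in Grafana, open Dashboards, choose Import and enter ID 1860.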
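Grafana only shows data once Prometheus is registered as a data source. The Python sketch below writes a provisioning file for that; the helper names, the file name prometheus.yml and the data source name "Prometheus" are assumptions, while apiVersion, datasources, type, access, url and isDefault are standard keys of Grafana's data source provisioning format. The file goes into grafana/provisioning/datasources/, which the Compose file mounts into /etc/grafana/provisioning.

```python
from pathlib import Path

# Hypothetical helper: render and write a Grafana data source
# provisioning file pointing at the Prometheus container.

def render_datasource(name="Prometheus", url="http://prometheus:9090"):
    return (
        "apiVersion: 1\n"
        "datasources:\n"
        f"  - name: {name}\n"
        "    type: prometheus\n"
        "    access: proxy\n"
        f"    url: {url}\n"
        "    isDefault: true\n"
    )

def write_datasource(base_dir="grafana/provisioning/datasources"):
    path = Path(base_dir) / "prometheus.yml"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_datasource())
    return path

if __name__ == "__main__":
    print(render_datasource())
```

access: proxy means the Grafana server, not the browser, queries Prometheus, so the internal Compose host name prometheus works.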
Notifications: Email and Slack
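Contact points are managed in Grafana under Alerting, Contact points. They deliver alerts evaluated by Grafana's own alerting; the Prometheus rules above are routed through the Alertmanager configured in prometheus.yml, which needs its own email or Slack receivers (alternatively, recreate the rules as Grafana alert rules). An email contact point looks like this: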
{
  "name": "email-alert",
  "type": "email",
  "settings": {
    "addresses": "thomas@wunner-software.de",
    "subject": "[{{ .Status | toUpper }}] {{ .CommonAnnotations.summary }}"
  }
}
For Slack:
{
  "name": "slack-alert",
  "type": "slack",
  "settings": {
    "url": "https://hooks.slack.com/services/xxx/yyy/zzz",
    "channel": "#monitoring",
    "title": "{{ .CommonAnnotations.summary }}",
    "text": "{{ .CommonAnnotations.description }}"
  }
}
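Before entering a Slack webhook URL in Grafana, it can help to test it directly: Slack incoming webhooks accept a POST with a JSON body containing a "text" field. This Python sketch builds such a request without sending it; the URL is the placeholder from the example above, and build_slack_request is a hypothetical helper.

```python
import json
import urllib.request

WEBHOOK_URL = "https://hooks.slack.com/services/xxx/yyy/zzz"  # placeholder

def build_slack_request(text, url=WEBHOOK_URL):
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_slack_request("Test from the monitoring stack")
print(req.get_method(), req.full_url)
# To send it for real: urllib.request.urlopen(req, timeout=10)
```

If the test message arrives in the channel, the URL is valid and any remaining problem lies in the Grafana or Alertmanager routing.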
Conclusion
Prometheus and Grafana are the industry standard for open-source monitoring for good reason: they are flexible, scale well and come with a large ecosystem of exporters. For a typical web server (nginx, PHP-FPM, MariaDB), Node Exporter, the nginx and PHP-FPM exporters and the Blackbox Exporter cover the essentials; database internals need an additional exporter such as mysqld_exporter.
The most important metrics to keep an eye on:
- CPU and memory utilisation
- Disk space (with trend)
- HTTP availability and response times
- SSL certificate expiry date
- Error rate in nginx logs