Monitoring
Displace automatically installs a complete monitoring stack on every cluster, providing metrics collection, visualization, and alerting out of the box.
flowchart TB
    subgraph cluster["Kubernetes Cluster"]
        subgraph apps["Your Applications"]
            app1["App Pod 1"]
            app2["App Pod 2"]
        end
        subgraph monitoring["Monitoring Stack"]
            prom["Prometheus<br/>Metrics Collection"]
            grafana["Grafana<br/>Visualization"]
            alert["Alertmanager<br/>Alert Routing"]
        end
        subgraph system["System Components"]
            node["Node Exporter"]
            kube["kube-state-metrics"]
        end
    end
    app1 -->|"metrics"| prom
    app2 -->|"metrics"| prom
    node -->|"node metrics"| prom
    kube -->|"k8s metrics"| prom
    prom -->|"queries"| grafana
    prom -->|"alerts"| alert
What's Included:
- Prometheus - Time-series metrics database and alerting
- Grafana - Visualization dashboards and exploration
- Alertmanager - Alert routing and notification management
- Node Exporter - Host-level metrics (CPU, memory, disk)
- kube-state-metrics - Kubernetes object metrics
Monitoring is installed automatically during cluster creation and bootstrapping.
# Monitoring is enabled by default
displace cluster create production --provider aws
# Explicitly enable (same as default)
displace cluster create production --provider aws --monitoring
# Disable monitoring (not recommended)
displace cluster create production --provider aws --monitoring=false

# Bootstrap includes monitoring by default
displace cluster bootstrap production --provider aws
# Disable monitoring during bootstrap
displace cluster bootstrap production --provider aws --monitoring=false

# displace install sets up monitoring on local cluster
displace install

Expected output:
Installing cluster components...
✓ Ingress controller (nginx)
✓ Monitoring stack (Prometheus, Grafana, Alertmanager)
✓ Cloudflare Tunnel daemon
Grafana provides the primary interface for viewing metrics and dashboards.
# Forward Grafana to localhost:3000
kubectl port-forward -n monitoring svc/grafana 3000:80

Open your browser: http://localhost:3000
| Field | Value |
|---|---|
| Username | admin |
| Password | admin |
Note: You'll be prompted to change the password on first login in production environments.
If you have Ingress configured with a domain:
# Example Ingress for Grafana
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - grafana.example.com
      secretName: grafana-tls
  rules:
    - host: grafana.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 80

Prometheus provides the metrics backend and query interface.
# Forward Prometheus to localhost:9090
kubectl port-forward -n monitoring svc/prometheus-server 9090:80

Open your browser: http://localhost:9090
- Graph - Execute PromQL queries and visualize results
- Alerts - View configured alerting rules
- Status - Check targets, configuration, and runtime info
- Targets - See all scrape targets and their status
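The Targets view can also be read programmatically. The following is a minimal sketch, assuming the port-forward above is active and that you have saved the JSON returned by Prometheus's `/api/v1/targets` endpoint. The `summarize_targets` helper and the sample payload are hypothetical and only illustrate the response shape (`data.activeTargets[].health`, `scrapeUrl`, `lastError`).

```python
import json
from collections import Counter


def summarize_targets(payload: str) -> dict:
    """Count active scrape targets by health and list the ones that are not up."""
    data = json.loads(payload)["data"]
    targets = data.get("activeTargets", [])
    by_health = Counter(t.get("health", "unknown") for t in targets)
    failing = [
        {"url": t.get("scrapeUrl", ""), "error": t.get("lastError", "")}
        for t in targets
        if t.get("health") != "up"
    ]
    return {"by_health": dict(by_health), "failing": failing}


# Hypothetical sample shaped like a /api/v1/targets response (not real cluster data).
SAMPLE = json.dumps({
    "status": "success",
    "data": {
        "activeTargets": [
            {"scrapeUrl": "http://10.0.0.5:9100/metrics", "health": "up", "lastError": ""},
            {"scrapeUrl": "http://10.0.0.6:8080/metrics", "health": "up", "lastError": ""},
            {"scrapeUrl": "http://10.0.0.7:9090/metrics", "health": "down",
             "lastError": "connection refused"},
        ]
    },
})

summary = summarize_targets(SAMPLE)
print(summary["by_health"])
print(summary["failing"])
```

A target whose `health` is anything other than `up` usually points to a wrong port, a wrong path, or a network policy between Prometheus and the pod.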
Alertmanager handles alert routing and notifications.
# Forward Alertmanager to localhost:9093
kubectl port-forward -n monitoring svc/alertmanager 9093:80

Open your browser: http://localhost:9093
- View active alerts
- Silence alerts temporarily
- Configure notification routing
- Group related alerts
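Grouping means that alerts sharing the same values for a chosen set of labels are bundled into one notification. The sketch below illustrates the idea only; it is not Alertmanager's implementation, and the `group_alerts` helper and sample alerts are hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alert label sets by the values of the group_by labels.

    A label that is missing from an alert is treated as an empty string.
    """
    groups = defaultdict(list)
    for labels in alerts:
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


# Hypothetical alerts, represented only by their labels.
ALERTS = [
    {"alertname": "HighErrorRate", "namespace": "shop", "pod": "web-1"},
    {"alertname": "HighErrorRate", "namespace": "shop", "pod": "web-2"},
    {"alertname": "PodCrashLooping", "namespace": "blog", "pod": "wp-0"},
]

for key, members in group_alerts(ALERTS, ["alertname", "namespace"]).items():
    print(key, len(members))
```

With `alertname` and `namespace` as the grouping labels, the two `HighErrorRate` alerts from the same namespace land in one group, so they would produce a single notification rather than two.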
Grafana comes with several pre-configured dashboards:
| Dashboard | Description |
|---|---|
| Kubernetes Cluster | Overall cluster health and resource usage |
| Kubernetes Nodes | Per-node CPU, memory, disk, network |
| Kubernetes Pods | Pod-level metrics and status |
| Kubernetes Deployments | Deployment health and scaling |
| Kubernetes Namespaces | Per-namespace resource consumption |
| Dashboard | Description |
|---|---|
| Node Exporter Full | Detailed host metrics |
| CoreDNS | DNS query metrics |
| NGINX Ingress | Ingress controller metrics |
- Open Grafana (http://localhost:3000)
- Click Dashboards in the left sidebar
- Click Browse
- Select a dashboard from the list
Prometheus uses PromQL (Prometheus Query Language) for querying metrics.
Basic syntax:
metric_name{label="value"}
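Label matchers support four operators: `=`, `!=`, `=~`, and `!~`. Regex matchers are fully anchored, so `status=~"5.."` matches `503` but not `5030`. The sketch below mimics that behaviour in Python as an illustration; the `matches` helper is hypothetical, and Python's `re` syntax differs in places from the RE2 syntax Prometheus uses.

```python
import re


def matches(labels: dict, matchers: list) -> bool:
    """Return True if a series' labels satisfy every (name, op, value) matcher.

    A label that is absent is compared as the empty string, as in PromQL.
    """
    for name, op, value in matchers:
        actual = labels.get(name, "")
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        elif op == "=~":
            # fullmatch mirrors PromQL's fully anchored regex matching
            ok = re.fullmatch(value, actual) is not None
        elif op == "!~":
            ok = re.fullmatch(value, actual) is None
        else:
            raise ValueError(f"unknown operator: {op}")
        if not ok:
            return False
    return True


series = {"namespace": "my-namespace", "status": "503"}
print(matches(series, [("namespace", "=", "my-namespace"), ("status", "=~", "5..")]))
```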
# Total cluster CPU usage (%)
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Total cluster memory usage (%)
(1 - (sum(node_memory_MemAvailable_bytes) / sum(node_memory_MemTotal_bytes))) * 100
# Total cluster disk usage (%)
(1 - (sum(node_filesystem_avail_bytes{mountpoint="/"}) / sum(node_filesystem_size_bytes{mountpoint="/"}))) * 100
# CPU usage by pod
sum(rate(container_cpu_usage_seconds_total{namespace="my-namespace"}[5m])) by (pod)
# Memory usage by pod
sum(container_memory_usage_bytes{namespace="my-namespace"}) by (pod)
# Pod restart count
sum(kube_pod_container_status_restarts_total{namespace="my-namespace"}) by (pod)
# HTTP request rate (if your app exports metrics)
sum(rate(http_requests_total{namespace="my-namespace"}[5m])) by (service)
# HTTP error rate
sum(rate(http_requests_total{namespace="my-namespace",status=~"5.."}[5m])) by (service)
# Request latency (p99)
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
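To make the p99 query less of a black box, here is a simplified sketch of the estimate `histogram_quantile` performs: it finds the bucket containing the requested rank and interpolates linearly inside it. This is an illustration, not Prometheus's code; it assumes at least two cumulative buckets including `+Inf` and skips the edge cases the real function handles. The bucket values are made up.

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate a quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) pairs, including a
    final (math.inf, total) bucket, as exported in *_bucket series.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, cum) in enumerate(buckets) if cum >= rank)
    upper, cum = buckets[index]
    if math.isinf(upper):
        # The quantile falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower, below = buckets[index - 1] if index > 0 else (0.0, 0.0)
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / (cum - below)


# Made-up latency buckets in seconds: 50 requests <= 0.1s, 90 <= 0.5s, 99 <= 1s.
BUCKETS = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
print(histogram_quantile(0.99, BUCKETS))
```

Because the result is interpolated within a bucket, its accuracy depends on how closely the bucket boundaries bracket the latencies you care about.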
- Open Grafana
- Click + → Dashboard
- Click Add visualization
- Select Prometheus as the data source
- Enter your PromQL query
- Choose visualization type (Graph, Stat, Gauge, etc.)
- Configure display options
- Click Apply
- Click Save dashboard (disk icon)
- Enter a name
- Choose a folder
- Click Save
Create a dashboard with these panels:
| Panel | Query | Visualization |
|---|---|---|
| Request Rate | sum(rate(http_requests_total[5m])) | Graph |
| Error Rate | sum(rate(http_requests_total{status=~"5.."}[5m])) | Graph |
| Latency P99 | histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) | Stat |
| Active Pods | count(kube_pod_status_phase{phase="Running"}) | Stat |
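The Request Rate and Error Rate panels both rely on `rate()`, which turns an ever-increasing counter into a per-second value. The sketch below shows the core idea under simplifying assumptions: it handles counter resets but omits the window-boundary extrapolation that Prometheus applies. The `simple_rate` helper and the sample values are hypothetical.

```python
def simple_rate(samples):
    """Approximate per-second rate from (timestamp_seconds, counter_value) pairs.

    Samples must be ordered oldest first. Returns None if there are fewer
    than two samples or no time has elapsed.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted (e.g. pod restart): count from zero.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Made-up scrapes one minute apart.
all_requests = [(0, 1000), (60, 1600), (120, 2200)]
server_errors = [(0, 10), (60, 40), (120, 70)]

total_rate = simple_rate(all_requests)
error_rate = simple_rate(server_errors)
print(total_rate, error_rate, error_rate / total_rate)
```

Dividing the 5xx rate by the total rate gives the error ratio, which is the quantity an error-rate alert typically compares against a threshold.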
Alerting rules are defined in Prometheus and sent to Alertmanager.
Example alert rule:
groups:
  - name: application-alerts
    rules:
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "Error rate is above 5% for more than 5 minutes"
      - alert: PodCrashLooping
        expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Pod is crash looping"
          description: "Pod {{ $labels.pod }} is restarting frequently"

Grafana can also create alerts based on dashboard panels:
- Edit a panel
- Click Alert tab
- Click Create alert rule from this panel
- Configure conditions and notifications
- Save
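The `for: 5m` clause in the Prometheus rules shown earlier means an alert does not fire the moment its expression becomes true. It stays pending until the expression has held continuously for the whole duration, and any false evaluation resets the clock. The sketch below simulates that state machine; `alert_states` is a hypothetical helper, and the 60-second evaluation interval is an assumption.

```python
def alert_states(condition_by_tick, interval_s, for_s):
    """Return the alert state after each rule evaluation.

    condition_by_tick: one boolean per evaluation, True when the
    alert expression returned a result at that evaluation.
    """
    states = []
    active_since = None
    for tick, is_true in enumerate(condition_by_tick):
        now = tick * interval_s
        if not is_true:
            # Any false evaluation clears the pending timer.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_s else "pending")
    return states


# Error rate above threshold for three minutes, recovers once, then stays high.
history = [True, True, True, False] + [True] * 6
print(alert_states(history, interval_s=60, for_s=300))
```

This is why short spikes do not page anyone: the condition must survive every evaluation across the full `for` window before the alert reaches Alertmanager as firing.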
Alertmanager routes alerts based on labels and sends notifications.
route:
  group_by: ['alertname', 'namespace']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'default'
receivers:
  - name: 'default'
    # Configure your notification channels here

Slack:
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/xxx/yyy/zzz'
        channel: '#alerts'
        send_resolved: true

Email:
receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'alerts@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.example.com:587'
        auth_username: 'alertmanager'
        auth_password: 'password'

PagerDuty:
receivers:
  - name: 'pagerduty'
    pagerduty_configs:
      - service_key: 'your-service-key'
        send_resolved: true

For Prometheus to scrape your application metrics, add annotations to your pods:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: app
          ports:
            - containerPort: 9090
              name: metrics

For PHP applications, use a Prometheus client library:
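// composer require promphp/prometheus_client_php
use Prometheus\CollectorRegistry;
use Prometheus\RenderTextFormat;
use Prometheus\Storage\InMemory;

// InMemory storage is reset on every request; use a shared adapter such as APCu or Redis to keep counters between requests
$registry = new CollectorRegistry(new InMemory());
// Create metrics (the 'app' namespace means this is exported as app_http_requests_total)
$counter = $registry->getOrRegisterCounter('app', 'http_requests_total', 'Total HTTP requests', ['method', 'endpoint']);
$counter->incBy(1, ['GET', '/api/users']);
// Expose /metrics endpoint
$renderer = new RenderTextFormat();
header('Content-Type: ' . RenderTextFormat::MIME_TYPE);
echo $renderer->render($registry->getMetricFamilySamples());

For WordPress, use a metrics plugin or add an endpoint to your custom theme: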
// In functions.php or a custom plugin
add_action('init', function() {
    if ($_SERVER['REQUEST_URI'] === '/metrics') {
        header('Content-Type: text/plain');
        echo "# WordPress metrics\n";
        echo "wordpress_posts_total " . wp_count_posts()->publish . "\n";
        echo "wordpress_users_total " . count_users()['total_users'] . "\n";
        exit;
    }
});

# Check Prometheus targets
kubectl port-forward -n monitoring svc/prometheus-server 9090:80
# Visit http://localhost:9090/targets

Common issues:
- Missing prometheus.io/scrape: "true" annotation
- Wrong port in prometheus.io/port
- Network policy blocking scrape
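The first two issues can be caught before deploying by inspecting the pod template. The sketch below is a heuristic under stated assumptions: `check_scrape_config` is a hypothetical helper, the template is passed as an already-parsed dictionary, and a port that is not declared under `containerPort` is only flagged as suspicious because Kubernetes does not require ports to be declared.

```python
def check_scrape_config(pod_template: dict) -> list:
    """Return a list of likely scrape misconfigurations in a pod template."""
    problems = []
    annotations = pod_template.get("metadata", {}).get("annotations", {})

    if annotations.get("prometheus.io/scrape") != "true":
        problems.append('missing prometheus.io/scrape: "true" annotation')

    # Collect every declared container port as a string for comparison.
    declared = {
        str(port.get("containerPort"))
        for container in pod_template.get("spec", {}).get("containers", [])
        for port in container.get("ports", [])
    }
    scrape_port = annotations.get("prometheus.io/port")
    if scrape_port is not None and scrape_port not in declared:
        problems.append(
            f"prometheus.io/port {scrape_port} is not a declared containerPort"
        )
    return problems


# Hypothetical template with a port mismatch between annotation and container.
TEMPLATE = {
    "metadata": {
        "annotations": {
            "prometheus.io/scrape": "true",
            "prometheus.io/port": "9090",
            "prometheus.io/path": "/metrics",
        }
    },
    "spec": {"containers": [{"name": "app", "ports": [{"containerPort": 8080}]}]},
}

for problem in check_scrape_config(TEMPLATE):
    print(problem)
```

Network policy problems cannot be detected this way; for those, the target's last error on the Prometheus Targets page is the more reliable signal.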
- Check data source configuration
- Verify Prometheus is running
- Check time range selector
- Test query in Prometheus UI first
# Check Alertmanager status
kubectl logs -n monitoring deployment/alertmanager
# Verify configuration
kubectl get configmap -n monitoring alertmanager-config -o yaml

# Check monitoring pod resources
kubectl top pods -n monitoring
# Reduce retention if needed (in Prometheus config)
# --storage.tsdb.retention.time=7d

# Grafana
kubectl port-forward -n monitoring svc/grafana 3000:80
# Prometheus
kubectl port-forward -n monitoring svc/prometheus-server 9090:80
# Alertmanager
kubectl port-forward -n monitoring svc/alertmanager 9093:80

# All monitoring pods
kubectl get pods -n monitoring
# Monitoring services
kubectl get svc -n monitoring
# Prometheus targets (via API)
kubectl port-forward -n monitoring svc/prometheus-server 9090:80 &
sleep 2  # give the port-forward a moment to start listening
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets | length'

# Restart Prometheus
kubectl rollout restart deployment/prometheus-server -n monitoring
# Restart Grafana
kubectl rollout restart deployment/grafana -n monitoring
# Restart Alertmanager
kubectl rollout restart deployment/alertmanager -n monitoring

- Getting Started - Initial setup including monitoring
- Cloud Providers - Cluster creation with monitoring
- Troubleshooting - General troubleshooting guide
- Local Providers - Local development monitoring