Monitoring

Eric Mann edited this page Jan 7, 2026 · 1 revision

Monitoring and Observability

Displace installs a complete monitoring stack on every cluster by default, providing metrics collection, visualization, and alerting out of the box.

Overview

flowchart TB
    subgraph cluster["Kubernetes Cluster"]
        subgraph apps["Your Applications"]
            app1["App Pod 1"]
            app2["App Pod 2"]
        end

        subgraph monitoring["Monitoring Stack"]
            prom["Prometheus<br/>Metrics Collection"]
            grafana["Grafana<br/>Visualization"]
            alert["Alertmanager<br/>Alert Routing"]
        end

        subgraph system["System Components"]
            node["Node Exporter"]
            kube["kube-state-metrics"]
        end
    end

    app1 -->|"metrics"| prom
    app2 -->|"metrics"| prom
    node -->|"node metrics"| prom
    kube -->|"k8s metrics"| prom
    prom -->|"queries"| grafana
    prom -->|"alerts"| alert

What's Included:

  • Prometheus - Time-series metrics database and alerting
  • Grafana - Visualization dashboards and exploration
  • Alertmanager - Alert routing and notification management
  • Node Exporter - Host-level metrics (CPU, memory, disk)
  • kube-state-metrics - Kubernetes object metrics

Monitoring Installation

Monitoring is installed automatically during cluster creation and bootstrapping.

During Cluster Creation

# Monitoring is enabled by default
displace cluster create production --provider aws

# Explicitly enable (same as default)
displace cluster create production --provider aws --monitoring

# Disable monitoring (not recommended)
displace cluster create production --provider aws --monitoring=false

During Bootstrap

# Bootstrap includes monitoring by default
displace cluster bootstrap production --provider aws

# Disable monitoring during bootstrap
displace cluster bootstrap production --provider aws --monitoring=false

Local Development

# displace install sets up monitoring on the local cluster
displace install

Expected output:

Installing cluster components...
  ✓ Ingress controller (nginx)
  ✓ Monitoring stack (Prometheus, Grafana, Alertmanager)
  ✓ Cloudflare Tunnel daemon

Accessing Grafana

Grafana provides the primary interface for viewing metrics and dashboards.

Port Forwarding (Recommended for Local Access)

# Forward Grafana to localhost:3000
kubectl port-forward -n monitoring svc/grafana 3000:80

Open your browser: http://localhost:3000

Default Credentials

| Field | Value |
| --- | --- |
| Username | admin |
| Password | admin |

Note: You'll be prompted to change the password on first login in production environments.

Via Ingress (Production)

If you have Ingress configured with a domain:

# Example Ingress for Grafana
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - grafana.example.com
    secretName: grafana-tls
  rules:
  - host: grafana.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grafana
            port:
              number: 80

Accessing Prometheus

Prometheus provides the metrics backend and query interface.

Port Forwarding

# Forward Prometheus to localhost:9090
kubectl port-forward -n monitoring svc/prometheus-server 9090:80

Open your browser: http://localhost:9090

Prometheus UI Features

  • Graph - Execute PromQL queries and visualize results
  • Alerts - View configured alerting rules
  • Status - Check targets, configuration, and runtime info
  • Targets - See all scrape targets and their status

Accessing Alertmanager

Alertmanager handles alert routing and notifications.

Port Forwarding

# Forward Alertmanager to localhost:9093
kubectl port-forward -n monitoring svc/alertmanager 9093:80

Open your browser: http://localhost:9093

Alertmanager Features

  • View active alerts
  • Silence alerts temporarily
  • Configure notification routing
  • Group related alerts

Pre-installed Dashboards

Grafana comes with several pre-configured dashboards:

Kubernetes Dashboards

| Dashboard | Description |
| --- | --- |
| Kubernetes Cluster | Overall cluster health and resource usage |
| Kubernetes Nodes | Per-node CPU, memory, disk, network |
| Kubernetes Pods | Pod-level metrics and status |
| Kubernetes Deployments | Deployment health and scaling |
| Kubernetes Namespaces | Per-namespace resource consumption |

System Dashboards

| Dashboard | Description |
| --- | --- |
| Node Exporter Full | Detailed host metrics |
| CoreDNS | DNS query metrics |
| NGINX Ingress | Ingress controller metrics |

Accessing Dashboards

  1. Open Grafana (http://localhost:3000)
  2. Click Dashboards in the left sidebar
  3. Click Browse
  4. Select a dashboard from the list

Common Metrics Queries

PromQL Basics

Prometheus uses PromQL (Prometheus Query Language) for querying metrics.

Basic syntax:

metric_name{label="value"}
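
The same selector syntax can be sent to the Prometheus HTTP API instead of the web UI. The following is a minimal sketch, assuming the port-forward from the Accessing Prometheus section is running on localhost:9090. The helper names (`build_query_url`, `parse_instant_vector`, `run_query`) are illustrative and not part of Displace.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Prometheus is reachable through the port-forward shown earlier.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(promql, base_url=PROMETHEUS_URL):
    """Return the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body):
    """Extract (labels, value) pairs from an instant-vector JSON response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    # Each result carries its label set and a [timestamp, "value"] pair.
    return [(r["metric"], float(r["value"][1])) for r in payload["data"]["result"]]


def run_query(promql):
    """Send the query to Prometheus (needs the port-forward to be active)."""
    with urlopen(build_query_url(promql), timeout=10) as response:
        return parse_instant_vector(response.read().decode("utf-8"))
```

With the port-forward running, `run_query('up{job="node"}')` would return one `(labels, value)` pair per matching series. The job name here is only an example.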

Useful Queries

Cluster Resource Usage

# Total cluster CPU usage (%)
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Total cluster memory usage (%)
(1 - (sum(node_memory_MemAvailable_bytes) / sum(node_memory_MemTotal_bytes))) * 100

# Total cluster disk usage (%)
(1 - (sum(node_filesystem_avail_bytes{mountpoint="/"}) / sum(node_filesystem_size_bytes{mountpoint="/"}))) * 100
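
To make the arithmetic behind the memory and disk queries concrete, here is a small sketch that mirrors the `(1 - available / total) * 100` shape. The two-node cluster and its byte counts are made up for illustration.

```python
def usage_percent(available, total):
    """Mirror the PromQL shape (1 - available / total) * 100."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (1 - available / total) * 100


# Hypothetical two-node cluster; the byte counts are invented for this example.
GIB = 1024 ** 3
available_bytes = [6 * GIB, 2 * GIB]  # node_memory_MemAvailable_bytes per node
total_bytes = [16 * GIB, 16 * GIB]    # node_memory_MemTotal_bytes per node

# Summing before dividing matches sum(...) / sum(...) in the query.
cluster_memory_percent = usage_percent(sum(available_bytes), sum(total_bytes))
print(f"cluster memory usage: {cluster_memory_percent:.1f}%")
```

Because the query sums across nodes before dividing, larger nodes carry more weight than they would in an average of per-node percentages.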

Pod Metrics

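# CPU usage by pod, in cores (container!="" skips the pod-level cgroup series, which would otherwise be double counted)
sum(rate(container_cpu_usage_seconds_total{namespace="my-namespace", container!=""}[5m])) by (pod)

# Memory usage by pod, in bytes
sum(container_memory_usage_bytes{namespace="my-namespace", container!=""}) by (pod)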

# Pod restart count
sum(kube_pod_container_status_restarts_total{namespace="my-namespace"}) by (pod)

Application Metrics

# HTTP request rate (if your app exports metrics)
sum(rate(http_requests_total{namespace="my-namespace"}[5m])) by (service)

# HTTP 5xx error rate (requests per second)
sum(rate(http_requests_total{namespace="my-namespace",status=~"5.."}[5m])) by (service)

# Request latency (p99), aggregated across all series
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
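
The latency query estimates a quantile from cumulative histogram buckets rather than from raw observations. The sketch below is a simplified version of that linear interpolation, assuming non-negative bucket bounds and a final `+Inf` bucket. It omits edge cases the real function handles, and the bucket values are invented.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative (upper_bound, count) buckets."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("the last bucket must have le=+Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if index == len(buckets) - 1:
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0] if len(buckets) > 1 else math.nan
    upper, count = buckets[index]
    lower, below = (0.0, 0) if index == 0 else buckets[index - 1]
    if count == below:
        return upper
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Hypothetical cumulative request counts per latency bound (seconds).
latency_buckets = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
p70 = histogram_quantile(0.70, latency_buckets)
```

The result can never be more precise than the bucket boundaries, so the choice of buckets in the application affects how trustworthy a p99 panel is.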

Creating Custom Dashboards

Step 1: Create a New Dashboard

  1. Open Grafana
  2. Click +, then Dashboard
  3. Click Add visualization

Step 2: Configure a Panel

  1. Select Prometheus as the data source
  2. Enter your PromQL query
  3. Choose visualization type (Graph, Stat, Gauge, etc.)
  4. Configure display options
  5. Click Apply

Step 3: Save the Dashboard

  1. Click Save dashboard (disk icon)
  2. Enter a name
  3. Choose a folder
  4. Click Save

Example: Application Dashboard

Create a dashboard with these panels:

| Panel | Query | Visualization |
| --- | --- | --- |
| Request Rate | sum(rate(http_requests_total[5m])) | Graph |
| Error Rate | sum(rate(http_requests_total{status=~"5.."}[5m])) | Graph |
| Latency P99 | histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) | Stat |
| Active Pods | sum(kube_pod_status_phase{phase="Running"}) | Stat |

Setting Up Alerts

Prometheus Alerting Rules

Alerting rules are defined in Prometheus. When a rule's condition holds for the configured duration, Prometheus sends the resulting alert to Alertmanager.

Example alert rule:

groups:
- name: application-alerts
  rules:
  - alert: HighErrorRate
    expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High error rate detected"
      description: "Error rate is above 5% for more than 5 minutes"

  - alert: PodCrashLooping
    expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Pod is crash looping"
      description: "Pod {{ $labels.pod }} is restarting frequently"
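
The `for: 5m` field means the expression must stay true across consecutive evaluations before the alert fires. Until then the alert is pending. The sketch below models that behaviour, assuming one evaluation per minute and made-up error ratios. The function name `alert_states` is illustrative.

```python
from datetime import datetime, timedelta


def alert_states(samples, threshold, hold_for):
    """Classify each (timestamp, value) sample as inactive, pending, or firing."""
    states = []
    active_since = None
    for timestamp, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = timestamp  # condition just became true
            held = timestamp - active_since
            states.append("firing" if held >= hold_for else "pending")
        else:
            active_since = None  # any dip below the threshold resets the timer
            states.append("inactive")
    return states


# Hypothetical error ratios sampled once a minute.
start = datetime(2026, 1, 7, 12, 0)
ratios = [0.01, 0.06, 0.07, 0.08, 0.09, 0.07, 0.06, 0.02]
samples = [(start + timedelta(minutes=i), r) for i, r in enumerate(ratios)]
states = alert_states(samples, threshold=0.05, hold_for=timedelta(minutes=5))
print(states)
```

A short spike that clears within five minutes never leaves the pending state, which is why a longer `for` value reduces noisy notifications at the cost of slower detection.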

Grafana Alerts

Grafana can also create alerts based on dashboard panels:

  1. Edit a panel
  2. Click the Alert tab
  3. Click Create alert rule from this panel
  4. Configure conditions and notifications
  5. Save

Alertmanager Configuration

Default Configuration

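Alertmanager routes alerts to receivers based on their labels and sends the notifications. A receiver is only used when the top-level route (or a child route) names it.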

route:
  group_by: ['alertname', 'namespace']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'default'

receivers:
- name: 'default'
  # Configure your notification channels here
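
The `group_by` setting decides which alerts are bundled into a single notification. As a rough sketch of the idea (not Alertmanager's actual implementation), alerts that share the same values for the listed labels land in the same group. The alerts below are invented for illustration.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the group_by labels."""
    groups = defaultdict(list)
    for alert in alerts:
        labels = alert["labels"]
        # Assumption of this sketch: a missing label counts as an empty value.
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical alerts; the label values are made up.
alerts = [
    {"labels": {"alertname": "HighErrorRate", "namespace": "shop", "pod": "web-1"}},
    {"labels": {"alertname": "HighErrorRate", "namespace": "shop", "pod": "web-2"}},
    {"labels": {"alertname": "PodCrashLooping", "namespace": "blog", "pod": "wp-0"}},
]
groups = group_alerts(alerts, group_by=["alertname", "namespace"])
for key, members in groups.items():
    print(key, len(members))
```

With `['alertname', 'namespace']`, the two `HighErrorRate` alerts from the same namespace arrive as one notification, while the crash-loop alert is sent separately.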

Adding Notification Channels

Slack:

receivers:
- name: 'slack-notifications'
  slack_configs:
  - api_url: 'https://hooks.slack.com/services/xxx/yyy/zzz'
    channel: '#alerts'
    send_resolved: true

Email:

receivers:
- name: 'email-notifications'
  email_configs:
  - to: 'alerts@example.com'
    from: 'alertmanager@example.com'
    smarthost: 'smtp.example.com:587'
    auth_username: 'alertmanager'
    auth_password: 'password'

PagerDuty:

receivers:
- name: 'pagerduty'
  pagerduty_configs:
  - service_key: 'your-service-key'
    send_resolved: true

Monitoring Your Applications

Exposing Application Metrics

For Prometheus to scrape your application metrics, add annotations to the pod template (under spec.template.metadata, not the Deployment's own metadata):

# Excerpt: selector, pod labels, and container image are omitted for brevity
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
        prometheus.io/path: "/metrics"
    spec:
      containers:
      - name: app
        ports:
        - containerPort: 9090
          name: metrics
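
These annotations are a convention that the scrape configuration turns into a target address. They are not a built-in Kubernetes feature. The sketch below shows how the three values would typically combine into a scrape URL, assuming the stack's relabeling rules follow the common `prometheus.io/*` pattern. The function and the pod IP are hypothetical.

```python
from typing import Optional


def scrape_url(pod_ip: str, annotations: dict) -> Optional[str]:
    """Build the URL a scraper would request for an annotated pod, or None."""
    if annotations.get("prometheus.io/scrape") != "true":
        return None  # pod has not opted in to scraping
    port = annotations.get("prometheus.io/port")
    if port is None:
        return None  # this sketch requires an explicit port
    path = annotations.get("prometheus.io/path", "/metrics")
    return f"http://{pod_ip}:{port}{path}"


# Annotations copied from the Deployment above; the pod IP is made up.
annotations = {
    "prometheus.io/scrape": "true",
    "prometheus.io/port": "9090",
    "prometheus.io/path": "/metrics",
}
print(scrape_url("10.0.0.12", annotations))
```

Note that annotation values are strings, which is why the port and the `"true"` flag are quoted in the YAML.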

PHP/Laravel Metrics

For PHP applications, use a Prometheus client library:

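// composer require promphp/prometheus_client_php

use Prometheus\CollectorRegistry;
use Prometheus\RenderTextFormat;
use Prometheus\Storage\InMemory;

// InMemory storage is reset on every request; use a shared adapter
// such as APCu or Redis to keep counters between requests.
$registry = new CollectorRegistry(new InMemory());

// Create metrics (exported as app_http_requests_total)
$counter = $registry->getOrRegisterCounter('app', 'http_requests_total', 'Total HTTP requests', ['method', 'endpoint']);
$counter->incBy(1, ['GET', '/api/users']);

// Expose /metrics endpoint
$renderer = new RenderTextFormat();
header('Content-Type: ' . RenderTextFormat::MIME_TYPE);
echo $renderer->render($registry->getMetricFamilySamples());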

WordPress Metrics

For WordPress, use a metrics plugin or add a small endpoint to your theme or a custom plugin:

// In functions.php or a custom plugin
add_action('init', function() {
    // Match on the path only, so /metrics?x=1 is served as well
    $path = parse_url($_SERVER['REQUEST_URI'] ?? '', PHP_URL_PATH);
    if ($path === '/metrics') {
        header('Content-Type: text/plain; version=0.0.4');
        echo "# WordPress metrics\n";
        echo "wordpress_posts_total " . wp_count_posts()->publish . "\n";
        echo "wordpress_users_total " . count_users()['total_users'] . "\n";
        exit;
    }
});
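
Before pointing Prometheus at a hand-written endpoint like this, it can help to check that the output parses as `name value` lines. The following sketch handles only unlabelled samples without timestamps, which is all the WordPress example emits. The sample text is invented.

```python
def parse_samples(text):
    """Parse unlabelled `name value` lines, ignoring comments and blank lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(" ")
        samples[name] = float(value)  # raises ValueError on malformed values
    return samples


# Hypothetical response body from the /metrics endpoint above.
body = "# WordPress metrics\nwordpress_posts_total 42\nwordpress_users_total 3\n"
metrics = parse_samples(body)
print(metrics)
```

A `ValueError` here usually points to stray output, such as a PHP notice or HTML, mixed into the response.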

Troubleshooting

Prometheus Not Scraping Targets

# Check Prometheus targets
kubectl port-forward -n monitoring svc/prometheus-server 9090:80
# Visit http://localhost:9090/targets

Common issues:

  • Missing prometheus.io/scrape: "true" annotation
  • Wrong port in prometheus.io/port
  • Network policy blocking scrape
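
The targets page is also available as JSON at `/api/v1/targets`, which makes it easy to list only the failing targets. This is a minimal sketch that works on a response body. The sample data in the usage lines is made up, and the helper name is illustrative.

```python
import json


def unhealthy_targets(body):
    """Return job, scrape URL, and last error for every target that is not up."""
    payload = json.loads(body)
    problems = []
    for target in payload["data"]["activeTargets"]:
        if target.get("health") != "up":
            problems.append({
                "job": target.get("labels", {}).get("job", ""),
                "scrapeUrl": target.get("scrapeUrl", ""),
                "lastError": target.get("lastError", ""),
            })
    return problems


# Hypothetical response with one healthy and one failing target.
sample_body = json.dumps({"status": "success", "data": {"activeTargets": [
    {"health": "up", "scrapeUrl": "http://10.0.0.5:9100/metrics",
     "lastError": "", "labels": {"job": "node"}},
    {"health": "down", "scrapeUrl": "http://10.0.0.12:9090/metrics",
     "lastError": "connection refused", "labels": {"job": "kubernetes-pods"}},
]}})
failing = unhealthy_targets(sample_body)
print(failing)
```

The `lastError` text often distinguishes the causes listed above: a refused connection suggests a wrong port, while a timeout is more typical of a network policy.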

Grafana Dashboard Empty

  1. Check data source configuration
  2. Verify Prometheus is running
  3. Check time range selector
  4. Test query in Prometheus UI first

Alertmanager Not Sending Notifications

# Check Alertmanager status
kubectl logs -n monitoring deployment/alertmanager

# Verify configuration
kubectl get configmap -n monitoring alertmanager-config -o yaml

High Resource Usage

# Check monitoring pod resources
kubectl top pods -n monitoring

# Reduce retention if needed (in Prometheus config)
# --storage.tsdb.retention.time=7d
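
To judge how much a shorter retention would save, disk usage can be estimated as retention time multiplied by ingestion rate and bytes per sample. The sketch below assumes roughly 2 bytes per sample, the upper end of the commonly cited 1 to 2 byte range, and a made-up ingestion rate. The actual rate can be read from `rate(prometheus_tsdb_head_samples_appended_total[5m])`.

```python
SECONDS_PER_DAY = 86_400


def estimated_tsdb_bytes(retention_days, samples_per_second, bytes_per_sample=2.0):
    """Rough disk estimate: retention seconds * ingest rate * bytes per sample."""
    return retention_days * SECONDS_PER_DAY * samples_per_second * bytes_per_sample


# Hypothetical ingestion rate of 10,000 samples per second.
for days in (15, 7):
    gigabytes = estimated_tsdb_bytes(days, 10_000) / 1e9
    print(f"{days}d retention: about {gigabytes:.1f} GB")
```

This is only an estimate. Actual usage also depends on series churn and compaction.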

Quick Reference

Port Forwarding Commands

# Grafana
kubectl port-forward -n monitoring svc/grafana 3000:80

# Prometheus
kubectl port-forward -n monitoring svc/prometheus-server 9090:80

# Alertmanager
kubectl port-forward -n monitoring svc/alertmanager 9093:80

Check Monitoring Status

# All monitoring pods
kubectl get pods -n monitoring

# Monitoring services
kubectl get svc -n monitoring

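# Prometheus targets (via API)
kubectl port-forward -n monitoring svc/prometheus-server 9090:80 &
PF_PID=$!
sleep 2  # give the port-forward time to start
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets | length'
kill "$PF_PID"  # stop the background port-forward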

Restart Monitoring Components

# Restart Prometheus
kubectl rollout restart deployment/prometheus-server -n monitoring

# Restart Grafana
kubectl rollout restart deployment/grafana -n monitoring

# Restart Alertmanager
kubectl rollout restart deployment/alertmanager -n monitoring
