Monitoring and Observability

By default, Displace installs a complete monitoring stack on every cluster, providing metrics collection, visualization, and alerting out of the box.

Overview

flowchart TB
    subgraph cluster["Kubernetes Cluster"]
        subgraph apps["Your Applications"]
            app1["App Pod 1"]
            app2["App Pod 2"]
        end

        subgraph monitoring["Monitoring Stack"]
            prom["Prometheus<br/>Metrics Collection"]
            grafana["Grafana<br/>Visualization"]
            alert["Alertmanager<br/>Alert Routing"]
        end

        subgraph system["System Components"]
            node["Node Exporter"]
            kube["kube-state-metrics"]
        end
    end

    app1 -->|"metrics"| prom
    app2 -->|"metrics"| prom
    node -->|"node metrics"| prom
    kube -->|"k8s metrics"| prom
    prom -->|"queries"| grafana
    prom -->|"alerts"| alert

What's Included:

Prometheus - Time-series metrics database and alerting
Grafana - Visualization dashboards and exploration
Alertmanager - Alert routing and notification management
Node Exporter - Host-level metrics (CPU, memory, disk)
kube-state-metrics - Kubernetes object metrics

Monitoring Installation

Monitoring is installed automatically during cluster creation and bootstrapping.

During Cluster Creation

# Monitoring is enabled by default
displace cluster create production --provider aws

# Explicitly enable (same as default)
displace cluster create production --provider aws --monitoring

# Disable monitoring (not recommended)
displace cluster create production --provider aws --monitoring=false

During Bootstrap

# Bootstrap includes monitoring by default
displace cluster bootstrap production --provider aws

# Disable monitoring during bootstrap
displace cluster bootstrap production --provider aws --monitoring=false

Local Development

# displace install sets up monitoring on the local cluster
displace install

Expected output:

Installing cluster components...
  ✓ Ingress controller (nginx)
  ✓ Monitoring stack (Prometheus, Grafana, Alertmanager)
  ✓ Cloudflare Tunnel daemon

Accessing Grafana

Grafana provides the primary interface for viewing metrics and dashboards.

Port Forwarding (Recommended for Local Access)

# Forward Grafana to localhost:3000
kubectl port-forward -n monitoring svc/grafana 3000:80

Open your browser: http://localhost:3000

Default Credentials

Field	Value
Username	`admin`
Password	`admin`

Note: In production environments, you'll be prompted to change the password on first login. Change the default `admin`/`admin` credentials before exposing Grafana through an Ingress.

Via Ingress (Production)

If you have Ingress configured with a domain:

# Example Ingress for Grafana
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - grafana.example.com
    secretName: grafana-tls
  rules:
  - host: grafana.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grafana
            port:
              number: 80

Accessing Prometheus

Prometheus provides the metrics backend and query interface.

Port Forwarding

# Forward Prometheus to localhost:9090
kubectl port-forward -n monitoring svc/prometheus-server 9090:80

Open your browser: http://localhost:9090

Prometheus UI Features

Graph - Execute PromQL queries and visualize results
Alerts - View configured alerting rules
Status - Check targets, configuration, and runtime info
Targets - See all scrape targets and their status

Accessing Alertmanager

Alertmanager handles alert routing and notifications.

Port Forwarding

# Forward Alertmanager to localhost:9093
kubectl port-forward -n monitoring svc/alertmanager 9093:80

Open your browser: http://localhost:9093

Alertmanager Features

View active alerts
Silence alerts temporarily
Configure notification routing
Group related alerts

Pre-installed Dashboards

Grafana comes with several pre-configured dashboards:

Kubernetes Dashboards

Dashboard	Description
Kubernetes Cluster	Overall cluster health and resource usage
Kubernetes Nodes	Per-node CPU, memory, disk, network
Kubernetes Pods	Pod-level metrics and status
Kubernetes Deployments	Deployment health and scaling
Kubernetes Namespaces	Per-namespace resource consumption

System Dashboards

Dashboard	Description
Node Exporter Full	Detailed host metrics
CoreDNS	DNS query metrics
NGINX Ingress	Ingress controller metrics

Accessing Dashboards

Open Grafana (http://localhost:3000)
Click Dashboards in the left sidebar
Click Browse
Select a dashboard from the list

Common Metrics Queries

PromQL Basics

Prometheus uses PromQL (Prometheus Query Language) for querying metrics.

Basic syntax:

metric_name{label="value"}
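
As a rough illustration of that syntax, the sketch below assembles a selector string from a metric name and a set of labels. The helper name `selector` is hypothetical and not part of any Prometheus client; it only covers equality matchers (`=`), not `!=`, `=~` or `!~`.

```python
def selector(metric: str, **labels: str) -> str:
    """Build a PromQL instant-vector selector with equality matchers."""
    if not labels:
        return metric
    parts = []
    for name, value in sorted(labels.items()):
        # Escape backslashes and quotes so the value stays a valid string literal.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(name + '="' + escaped + '"')
    return metric + "{" + ",".join(parts) + "}"


print(selector("node_cpu_seconds_total", mode="idle"))
```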

Useful Queries

Cluster Resource Usage

# Total cluster CPU usage (%)
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Total cluster memory usage (%)
(1 - (sum(node_memory_MemAvailable_bytes) / sum(node_memory_MemTotal_bytes))) * 100

# Total cluster disk usage (%)
(1 - (sum(node_filesystem_avail_bytes{mountpoint="/"}) / sum(node_filesystem_size_bytes{mountpoint="/"}))) * 100
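
To make the arithmetic behind these percentage queries concrete, here is a small sketch of the memory formula applied to made-up node values. The function name and the sample numbers are illustrative assumptions, not Displace output.

```python
GIB = 1024 ** 3


def memory_usage_percent(available_bytes: list[int], total_bytes: list[int]) -> float:
    """Mirror (1 - sum(MemAvailable) / sum(MemTotal)) * 100 across nodes."""
    return (1 - sum(available_bytes) / sum(total_bytes)) * 100


# Two hypothetical 16 GiB nodes with 4 GiB and 8 GiB still available.
usage = memory_usage_percent([4 * GIB, 8 * GIB], [16 * GIB, 16 * GIB])
print(f"{usage:.1f}%")  # 62.5%
```

Because the sums are taken before dividing, larger nodes weigh more in the result than they would in an average of per-node percentages.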

Pod Metrics

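# CPU usage by pod (container!="" skips the pod-level cgroup series so usage is not double-counted)
sum(rate(container_cpu_usage_seconds_total{namespace="my-namespace",container!=""}[5m])) by (pod)

# Memory usage by pod
sum(container_memory_usage_bytes{namespace="my-namespace",container!=""}) by (pod)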

# Pod restart count
sum(kube_pod_container_status_restarts_total{namespace="my-namespace"}) by (pod)
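
These queries can also be run outside the UI through the Prometheus HTTP API (`/api/v1/query`). The sketch below builds the request URL and flattens an instant-vector response into a per-pod mapping. It assumes the port-forward from "Accessing Prometheus" is active on localhost:9090; the helper names and the sample payload are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS_URL = "http://localhost:9090"  # assumes the port-forward shown earlier


def query_url(promql: str, base: str = PROMETHEUS_URL) -> str:
    """Return the instant-query URL for a PromQL expression."""
    return base + "/api/v1/query?" + urlencode({"query": promql})


def values_by_pod(payload: dict) -> dict[str, float]:
    """Map each series' `pod` label to its sample value."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    return {
        series["metric"].get("pod", ""): float(series["value"][1])
        for series in payload["data"]["result"]
    }


def run_query(promql: str) -> dict[str, float]:
    """Execute the query against a live Prometheus (needs network access)."""
    with urlopen(query_url(promql), timeout=10) as response:
        return values_by_pod(json.load(response))


# Shape of an instant-vector response, with made-up values.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"pod": "app-1"}, "value": [1700000000, "3"]},
            {"metric": {"pod": "app-2"}, "value": [1700000000, "0"]},
        ],
    },
}
print(values_by_pod(sample))
```

Sample values arrive as strings in the API response, which is why they are converted with `float`.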

Application Metrics

# HTTP request rate (if your app exports metrics)
sum(rate(http_requests_total{namespace="my-namespace"}[5m])) by (service)

# HTTP 5xx error rate (requests per second)
sum(rate(http_requests_total{namespace="my-namespace",status=~"5.."}[5m])) by (service)

# Request latency (p99), aggregated across all series
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
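
`histogram_quantile` estimates a quantile from cumulative bucket counters by interpolating linearly inside the bucket that contains the target rank. The following is a simplified sketch of that idea rather than Prometheus's actual implementation; the bucket boundaries and counts are made up.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    Buckets must be sorted by upper bound and end with the +Inf bucket,
    mirroring the `le` label on *_bucket series.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan


# Hypothetical latency buckets in seconds: 100 requests in total.
buckets = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
print(histogram_quantile(0.5, buckets))
print(histogram_quantile(0.95, buckets))
```

The result is only as precise as the bucket layout: any quantile that falls inside a wide bucket is an interpolated guess between its bounds.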

Creating Custom Dashboards

Step 1: Create a New Dashboard

Open Grafana
Click + → Dashboard
Click Add visualization

Step 2: Configure a Panel

Select Prometheus as the data source
Enter your PromQL query
Choose visualization type (Graph, Stat, Gauge, etc.)
Configure display options
Click Apply

Step 3: Save the Dashboard

Click Save dashboard (disk icon)
Enter a name
Choose a folder
Click Save

Example: Application Dashboard

Create a dashboard with these panels:

Panel	Query	Visualization
Request Rate	`sum(rate(http_requests_total[5m]))`	Graph
Error Rate	`sum(rate(http_requests_total{status=~"5.."}[5m]))`	Graph
Latency P99	`histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))`	Stat
Active Pods	`count(kube_pod_status_phase{phase="Running"})`	Stat

Setting Up Alerts

Prometheus Alerting Rules

Alerting rules are defined in Prometheus, which evaluates them and sends firing alerts to Alertmanager.

Example alert rule:

groups:
- name: application-alerts
  rules:
  - alert: HighErrorRate
    expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High error rate detected"
      description: "Error rate is above 5% for more than 5 minutes"

  - alert: PodCrashLooping
    expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Pod is crash looping"
      description: "Pod {{ $labels.pod }} is restarting frequently"
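
The `for` field means the expression must hold at every evaluation for the whole duration before the alert fires; until then the alert is pending. The sketch below simulates that behavior over a series of evaluation results. It is a simplified model with assumed names and a fixed evaluation interval, not Prometheus code.

```python
def alert_states(condition_results: list[bool], interval_s: int, for_s: int) -> list[str]:
    """Return the alert state after each rule evaluation.

    condition_results: whether `expr` returned a result at each evaluation.
    interval_s: seconds between evaluations; for_s: the rule's `for` duration.
    """
    states = []
    active_since = None  # time at which the condition first became true
    for step, is_true in enumerate(condition_results):
        now = step * interval_s
        if not is_true:
            # Any gap resets the timer, so the alert starts over as pending.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_s else "pending")
    return states


# Evaluate every 60s with `for: 5m`; the condition drops once at step 2.
results = [True, True, False] + [True] * 7
print(alert_states(results, interval_s=60, for_s=300))
```

In this run the alert only fires at the ninth evaluation, five minutes after the condition became true again, which is why a short `for` reacts faster but is more sensitive to brief spikes.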

Grafana Alerts

Grafana can also create alerts based on dashboard panels:

Edit a panel
Click Alert tab
Click Create alert rule from this panel
Configure conditions and notifications
Save

Alertmanager Configuration

Default Configuration

Alertmanager routes alerts based on labels and sends notifications.

route:
  group_by: ['alertname', 'namespace']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'default'

receivers:
- name: 'default'
  # Configure your notification channels here
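
`group_by` decides which alerts are batched into a single notification: alerts that share the same values for the listed labels land in the same group. Here is a simplified sketch of that grouping step with made-up alerts; timing settings such as `group_wait` and `repeat_interval` are not modeled.

```python
from collections import defaultdict


def group_alerts(alerts: list[dict], group_by: list[str]) -> dict[tuple, list[dict]]:
    """Bucket alerts by the values of the `group_by` labels."""
    groups = defaultdict(list)
    for alert in alerts:
        # A missing label counts as an empty value.
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


alerts = [
    {"labels": {"alertname": "PodCrashLooping", "namespace": "shop", "pod": "web-1"}},
    {"labels": {"alertname": "PodCrashLooping", "namespace": "shop", "pod": "web-2"}},
    {"labels": {"alertname": "HighErrorRate", "namespace": "shop"}},
]
groups = group_alerts(alerts, ["alertname", "namespace"])
for key, members in groups.items():
    print(key, len(members))
```

With the default `group_by` above, the two crash-looping pods would arrive as one notification instead of two.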

Adding Notification Channels

Slack:

receivers:
- name: 'slack-notifications'
  slack_configs:
  - api_url: 'https://hooks.slack.com/services/xxx/yyy/zzz'
    channel: '#alerts'
    send_resolved: true

Email:

receivers:
- name: 'email-notifications'
  email_configs:
  - to: 'alerts@example.com'
    from: 'alertmanager@example.com'
    smarthost: 'smtp.example.com:587'
    auth_username: 'alertmanager'
    auth_password: 'password'

PagerDuty:

receivers:
- name: 'pagerduty'
  pagerduty_configs:
  - service_key: 'your-service-key'
    send_resolved: true

Monitoring Your Applications

Exposing Application Metrics

For Prometheus to scrape your application metrics, add annotations to your pods:

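apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app  # example label; must match the template labels
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
        prometheus.io/path: "/metrics"
    spec:
      containers:
      - name: app
        image: my-app:latest  # placeholder image
        ports:
        - containerPort: 9090  # must match prometheus.io/port
          name: metrics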
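
Whatever language the application uses, the scrape target only has to return the Prometheus text exposition format at the annotated path. As a language-neutral illustration, this sketch renders a labeled counter in that format. The function name and sample values are assumptions, label values are not escaped, and a real application would normally use a client library instead.

```python
def render_counter(name: str, help_text: str, samples: list[tuple[dict[str, str], float]]) -> str:
    """Render one counter family in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        label_text = ",".join(f'{key}="{val}"' for key, val in labels.items())
        lines.append(f"{name}{{{label_text}}} {value}")
    # The format expects the last line to end with a newline.
    return "\n".join(lines) + "\n"


text = render_counter(
    "http_requests_total",
    "Total HTTP requests",
    [({"method": "GET", "endpoint": "/api/users"}, 3)],
)
print(text, end="")
```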

PHP/Laravel Metrics

For PHP applications, use a Prometheus client library:

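<?php
// composer require promphp/prometheus_client_php
require __DIR__ . '/vendor/autoload.php';

use Prometheus\CollectorRegistry;
use Prometheus\RenderTextFormat;
use Prometheus\Storage\InMemory;

// InMemory storage is reset on every request; use a shared adapter
// (for example Redis or APCu) to keep counters across requests.
$registry = new CollectorRegistry(new InMemory());

// Create metrics (exported as app_http_requests_total)
$counter = $registry->getOrRegisterCounter('app', 'http_requests_total', 'Total HTTP requests', ['method', 'endpoint']);
$counter->incBy(1, ['GET', '/api/users']);

// Expose /metrics endpoint
$renderer = new RenderTextFormat();
header('Content-Type: ' . RenderTextFormat::MIME_TYPE);
echo $renderer->render($registry->getMetricFamilySamples());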

WordPress Metrics

For WordPress, use a metrics plugin or add a simple endpoint in your theme or a custom plugin:

// In functions.php or a custom plugin
add_action('init', function() {
    // Compare only the path so a query string does not break the match
    $path = parse_url($_SERVER['REQUEST_URI'] ?? '', PHP_URL_PATH);
    if ($path === '/metrics') {
        header('Content-Type: text/plain; version=0.0.4');
        echo "# WordPress metrics\n";
        echo "wordpress_posts_total " . wp_count_posts()->publish . "\n";
        echo "wordpress_users_total " . count_users()['total_users'] . "\n";
        exit;
    }
});

Troubleshooting

Prometheus Not Scraping Targets

# Check Prometheus targets
kubectl port-forward -n monitoring svc/prometheus-server 9090:80
# Visit http://localhost:9090/targets

Common issues:

Missing prometheus.io/scrape: "true" annotation
Wrong port in prometheus.io/port
Network policy blocking scrape
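
The same target information is available as JSON from `/api/v1/targets`, which is convenient for spotting failing scrapes in a script. The sketch below filters such a response for unhealthy targets. The sample payload is trimmed and made up, and the helper name is hypothetical.

```python
def unhealthy_targets(payload: dict) -> list[tuple[str, str]]:
    """Return (scrapeUrl, lastError) for every active target that is not up."""
    return [
        (target["scrapeUrl"], target.get("lastError", ""))
        for target in payload["data"]["activeTargets"]
        if target.get("health") != "up"
    ]


# Trimmed example of a /api/v1/targets response with invented addresses.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"scrapeUrl": "http://10.0.0.5:9090/metrics", "health": "up", "lastError": ""},
            {
                "scrapeUrl": "http://10.0.0.6:9090/metrics",
                "health": "down",
                "lastError": "connection refused",
            },
        ]
    },
}
for url, error in unhealthy_targets(sample):
    print(f"{url}: {error}")
```

A pod that does not appear in `activeTargets` at all usually points to a missing or misspelled annotation, while a target listed as down points to the port, path, or network policy.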

Grafana Dashboard Empty

Check data source configuration
Verify Prometheus is running
Check time range selector
Test query in Prometheus UI first

Alertmanager Not Sending Notifications

# Check Alertmanager status
kubectl logs -n monitoring deployment/alertmanager

# Verify configuration
kubectl get configmap -n monitoring alertmanager-config -o yaml

High Resource Usage

# Check monitoring pod resources
kubectl top pods -n monitoring

# Reduce retention if needed (Prometheus server command-line flag; the default is 15d)
# --storage.tsdb.retention.time=7d
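
For a rough sense of how retention affects disk usage, the Prometheus documentation gives the estimate `needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample`, with roughly 1 to 2 bytes per sample. The sketch below applies that formula; the ingestion rate is a made-up figure and the function name is an assumption.

```python
SECONDS_PER_DAY = 86_400


def estimated_disk_bytes(
    retention_days: float,
    samples_per_second: float,
    bytes_per_sample: float = 2.0,
) -> float:
    """Estimate TSDB disk usage for a given retention and ingestion rate."""
    return retention_days * SECONDS_PER_DAY * samples_per_second * bytes_per_sample


# Compare 7 days with the 15-day default at a hypothetical 10,000 samples/s.
for days in (7, 15):
    size_gb = estimated_disk_bytes(days, samples_per_second=10_000) / 1e9
    print(f"{days}d retention: ~{size_gb:.1f} GB")
```

The actual ingestion rate of a running server can be read with `rate(prometheus_tsdb_head_samples_appended_total[1h])`.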

Quick Reference

Port Forwarding Commands

# Grafana
kubectl port-forward -n monitoring svc/grafana 3000:80

# Prometheus
kubectl port-forward -n monitoring svc/prometheus-server 9090:80

# Alertmanager
kubectl port-forward -n monitoring svc/alertmanager 9093:80

Check Monitoring Status

# All monitoring pods
kubectl get pods -n monitoring

# Monitoring services
kubectl get svc -n monitoring

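# Prometheus targets (via API)
kubectl port-forward -n monitoring svc/prometheus-server 9090:80 &
PF_PID=$!
sleep 2  # give the port-forward time to start
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets | length'
kill "$PF_PID"  # stop the background port-forward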

Restart Monitoring Components

# Restart Prometheus
kubectl rollout restart deployment/prometheus-server -n monitoring

# Restart Grafana
kubectl rollout restart deployment/grafana -n monitoring

# Restart Alertmanager
kubectl rollout restart deployment/alertmanager -n monitoring

Related Documentation

Getting Started

Providers