- Latency
- Traffic
- Errors
- Saturation

Alerting: notify us when something is wrong.
Troubleshooting: help us isolate and fix the problem.
Tuning/Capacity Planning: help us improve our setup over time.

In our case, we will run the following components on a Docker host:
- Thanos for long-term retention (backed by an Azure Storage account)
- Prometheus to orchestrate the monitoring
- Alertmanager for alerting
- Grafana to display metrics and logs
- Loki to parse our logs
- Telegraf for self-monitoring
- Nginx to secure and expose our stack
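
As a rough illustration of how the components listed above could sit together, here is a minimal `docker-compose.yml` sketch. It is not the actual setup: the image tags, service names, volume name, published port, and config file paths (`prometheus.yml`, `bucket.yml`, `telegraf.conf`, `nginx.conf`) are all assumptions, and the Azure Storage credentials are assumed to live in the hypothetical `bucket.yml` object-store file read by the Thanos sidecar.

```yaml
# Hypothetical sketch: tags, ports, and file paths are assumptions.
services:
  prometheus:
    image: prom/prometheus
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      # Fixed 2h blocks so the Thanos sidecar can upload them as they are cut
      - --storage.tsdb.min-block-duration=2h
      - --storage.tsdb.max-block-duration=2h
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus

  thanos-sidecar:
    image: quay.io/thanos/thanos:v0.34.1   # assumed tag
    command:
      - sidecar
      - --tsdb.path=/prometheus
      - --prometheus.url=http://prometheus:9090
      # Object storage settings (here assumed to point at Azure Storage)
      - --objstore.config-file=/etc/thanos/bucket.yml
    volumes:
      - prometheus-data:/prometheus
      - ./bucket.yml:/etc/thanos/bucket.yml:ro

  alertmanager:
    image: prom/alertmanager

  grafana:
    image: grafana/grafana

  loki:
    image: grafana/loki

  telegraf:
    image: telegraf
    volumes:
      - ./telegraf.conf:/etc/telegraf/telegraf.conf:ro

  nginx:
    image: nginx
    ports:
      - "443:443"   # only the reverse proxy is published to the outside
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro

volumes:
  prometheus-data:
```

In this sketch only Nginx publishes a port, so every other service stays reachable solely over the internal Compose network and through the proxy.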

