7 changes: 7 additions & 0 deletions .github/workflows/release_images.yml
Original file line number Diff line number Diff line change
Expand Up @@ -97,6 +97,13 @@ jobs:
echo "Updated values.yaml content:"
cat operator/documentdb-helm-chart/values.yaml

- name: Inject telemetry connection string
if: ${{ secrets.APPINSIGHTS_CONNECTION_STRING != '' }}
run: |
echo "Injecting Application Insights connection string for telemetry"
# Use sed to update the connectionString field in values.yaml
sed -i 's|connectionString: ""|connectionString: "${{ secrets.APPINSIGHTS_CONNECTION_STRING }}"|g' operator/documentdb-helm-chart/values.yaml

- name: Set chart version
run: |
echo "CHART_VERSION=${{ github.event.inputs.version }}" >> $GITHUB_ENV
2 changes: 1 addition & 1 deletion docs/designs/appinsights-metrics.md
Expand Up @@ -21,7 +21,7 @@ This document specifies all telemetry data points to be collected by Application
- **Metric**: `operator.health.status`
- **Value**: `1` (healthy) or `0` (unhealthy)
- **Frequency**: Every 60 seconds
- **Dimensions**: `pod_name`, `namespace`
- **Dimensions**: `pod_name`, `namespace_hash`

---

134 changes: 134 additions & 0 deletions docs/designs/telemetry-configuration.md
@@ -0,0 +1,134 @@
# Application Insights Telemetry Configuration

This document describes how to configure Application Insights telemetry collection for the DocumentDB Kubernetes Operator.

## Overview

The DocumentDB Operator can send telemetry data to Azure Application Insights to help monitor operator health, track cluster lifecycle events, and diagnose issues. All telemetry is designed with privacy in mind: no personally identifiable information (PII) is collected.

## Configuration

### Environment Variables

Configure telemetry by setting these environment variables in the operator deployment:

| Variable | Description | Required |
|----------|-------------|----------|
| `APPINSIGHTS_INSTRUMENTATIONKEY` | Application Insights instrumentation key | Yes (or connection string) |
| `APPLICATIONINSIGHTS_CONNECTION_STRING` | Application Insights connection string (alternative to instrumentation key) | Yes (or instrumentation key) |
| `DOCUMENTDB_TELEMETRY_ENABLED` | Set to `false` to disable telemetry collection | No (default: `true`) |
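
The rules in the table can be sketched in Go as follows. This is a minimal illustration, not the operator's actual code: the `TelemetrySettings` type and `resolveTelemetry` function are hypothetical names, and the sketch assumes that only the literal value `false` (case-insensitive) opts out.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// TelemetrySettings is a hypothetical type used only for this sketch.
type TelemetrySettings struct {
	Enabled            bool
	ConnectionString   string
	InstrumentationKey string
}

// resolveTelemetry applies the rules from the table. The lookup function is
// injected so the logic can be exercised without touching the real environment.
func resolveTelemetry(getenv func(string) string) TelemetrySettings {
	s := TelemetrySettings{
		ConnectionString:   getenv("APPLICATIONINSIGHTS_CONNECTION_STRING"),
		InstrumentationKey: getenv("APPINSIGHTS_INSTRUMENTATIONKEY"),
	}
	// Assumption: only "false" (any case, surrounding spaces ignored) opts out.
	flag := strings.TrimSpace(getenv("DOCUMENTDB_TELEMETRY_ENABLED"))
	optedOut := strings.EqualFold(flag, "false")
	// Telemetry needs at least one credential to have somewhere to send data.
	hasCredential := s.ConnectionString != "" || s.InstrumentationKey != ""
	s.Enabled = !optedOut && hasCredential
	return s
}

func main() {
	s := resolveTelemetry(os.Getenv)
	fmt.Printf("telemetry enabled: %v\n", s.Enabled)
}
```

Passing the lookup function as a parameter keeps the decision logic testable with a plain map instead of process-wide environment state.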

### Helm Chart Configuration

When installing via Helm, configure telemetry in your `values.yaml`:

```yaml
# values.yaml
telemetry:
  enabled: true
  instrumentationKey: "YOUR-INSTRUMENTATION-KEY-HERE"
  # Or use connection string:
  # connectionString: "InstrumentationKey=xxx;IngestionEndpoint=https://..."
  # Or use an existing secret containing APPINSIGHTS_INSTRUMENTATIONKEY / APPLICATIONINSIGHTS_CONNECTION_STRING:
  # existingSecret: "documentdb-operator-telemetry"
```
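
A connection string is a list of `Key=Value` pairs separated by semicolons, as the commented example shows. The Go sketch below splits one into its fields, which can help when checking which ingestion endpoint a deployment will contact. The function name `parseConnectionString` is hypothetical and the sample values are placeholders, not real credentials.

```go
package main

import (
	"fmt"
	"strings"
)

// parseConnectionString splits "Key=Value;Key=Value" pairs into a map.
// Segments without an "=" (for example, after a trailing semicolon) are skipped.
func parseConnectionString(cs string) map[string]string {
	fields := make(map[string]string)
	for _, part := range strings.Split(cs, ";") {
		// Cut at the first "=" so values that contain "=" stay intact.
		key, value, found := strings.Cut(part, "=")
		if !found {
			continue
		}
		fields[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return fields
}

func main() {
	// Placeholder values for illustration only.
	cs := "InstrumentationKey=00000000-0000-0000-0000-000000000000;IngestionEndpoint=https://example.invalid/"
	fields := parseConnectionString(cs)
	fmt.Println(fields["InstrumentationKey"])
	fmt.Println(fields["IngestionEndpoint"])
}
```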

### Kubernetes Secret

For production deployments, store the instrumentation key in a Kubernetes secret:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: documentdb-operator-telemetry
  namespace: documentdb-system
type: Opaque
stringData:
  APPINSIGHTS_INSTRUMENTATIONKEY: "YOUR-INSTRUMENTATION-KEY-HERE"
```

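Then reference it in the operator deployment. `envFrom` is a container-level field and must be a sibling of `env`, not an entry inside the `env` list:

```yaml
envFrom:
  - secretRef:
      name: documentdb-operator-telemetry
```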

## Privacy & Data Collection

### What We Collect

The operator collects anonymous, aggregated telemetry data including:

- **Operator lifecycle**: Startup events, health status, version information
- **Cluster operations**: Create, update, delete events (with timing metrics)
- **Backup operations**: Backup creation, completion, and expiration events
- **Error tracking**: Categorized errors (no raw error messages with sensitive data)
- **Performance metrics**: Reconciliation duration, API call latency

### What We DON'T Collect

To protect your privacy, we explicitly do NOT collect:

- Cluster names, namespace names, or any user-provided resource names
- Connection strings, passwords, or credentials
- IP addresses or hostnames
- Storage class names (may contain organizational information)
- Raw error messages (only categorized error types)
- Container image names

### Privacy Protection Mechanisms

1. **GUIDs Instead of Names**: All resources are identified by auto-generated GUIDs stored in annotations (`telemetry.documentdb.io/cluster-id`)
2. **Hashed Namespaces**: Namespace names are SHA-256 hashed before transmission
3. **Categorized Data**: Values like PVC sizes are categorized (small/medium/large) instead of exact values
4. **Error Sanitization**: Error messages are stripped of potential PII and truncated
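
Two of the mechanisms above, namespace hashing and size categorization, can be sketched in Go with the standard library alone. This is an illustration under stated assumptions: the function names are hypothetical, and the 50 GiB and 500 GiB thresholds are invented for the example because the source does not give the real bucket boundaries.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashNamespace returns the hex-encoded SHA-256 digest of a namespace name.
func hashNamespace(ns string) string {
	sum := sha256.Sum256([]byte(ns))
	return hex.EncodeToString(sum[:])
}

const gib = int64(1) << 30

// categorizePVCSize maps an exact size in bytes to a coarse bucket.
// The thresholds are assumptions made for this sketch.
func categorizePVCSize(sizeBytes int64) string {
	switch {
	case sizeBytes < 50*gib:
		return "small"
	case sizeBytes < 500*gib:
		return "medium"
	default:
		return "large"
	}
}

func main() {
	fmt.Println(hashNamespace("documentdb-system"))
	fmt.Println(categorizePVCSize(10 * gib))
}
```

An unsalted hash of a common name such as `default` can be recovered by hashing a dictionary of likely names; the source does not say whether the operator adds a salt.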

## Disabling Telemetry

To completely disable telemetry collection:

1. **Via environment variable**:
   ```yaml
   env:
     - name: DOCUMENTDB_TELEMETRY_ENABLED
       value: "false"
   ```

2. **Via Helm**:
   ```yaml
   telemetry:
     enabled: false
   ```

3. **Omit the credentials**: If neither `APPINSIGHTS_INSTRUMENTATIONKEY` nor `APPLICATIONINSIGHTS_CONNECTION_STRING` is set, telemetry is automatically disabled.

## Telemetry Events Reference

See [appinsights-metrics.md](appinsights-metrics.md) for the complete specification of all telemetry events and metrics collected.

## Troubleshooting

### Telemetry Not Being Sent

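1. Verify that the instrumentation key or connection string is set on the operator deployment:
   ```bash
   kubectl get deployment documentdb-operator -n documentdb-system -o yaml | grep -E -A5 'APPINSIGHTS|APPLICATIONINSIGHTS'
   ```
   If the credentials come from a secret referenced through `envFrom`, the variables do not appear inline; look for the `secretRef` entry instead.

2. Check operator logs for telemetry initialization:
   ```bash
   kubectl logs -n documentdb-system -l app=documentdb-operator | grep -i telemetry
   ```

3. Verify network connectivity to the Application Insights ingestion endpoint. `dc.services.visualstudio.com` is the default when only an instrumentation key is set; with a connection string, check the host named in its `IngestionEndpoint` field.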

### High Cardinality Warnings

If you see warnings about high cardinality dimensions, this indicates too many unique values for a dimension. The telemetry system automatically samples high-frequency events to mitigate this.
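
The source does not describe the sampling algorithm. As one possible shape, the Go sketch below forwards the first of every N occurrences of each event name and drops the rest. The type `everyNthSampler` and the event name `ReconcileCompleted` are hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// everyNthSampler forwards one out of every n occurrences per event name.
type everyNthSampler struct {
	mu     sync.Mutex
	n      int
	counts map[string]int
}

func newEveryNthSampler(n int) *everyNthSampler {
	if n < 1 {
		n = 1 // n = 1 means "send everything"
	}
	return &everyNthSampler{n: n, counts: make(map[string]int)}
}

// Allow reports whether this occurrence of the event should be sent.
// Each event name has its own counter, so rare events are not starved
// by frequent ones.
func (s *everyNthSampler) Allow(event string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	c := s.counts[event]
	s.counts[event] = c + 1
	return c%s.n == 0
}

func main() {
	s := newEveryNthSampler(10)
	sent := 0
	for i := 0; i < 100; i++ {
		if s.Allow("ReconcileCompleted") {
			sent++
		}
	}
	fmt.Println(sent) // prints 10
}
```

A counter-based sampler is deterministic, which makes it easy to test; a production system might prefer probabilistic or rate-based sampling instead.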

## Support

For issues related to telemetry collection, please open an issue on the [GitHub repository](https://github.com/documentdb/documentdb-kubernetes-operator/issues).
Expand Up @@ -56,6 +56,10 @@ rules:
- apiGroups: ["snapshot.storage.k8s.io"]
resources: ["volumesnapshotclasses"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
# Node read permissions for telemetry cloud provider detection
- apiGroups: [""]
resources: ["nodes"]
verbs: ["get", "list"]
# PersistentVolume permissions for PV controller
- apiGroups: [""]
resources: ["persistentvolumes"]
Expand Up @@ -25,10 +25,30 @@ spec:
env:
- name: GATEWAY_PORT
value: "10260"
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
{{- if .Values.documentDbVersion | default .Chart.AppVersion }}
- name: DOCUMENTDB_VERSION
value: "{{ .Values.documentDbVersion | default .Chart.AppVersion }}"
{{- end }}
# Telemetry configuration
{{- if not .Values.telemetry.enabled }}
- name: DOCUMENTDB_TELEMETRY_ENABLED
value: "false"
{{- else }}
{{- if .Values.telemetry.existingSecret }}
envFrom:
- secretRef:
name: {{ .Values.telemetry.existingSecret }}
{{- else if .Values.telemetry.connectionString }}
Comment on lines +40 to +45
Copilot AI Feb 12, 2026

envFrom is being rendered inside the env: list when telemetry.existingSecret is set, which produces an invalid container spec (and can break YAML/schema validation). envFrom must be a sibling of env, not an item within it; consider rendering envFrom at the container level and keeping env: as a pure list of - name: entries.

- name: APPLICATIONINSIGHTS_CONNECTION_STRING
value: {{ .Values.telemetry.connectionString | quote }}
{{- else if .Values.telemetry.instrumentationKey }}
- name: APPINSIGHTS_INSTRUMENTATIONKEY
value: {{ .Values.telemetry.instrumentationKey | quote }}
{{- end }}
{{- if .Values.gatewayImagePullPolicy }}
- name: GATEWAY_IMAGE_PULL_POLICY
value: "{{ .Values.gatewayImagePullPolicy }}"
11 changes: 11 additions & 0 deletions operator/documentdb-helm-chart/values.yaml
Expand Up @@ -6,6 +6,17 @@ replicaCount: 1
# Defaults to Chart.appVersion if not specified
documentDbVersion: ""

# Telemetry configuration for Application Insights
telemetry:
# Enable or disable telemetry collection
enabled: true
# Application Insights instrumentation key (provide either this or connectionString)
instrumentationKey: ""
# Application Insights connection string (alternative to instrumentationKey)
connectionString: ""
# Name of existing secret containing telemetry credentials
# Secret should have keys: APPINSIGHTS_INSTRUMENTATIONKEY or APPLICATIONINSIGHTS_CONNECTION_STRING
existingSecret: ""
# Gateway image pull policy for the gateway sidecar container.
# Valid values: Always, IfNotPresent, Never. Defaults to IfNotPresent if not set.
gatewayImagePullPolicy: ""
51 changes: 41 additions & 10 deletions operator/src/cmd/main.go
Expand Up @@ -4,6 +4,7 @@
package main

import (
"context"
"crypto/tls"
"flag"
"os"
Expand All @@ -29,10 +30,17 @@ import (
cnpgv1 "github.com/cloudnative-pg/cloudnative-pg/api/v1"
dbpreview "github.com/documentdb/documentdb-operator/api/preview"
"github.com/documentdb/documentdb-operator/internal/controller"
"github.com/documentdb/documentdb-operator/internal/telemetry"
fleetv1alpha1 "go.goms.io/fleet-networking/api/v1alpha1"
// +kubebuilder:scaffold:imports
)

// Version information - set via ldflags at build time
var (
version = "dev"
helmChartVersion = ""
)

var (
scheme = runtime.NewScheme()
setupLog = ctrl.Log.WithName("setup")
Expand Down Expand Up @@ -211,29 +219,52 @@ func main() {
os.Exit(1)
}

// Initialize telemetry
telemetryMgr, err := telemetry.NewManager(
context.Background(),
telemetry.ManagerConfig{
OperatorVersion: version,
HelmChartVersion: helmChartVersion,
Logger: setupLog,
},
mgr.GetClient(),
clientset,
)
if err != nil {
setupLog.Error(err, "unable to initialize telemetry manager")
// Continue without telemetry - it's not critical
} else {
telemetryMgr.Start()
defer telemetryMgr.Stop()
setupLog.Info("Telemetry initialized", "enabled", telemetryMgr.IsEnabled())
}

if err = (&controller.DocumentDBReconciler{
Client: mgr.GetClient(),
Scheme: mgr.GetScheme(),
Config: mgr.GetConfig(),
Clientset: clientset,
Client: mgr.GetClient(),
Scheme: mgr.GetScheme(),
Config: mgr.GetConfig(),
Clientset: clientset,
TelemetryMgr: telemetryMgr,
}).SetupWithManager(mgr); err != nil {
setupLog.Error(err, "unable to create controller", "controller", "DocumentDB")
os.Exit(1)
}

if err = (&controller.BackupReconciler{
Client: mgr.GetClient(),
Scheme: mgr.GetScheme(),
Recorder: mgr.GetEventRecorderFor("backup-controller"),
Client: mgr.GetClient(),
Scheme: mgr.GetScheme(),
Recorder: mgr.GetEventRecorderFor("backup-controller"),
TelemetryMgr: telemetryMgr,
}).SetupWithManager(mgr); err != nil {
setupLog.Error(err, "unable to create controller", "controller", "Backup")
os.Exit(1)
}

if err = (&controller.ScheduledBackupReconciler{
Client: mgr.GetClient(),
Scheme: mgr.GetScheme(),
Recorder: mgr.GetEventRecorderFor("scheduled-backup-controller"),
Client: mgr.GetClient(),
Scheme: mgr.GetScheme(),
Recorder: mgr.GetEventRecorderFor("scheduled-backup-controller"),
TelemetryMgr: telemetryMgr,
}).SetupWithManager(mgr); err != nil {
setupLog.Error(err, "unable to create controller", "controller", "ScheduledBackup")
os.Exit(1)
12 changes: 9 additions & 3 deletions operator/src/go.mod
Expand Up @@ -9,6 +9,8 @@ require (
github.com/cloudnative-pg/cloudnative-pg v1.28.1
github.com/cloudnative-pg/machinery v0.3.3
github.com/go-logr/logr v1.4.3
github.com/google/uuid v1.6.0
github.com/microsoft/ApplicationInsights-Go v0.4.4
github.com/onsi/ginkgo/v2 v2.28.1
github.com/onsi/gomega v1.39.1
github.com/stretchr/testify v1.11.1
Expand All @@ -21,7 +23,14 @@ require (
)

require (
code.cloudfoundry.org/clock v0.0.0-20180518195852-02e53af36e6c // indirect
github.com/gofrs/uuid v3.3.0+incompatible // indirect
)

require (
cel.dev/expr v0.24.0 // indirect
github.com/Masterminds/semver/v3 v3.4.0 // indirect
github.com/antlr4-go/antlr/v4 v4.13.1 // indirect
github.com/cenkalti/backoff/v5 v5.0.3 // indirect
github.com/cloudnative-pg/cnpg-i v0.3.1 // indirect
github.com/go-openapi/swag/cmdutils v0.25.4 // indirect
Expand All @@ -45,8 +54,6 @@ require (
)

require (
cel.dev/expr v0.24.0 // indirect
github.com/antlr4-go/antlr/v4 v4.13.1 // indirect
github.com/beorn7/perks v1.0.1 // indirect
github.com/blang/semver/v4 v4.0.0 // indirect
github.com/cespare/xxhash/v2 v2.3.0 // indirect
Expand All @@ -68,7 +75,6 @@ require (
github.com/google/gnostic-models v0.7.1 // indirect
github.com/google/go-cmp v0.7.0 // indirect
github.com/google/pprof v0.0.0-20260115054156-294ebfa9ad83 // indirect
github.com/google/uuid v1.6.0 // indirect
github.com/gorilla/websocket v1.5.4-0.20250319132907-e064f32e3674 // indirect
github.com/grpc-ecosystem/grpc-gateway/v2 v2.27.1 // indirect
github.com/inconshreveable/mousetrap v1.1.0 // indirect