From 21ba781ce954c89d7d519a370615714b8fab43ce Mon Sep 17 00:00:00 2001 From: Wenting Wu Date: Fri, 6 Mar 2026 15:04:39 -0500 Subject: [PATCH 1/9] docs: add configuration documentation guides Add comprehensive configuration documentation for the DocumentDB Kubernetes Operator covering TLS, storage, networking, and resource management. New documentation pages: - configuration/tls.md TLS modes (Disabled/SelfSigned/CertManager/Provided), certificate rotation, Azure Key Vault integration, and troubleshooting - configuration/storage.md Storage classes by provider, PVC sizing, volume expansion, reclaim policies, benchmarking, and disk encryption (AKS/EKS/GKE) - configuration/networking.md Service types (ClusterIP/LoadBalancer), cloud-specific LB annotations, DNS configuration, and Network Policies - configuration/resource-management.md CPU/memory sizing, QoS classes, workload profiles (dev/prod/high-load), and monitoring recommendations - configuration/cluster-configuration.md Guided overview of all CRD fields with full spec YAML, field tables, and cross-references Documentation improvements: - Reorganize mkdocs.yml nav with Configuration section and sub-items - Add Material for MkDocs features (tabs, code copy, admonitions, annotations) - Cross-link all config guides to auto-generated API Reference - Remove redundant Backup/ScheduledBackup from cluster-configuration.md (covered by backup-and-restore.md and api-reference.md) - Slim down index.md Configuration section with links to new guides - Add .gitignore entry for MkDocs site/ build output - Update advanced-configuration/README.md with links to new config pages - Add FAQ entry pointing to API Reference Refs #248 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Wenting Wu --- .gitignore | 3 + .../preview/advanced-configuration/README.md | 385 +----------------- .../preview/backup-and-restore.md | 3 + .../preview/configuration/networking.md | 383 +++++++++++++++++ 
.../configuration/resource-management.md | 188 +++++++++ .../preview/configuration/storage.md | 320 +++++++++++++++ .../preview/configuration/tls.md | 338 +++++++++++++++ .../preview/faq.md | 4 + .../preview/index.md | 41 +- mkdocs.yml | 20 + 10 files changed, 1283 insertions(+), 402 deletions(-) create mode 100644 docs/operator-public-documentation/preview/configuration/networking.md create mode 100644 docs/operator-public-documentation/preview/configuration/resource-management.md create mode 100644 docs/operator-public-documentation/preview/configuration/storage.md create mode 100644 docs/operator-public-documentation/preview/configuration/tls.md diff --git a/.gitignore b/.gitignore index 96f83b7b..874f00cf 100644 --- a/.gitignore +++ b/.gitignore @@ -411,3 +411,6 @@ Chart.lock # Test output *.out .DS_Store + +# MkDocs build output +site/ diff --git a/docs/operator-public-documentation/preview/advanced-configuration/README.md b/docs/operator-public-documentation/preview/advanced-configuration/README.md index 845a738e..fe08f891 100644 --- a/docs/operator-public-documentation/preview/advanced-configuration/README.md +++ b/docs/operator-public-documentation/preview/advanced-configuration/README.md @@ -2,200 +2,20 @@ This section covers advanced configuration options for the DocumentDB Kubernetes Operator. 
+For core configuration topics, see the [Configuration](../configuration/tls.md) guides: + +- [API Reference](../api-reference.md) — CRD reference for DocumentDB, Backup, and ScheduledBackup +- [TLS](../configuration/tls.md) — TLS modes, certificate rotation, and troubleshooting +- [Storage](../configuration/storage.md) — Storage classes, PVC sizing, encryption +- [Networking](../configuration/networking.md) — Service types, load balancers, Network Policies +- [Resource Management](../configuration/resource-management.md) — CPU and memory sizing + ## Table of Contents -- [TLS Configuration](#tls-configuration) - [High Availability](#high-availability) -- [Storage Configuration](#storage-configuration) - [Scheduling](#scheduling) -- [Resource Management](#resource-management) - [Security](#security) -## TLS Configuration - -The operator supports three TLS modes for secure gateway connections, each suited to different operational requirements. - -### TLS Modes - -1. **SelfSigned** — Automatic certificate management using cert-manager with self-signed certificates - - Best for: Development, testing, and environments without external PKI - - Zero external dependencies - - Automatic certificate rotation - -2. **Provided** — Use certificates from Azure Key Vault via Secrets Store CSI driver - - Best for: Production environments with centralized certificate management - - Enterprise PKI integration - - Azure Key Vault integration - -3. 
**CertManager** — Use custom cert-manager issuers (for example, Let's Encrypt or a corporate CA) - - Best for: Production environments with existing cert-manager infrastructure - - Flexible issuer support - - Industry-standard certificates - -### Getting Started with TLS - -For comprehensive TLS setup and testing documentation, see: - -- **[Complete TLS Setup Guide](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/README.md)** — Quick start with automated scripts, detailed configuration for each TLS mode, troubleshooting, and best practices -- **[E2E Testing Guide](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/E2E-TESTING.md)** — Automated and manual testing, validation procedures, and CI/CD integration examples - -### Quick TLS Setup - -For the fastest TLS setup, use the automated script: - -```bash -cd documentdb-playground/tls/scripts - -# Complete E2E setup (AKS + DocumentDB + TLS) -./create-cluster.sh \ - --suffix mytest \ - --subscription-id -``` - -This command will: - -- Create an AKS cluster with all required addons -- Install cert-manager and the CSI driver -- Deploy the DocumentDB operator -- Configure and validate both SelfSigned and Provided TLS modes - -**Duration**: ~25–30 minutes - -### TLS Configuration Examples - -#### SelfSigned Mode - -SelfSigned mode requires no additional configuration beyond setting the mode: - -```yaml -apiVersion: documentdb.io/preview -kind: DocumentDB -metadata: - name: documentdb-selfsigned - namespace: default -spec: - nodeCount: 1 - instancesPerNode: 3 - resource: - storage: - pvcSize: 10Gi - tls: - gateway: - mode: SelfSigned -``` - -#### Provided Mode (Azure Key Vault) - -```yaml -apiVersion: documentdb.io/preview -kind: DocumentDB -metadata: - name: documentdb-provided - namespace: default -spec: - nodeCount: 1 - instancesPerNode: 3 - resource: - storage: - pvcSize: 10Gi - tls: - gateway: - mode: Provided - provided: - 
secretName: documentdb-tls-akv -``` - -#### CertManager Mode with a custom issuer - -```yaml -apiVersion: documentdb.io/preview -kind: DocumentDB -metadata: - name: documentdb-certmanager - namespace: default -spec: - nodeCount: 1 - instancesPerNode: 3 - resource: - storage: - pvcSize: 10Gi - tls: - gateway: - mode: CertManager - certManager: - issuerRef: - name: letsencrypt-prod - kind: ClusterIssuer - dnsNames: - - documentdb.example.com - - "*.documentdb.example.com" -``` - -### TLS Status and Monitoring - -Check the TLS status of your DocumentDB instance: - -```bash -kubectl get documentdb -n -o jsonpath='{.status.tls}' | jq -``` - -Example output: - -```json -{ - "ready": true, - "secretName": "documentdb-gateway-cert-tls", - "message": "" -} -``` - -### Certificate Rotation - -The operator handles certificate rotation automatically: - -- **SelfSigned and CertManager modes**: cert-manager rotates certificates before expiration -- **Provided mode**: Sync certificates from Azure Key Vault on rotation - -Monitor certificate expiration: - -```bash -# Check certificate expiration -kubectl get certificate -n -o jsonpath='{.status.notAfter}' - -# Inspect the TLS secret directly -kubectl get secret -n -o jsonpath='{.data.tls\.crt}' | \ - base64 -d | openssl x509 -noout -dates -``` - -### Troubleshooting TLS - -For comprehensive troubleshooting, see the [E2E Testing Guide](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/E2E-TESTING.md#troubleshooting). - -Common issues: - -1. **Certificate not ready** — Check cert-manager logs and certificate status -2. **Connection failures** — Verify service endpoints and TLS handshake -3. 
**Azure Key Vault access denied** — Check managed identity and RBAC permissions - -Quick diagnostics: - -```bash -# Check DocumentDB TLS status -kubectl describe documentdb -n - -# Check certificate status -kubectl describe certificate -n - -# Check cert-manager logs -kubectl logs -n cert-manager deployment/cert-manager - -# Test TLS handshake -EXTERNAL_IP=$(kubectl get svc -n -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}') -openssl s_client -connect $EXTERNAL_IP:10260 -``` - ---- - ## High Availability Deploy multiple instances for automatic failover and read scalability. @@ -227,138 +47,6 @@ spec: --- -## Storage Configuration - -Configure persistent storage for DocumentDB instances. - -### Storage Classes - -```yaml -spec: - resource: - storage: - pvcSize: 100Gi - storageClass: premium-ssd # Azure Premium SSD -``` - -### Volume Expansion - -```bash -# Ensure storage class allows volume expansion -kubectl get storageclass -o jsonpath='{.allowVolumeExpansion}' - -# Patch DocumentDB for larger storage -kubectl patch documentdb -n --type='json' \ - -p='[{"op": "replace", "path": "/spec/resource/storage/pvcSize", "value":"200Gi"}]' -``` - -### PersistentVolume Security - -The DocumentDB operator automatically applies security-hardening mount options to all PersistentVolumes associated with DocumentDB clusters: - -| Mount Option | Description | -|--------------|-------------| -| `nodev` | Prevents device files from being interpreted on the filesystem | -| `nosuid` | Prevents setuid/setgid bits from taking effect | -| `noexec` | Prevents execution of binaries on the filesystem | - -These options are automatically applied by the PV controller and require no additional configuration. - -### Disk Encryption - -Encryption at rest is essential for protecting sensitive database data. 
Here's how to configure disk encryption for each cloud provider: - -#### Azure Kubernetes Service (AKS) - -AKS encrypts all managed disks by default using Azure Storage Service Encryption (SSE) with platform-managed keys. No additional configuration is required. - -For customer-managed keys (CMK), use Azure Disk Encryption: - -```yaml -apiVersion: storage.k8s.io/v1 -kind: StorageClass -metadata: - name: managed-csi-encrypted -provisioner: disk.csi.azure.com -parameters: - skuName: Premium_LRS - # For customer-managed keys, specify the disk encryption set - diskEncryptionSetID: /subscriptions//resourceGroups//providers/Microsoft.Compute/diskEncryptionSets/ -reclaimPolicy: Delete -volumeBindingMode: WaitForFirstConsumer -allowVolumeExpansion: true -``` - -#### Google Kubernetes Engine (GKE) - -GKE encrypts all persistent disks by default using Google-managed encryption keys. No additional configuration is required. - -For customer-managed encryption keys (CMEK): - -```yaml -apiVersion: storage.k8s.io/v1 -kind: StorageClass -metadata: - name: pd-ssd-encrypted -provisioner: pd.csi.storage.gke.io -parameters: - type: pd-ssd - # For CMEK, specify the key - disk-encryption-kms-key: projects//locations//keyRings//cryptoKeys/ -reclaimPolicy: Delete -volumeBindingMode: WaitForFirstConsumer -allowVolumeExpansion: true -``` - -#### Amazon Elastic Kubernetes Service (EKS) - -**Important**: Unlike AKS and GKE, EBS volumes on EKS are **not encrypted by default**. 
You must explicitly enable encryption in the StorageClass: - -```yaml -apiVersion: storage.k8s.io/v1 -kind: StorageClass -metadata: - name: ebs-sc-encrypted -provisioner: ebs.csi.aws.com -parameters: - type: gp3 - encrypted: "true" # Required for encryption - # Optional: specify a KMS key for customer-managed encryption - # kmsKeyId: arn:aws:kms:::key/ -reclaimPolicy: Delete -volumeBindingMode: WaitForFirstConsumer -allowVolumeExpansion: true -``` - -To use the encrypted storage class with DocumentDB: - -```yaml -apiVersion: documentdb.io/preview -kind: DocumentDB -metadata: - name: my-cluster - namespace: default -spec: - environment: eks - resource: - storage: - pvcSize: 100Gi - storageClass: ebs-sc-encrypted # Use the encrypted storage class - # ... other configuration -``` - -### Encryption Summary - -| Provider | Default Encryption | Customer-Managed Keys | -|----------|-------------------|----------------------| -| AKS | ✅ Enabled (SSE) | Optional via DiskEncryptionSet | -| GKE | ✅ Enabled (Google-managed) | Optional via CMEK | -| EKS | ❌ **Not enabled** | Required: set `encrypted: "true"` in StorageClass | - -**Recommendation**: For production deployments on EKS, always create a StorageClass with `encrypted: "true"` to ensure data at rest is protected. - ---- - ## Scheduling Configure pod affinity for a documentdb cluster's database pods. This replicates @@ -370,64 +58,10 @@ spec: ... ``` -## Resource Management - -Configure resource requests and limits for optimal performance. - -### Example Configuration - -```yaml -apiVersion: documentdb.io/preview -kind: DocumentDB -metadata: - name: documentdb-resources - namespace: default -spec: - nodeCount: 1 - instancesPerNode: 3 - resource: - storage: - pvcSize: 100Gi -``` - -### Recommendations - -- **Development**: 1 CPU, 2 GiB memory -- **Production**: 2–4 CPUs, 4–8 GiB memory -- **High-load**: 4–8 CPUs, 8–16 GiB memory - ---- - ## Security Security best practices for DocumentDB deployments. 
-### Network Policies - -Restrict network access to DocumentDB: - -```yaml -apiVersion: networking.k8s.io/v1 -kind: NetworkPolicy -metadata: - name: documentdb-access - namespace: default -spec: - podSelector: - matchLabels: - app.kubernetes.io/name: documentdb - policyTypes: - - Ingress - ingress: - - from: - - namespaceSelector: - matchLabels: - name: app-namespace - ports: - - protocol: TCP - port: 10260 -``` - ### RBAC The operator requires specific permissions to manage DocumentDB resources. The Helm chart automatically creates the necessary RBAC rules. @@ -456,7 +90,8 @@ For production, consider using: ## Additional Resources -- [Public Documentation](https://documentdb.io/documentdb-kubernetes-operator/preview/) +- [Configuration Guides](../configuration/tls.md) — TLS, Storage, Networking, and Resource Management +- [API Reference](../api-reference.md) — CRD reference for DocumentDB, Backup, and ScheduledBackup - [TLS Setup Guide](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/README.md) - [E2E Testing Guide](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/E2E-TESTING.md) - [GitHub Repository](https://github.com/documentdb/documentdb-kubernetes-operator) diff --git a/docs/operator-public-documentation/preview/backup-and-restore.md b/docs/operator-public-documentation/preview/backup-and-restore.md index a51139e7..64faddb9 100644 --- a/docs/operator-public-documentation/preview/backup-and-restore.md +++ b/docs/operator-public-documentation/preview/backup-and-restore.md @@ -1,5 +1,8 @@ # Backup and Restore +!!! tip + For the complete Backup and ScheduledBackup CRD field definitions, see the [API Reference](api-reference.md#backup). 
+ ## Prerequisites ### For Kind or Minikube diff --git a/docs/operator-public-documentation/preview/configuration/networking.md b/docs/operator-public-documentation/preview/configuration/networking.md new file mode 100644 index 00000000..188ac0e1 --- /dev/null +++ b/docs/operator-public-documentation/preview/configuration/networking.md @@ -0,0 +1,383 @@ +--- +title: Networking Configuration +description: Configure service types, external access, DNS, load balancer annotations, and Network Policies for DocumentDB on AKS, EKS, and GKE. +tags: + - configuration + - networking + - load-balancer +--- + +# Networking Configuration + +This guide covers networking configuration for the DocumentDB Kubernetes Operator, including service types, external access, DNS configuration, and cloud-specific load balancer annotations. + +## Overview + +DocumentDB exposes connectivity through Kubernetes Services. The operator creates and manages a service named `documentdb-service-` that routes traffic to the primary database instance's gateway. + +### Key Networking Fields + +```yaml +spec: + environment: aks # Cloud environment: aks, eks, gke + exposeViaService: + serviceType: LoadBalancer # LoadBalancer or ClusterIP +``` + +| Field | Type | Required | Default | Description | +|-------|------|----------|---------|-------------| +| `environment` | string | No | — | Cloud environment identifier. Determines load balancer annotations. Options: `aks`, `eks`, `gke`. | +| `exposeViaService.serviceType` | string | No | — | Kubernetes Service type. Options: `LoadBalancer`, `ClusterIP`. | + +For the full auto-generated type reference, see [ExposeViaService](../api-reference.md#exposeviaservice) in the API Reference. + +### Default Port + +The DocumentDB gateway listens on port **10260** and serves the MongoDB-compatible wire protocol on that port. + +## Service Types + +### ClusterIP (Internal Access) + +ClusterIP exposes the service only within the Kubernetes cluster.
Use this for applications running in the same cluster. + +```yaml +apiVersion: documentdb.io/preview +kind: DocumentDB +metadata: + name: my-documentdb + namespace: default +spec: + nodeCount: 1 + instancesPerNode: 3 + resource: + storage: + pvcSize: 100Gi + exposeViaService: + serviceType: ClusterIP +``` + +Connect from within the cluster: + +```bash +mongosh "mongodb://:@documentdb-service-my-documentdb.default.svc.cluster.local:10260/?directConnection=true" +``` + +For local development, use port-forwarding: + +```bash +kubectl port-forward svc/documentdb-service-my-documentdb -n default 10260:10260 + +# In another terminal +mongosh "mongodb://:@localhost:10260/?directConnection=true" +``` + +### LoadBalancer (External Access) + +LoadBalancer provisions a cloud load balancer for external access. The operator automatically applies cloud-specific annotations based on the `environment` field. + +```yaml +apiVersion: documentdb.io/preview +kind: DocumentDB +metadata: + name: my-documentdb + namespace: default +spec: + nodeCount: 1 + instancesPerNode: 3 + environment: aks + resource: + storage: + pvcSize: 100Gi + exposeViaService: + serviceType: LoadBalancer +``` + +Get the external IP: + +```bash +kubectl get svc documentdb-service-my-documentdb -n default \ + -o jsonpath='{.status.loadBalancer.ingress[0].ip}' +``` + +Connect externally: + +```bash +EXTERNAL_IP=$(kubectl get svc documentdb-service-my-documentdb -n default \ + -o jsonpath='{.status.loadBalancer.ingress[0].ip}') +mongosh "mongodb://:@$EXTERNAL_IP:10260/?directConnection=true" +``` + +## Cloud-Specific Configuration + +The operator automatically applies cloud-optimized annotations based on the `environment` field. Use the content tabs below to see the configuration for your cloud provider. + +=== "AKS (Azure)" + + Set `environment: aks` to apply Azure-specific load balancer annotations.
+ + ```yaml + spec: + environment: aks + exposeViaService: + serviceType: LoadBalancer + ``` + + **Internal Load Balancer (private VNet only):** + + ```bash + kubectl annotate svc documentdb-service-my-documentdb -n default \ + service.beta.kubernetes.io/azure-load-balancer-internal="true" + ``` + +=== "EKS (AWS)" + + Set `environment: eks` to apply AWS-specific load balancer annotations. The operator configures an AWS Network Load Balancer (NLB). + + ```yaml + spec: + environment: eks + exposeViaService: + serviceType: LoadBalancer + ``` + + !!! note + On EKS, the external endpoint may be a hostname (DNS name) rather than an IP address. Use the hostname directly in your connection string. + + ```bash + # Get the external hostname + kubectl get svc documentdb-service-my-documentdb -n default \ + -o jsonpath='{.status.loadBalancer.ingress[0].hostname}' + ``` + +=== "GKE (Google Cloud)" + + Set `environment: gke` to apply GCP-specific load balancer annotations. + + ```yaml + spec: + environment: gke + exposeViaService: + serviceType: LoadBalancer + ``` + +## DNS Configuration + +### In-Cluster DNS + +Kubernetes automatically creates DNS records for services. The format is: + +``` +..svc.cluster.local +``` + +For DocumentDB: + +``` +documentdb-service-..svc.cluster.local +``` + +Example connection string using DNS: + +``` +mongodb://:@documentdb-service-my-documentdb.default.svc.cluster.local:10260/?directConnection=true +``` + +### External DNS + +For LoadBalancer services, you can set up external DNS records pointing to the load balancer's external IP or hostname. + +#### Manual DNS Setup + +1. Get the external IP/hostname: + + ```bash + kubectl get svc documentdb-service-my-documentdb -n default + ``` + +2. Create a DNS A record (for IP) or CNAME record (for hostname) pointing to the external address. + +#### Using ExternalDNS + +[ExternalDNS](https://github.com/kubernetes-sigs/external-dns) can automatically manage DNS records. 
Annotate the service: + +```bash +kubectl annotate svc documentdb-service-my-documentdb -n default \ + external-dns.alpha.kubernetes.io/hostname="documentdb.example.com" +``` + +## Service Routing + +The operator configures the service selector to route traffic to the CNPG primary instance: + +- **When endpoints are enabled**: The service selector targets pods with the label `cnpg.io/instanceRole: primary`, ensuring traffic always reaches the current primary. +- **During failover**: CNPG promotes a replica to primary and updates the pod labels. The service automatically routes to the new primary. + +## Connection Strings + +### Standard Connection String Format + +``` +mongodb://:@:/?directConnection=true +``` + +### With TLS + +``` +mongodb://:@:/?tls=true&directConnection=true +``` + +### Retrieving Credentials + +```bash +# Get username +kubectl get secret documentdb-credentials -n default \ + -o jsonpath='{.data.username}' | base64 -d + +# Get password +kubectl get secret documentdb-credentials -n default \ + -o jsonpath='{.data.password}' | base64 -d +``` + +### Connection String from Status + +The operator populates the connection string in the DocumentDB status: + +```bash +kubectl get documentdb my-documentdb -n default \ + -o jsonpath='{.status.connectionString}' +``` + +## Troubleshooting + +### Network Policies + +If your cluster has restrictive [NetworkPolicies](https://kubernetes.io/docs/concepts/services-networking/network-policies/), you must ensure the DocumentDB operator can reach the database pods, and that pods can communicate with each other. 
+ +#### Allow Operator-to-Cluster Communication + +If you have a default-deny ingress policy, create an explicit policy to allow the operator: + +```yaml +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: allow-documentdb-operator + namespace: default # Namespace where DocumentDB runs +spec: + podSelector: + matchLabels: + app.kubernetes.io/name: documentdb + policyTypes: + - Ingress + ingress: + - from: + - namespaceSelector: + matchLabels: + kubernetes.io/metadata.name: documentdb-operator + ports: + - protocol: TCP + port: 8000 + - protocol: TCP + port: 5432 +``` + +#### Allow Application Access to DocumentDB + +Restrict database access to specific application namespaces: + +```yaml +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: allow-app-to-documentdb + namespace: default +spec: + podSelector: + matchLabels: + app.kubernetes.io/name: documentdb + policyTypes: + - Ingress + ingress: + - from: + - namespaceSelector: + matchLabels: + name: app-namespace + ports: + - protocol: TCP + port: 10260 +``` + +#### Allow Inter-Pod Communication + +DocumentDB instances must communicate with each other for replication: + +```yaml +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: allow-documentdb-internal + namespace: default +spec: + podSelector: + matchLabels: + app.kubernetes.io/name: documentdb + policyTypes: + - Ingress + ingress: + - from: + - podSelector: + matchLabels: + app.kubernetes.io/name: documentdb + ports: + - protocol: TCP + port: 5432 +``` + +### LoadBalancer Pending + +**Symptoms**: Service stays in `Pending` state, no external IP assigned. + +```bash +kubectl describe svc documentdb-service-my-documentdb -n default +``` + +**Common causes**: + +- Cloud provider quota exceeded for load balancers +- Missing permissions for the cloud controller manager +- Network configuration issues (subnet, security groups) + +### Connection Timeout + +**Symptoms**: Cannot connect to the external IP. 
+ +```bash +# Verify the service has an endpoint +kubectl get endpoints documentdb-service-my-documentdb -n default + +# Check if the pod is running and ready +kubectl get pods -n default -l cnpg.io/instanceRole=primary +``` + +**Common causes**: + +- Firewall or network security group blocking port 10260 +- Pod is not ready (check pod events and logs) +- Service selector does not match any pods + +### DNS Resolution Failure + +**Symptoms**: In-cluster DNS name does not resolve. + +```bash +# Test DNS from within the cluster +kubectl run dns-test --image=busybox --rm -it -- nslookup \ + documentdb-service-my-documentdb.default.svc.cluster.local +``` + +**Common causes**: + +- CoreDNS is not running or misconfigured +- Incorrect namespace in the DNS name +- Service does not exist diff --git a/docs/operator-public-documentation/preview/configuration/resource-management.md b/docs/operator-public-documentation/preview/configuration/resource-management.md new file mode 100644 index 00000000..913e7189 --- /dev/null +++ b/docs/operator-public-documentation/preview/configuration/resource-management.md @@ -0,0 +1,188 @@ +--- +title: Resource Management +description: CPU and memory sizing guidelines for DocumentDB deployments including Kubernetes QoS classes, workload profiles, and monitoring recommendations. +tags: + - configuration + - resources + - performance +--- + +# Resource Management + +This guide covers CPU and memory sizing guidelines for DocumentDB deployments, including recommendations for different workload profiles, Kubernetes Quality of Service classes, and internal operator resource allocations. + +## Overview + +DocumentDB runs on Kubernetes and leverages the underlying CloudNative-PG operator for resource management. Proper resource allocation ensures stable performance, prevents out-of-memory kills, and optimizes cost. + +!!! important + For production database workloads, always configure explicit resource requests and limits. 
Running without resource constraints risks pod eviction, OOM kills, and CPU throttling during peak load. + +## Kubernetes Quality of Service (QoS) + +Kubernetes assigns a QoS class to each pod based on its resource configuration. For database workloads, we recommend **Guaranteed** QoS: + +| QoS Class | Condition | Priority | Recommendation | +|-----------|-----------|----------|----------------| +| **Guaranteed** | Requests = Limits for all containers | Highest | **Recommended for production** | +| Burstable | Requests < Limits | Medium | Acceptable for development | +| Best-Effort | No requests or limits set | Lowest (evicted first) | Not recommended | + +To achieve **Guaranteed** QoS, set requests and limits to the same value for both CPU and memory. This ensures that your DocumentDB pods are the last to be evicted under memory pressure. + +!!! note + When QoS is set to Guaranteed, CloudNative-PG configures the PostgreSQL `postmaster` process with an OOM adjustment value of `0`, keeping its low OOM score of `-997`. If the OOM killer is triggered, child processes are terminated before the `postmaster`, allowing for a clean shutdown. This behavior helps keep the database instance alive as long as possible. 
+ +## Sizing Guidelines + +### Workload Profiles + +=== "Development" + + A minimal cluster for development and testing: + + | Setting | Value | + |---------|-------| + | Instances | 1 | + | CPU | 1 (default) | + | Memory | 2 Gi (default) | + | Storage | 10–20 Gi | + + ```yaml title="documentdb-dev.yaml" + apiVersion: documentdb.io/preview + kind: DocumentDB + metadata: + name: dev-documentdb + namespace: default + spec: + nodeCount: 1 + instancesPerNode: 1 + resource: + storage: + pvcSize: 10Gi + ``` + +=== "Production" + + A production-ready cluster with high availability: + + | Setting | Value | + |---------|-------| + | Instances | 3 | + | CPU | 2–4 CPUs | + | Memory | 4–8 Gi | + | Storage | 100–500 Gi | + + ```yaml title="documentdb-prod.yaml" + apiVersion: documentdb.io/preview + kind: DocumentDB + metadata: + name: prod-documentdb + namespace: default + spec: + nodeCount: 1 + instancesPerNode: 3 # (1)! + environment: aks + resource: + storage: + pvcSize: 100Gi + storageClass: managed-csi-premium + persistentVolumeReclaimPolicy: Retain + tls: + gateway: + mode: SelfSigned + exposeViaService: + serviceType: LoadBalancer + backup: + retentionDays: 30 + ``` + + 1. Three instances provide high availability: one primary and two replicas for automatic failover.
+ +=== "High-Load" + + For workloads requiring maximum throughput: + + | Setting | Value | + |---------|-------| + | Instances | 3 | + | CPU | 4–8 CPUs | + | Memory | 8–16 Gi | + | Storage | 500 Gi–2 Ti | + + ```yaml title="documentdb-highload.yaml" + apiVersion: documentdb.io/preview + kind: DocumentDB + metadata: + name: highload-documentdb + namespace: default + spec: + nodeCount: 1 + instancesPerNode: 3 + environment: aks + resource: + storage: + pvcSize: 1Ti + storageClass: managed-csi-premium + persistentVolumeReclaimPolicy: Retain + tls: + gateway: + mode: SelfSigned + exposeViaService: + serviceType: LoadBalancer + backup: + retentionDays: 30 + ``` + +## Internal Operator Resources + +The operator allocates resources for internal processes such as SQL jobs (schema migrations, extension upgrades). These are pre-configured and not user-configurable. + +### SQL Job Resources + +| Resource | Request | Limit | +|----------|---------|-------| +| Memory | 32 Mi | 64 Mi | +| CPU | 10m | 50m | + +SQL jobs run as non-root (UID 1000) with privilege escalation disabled. + +## Monitoring Resource Usage + +### Pod-Level Metrics + +```bash +# View resource usage for DocumentDB pods +kubectl top pods -n default -l app.kubernetes.io/name=documentdb + +# View detailed resource requests and limits +kubectl describe pod -n default | grep -A 5 "Requests\|Limits" +``` + +### Cluster-Level Monitoring + +For comprehensive monitoring, consider setting up: + +- **Prometheus + Grafana**: See the [Telemetry Guide](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/telemetry/README.md) for setup instructions +- **Kubernetes Metrics Server**: Required for `kubectl top` commands +- **Cloud provider monitoring**: Azure Monitor, CloudWatch, or Google Cloud Monitoring + +## Best Practices + +1. **Always use 3 instances for production** — Set `instancesPerNode: 3` for automatic failover and read scalability. + +2. 
**Use Guaranteed QoS for production** — Set resource requests equal to limits to prevent pod eviction under memory pressure. + +3. **Use premium storage** — SSDs provide significantly better I/O performance for database workloads. Benchmark storage before going to production. + +4. **Set the `Retain` reclaim policy** — Prevents accidental data loss. Clean up PVs manually after confirming data is safely backed up. + +5. **Monitor disk usage** — Expand storage proactively before reaching 80% capacity. See [Storage Configuration](storage.md) for volume expansion instructions. + +6. **Configure backups** — Set `spec.backup.retentionDays` and create a `ScheduledBackup` resource. See [API Reference](../api-reference.md) for details. + +7. **Enable TLS in production** — Use `SelfSigned` mode at minimum. See [TLS Configuration](tls.md) for options. + +8. **Set the environment field** — Specify `environment: aks`, `eks`, or `gke` to get cloud-optimized service annotations. See [Networking Configuration](networking.md). + +9. **Use dedicated nodes for databases** — Use `spec.affinity` with `nodeSelector` to schedule database pods on dedicated nodes, isolating them from other workloads. See the [CloudNative-PG scheduling documentation](https://cloudnative-pg.io/docs/1.28/scheduling/) for details. diff --git a/docs/operator-public-documentation/preview/configuration/storage.md b/docs/operator-public-documentation/preview/configuration/storage.md new file mode 100644 index 00000000..c029814f --- /dev/null +++ b/docs/operator-public-documentation/preview/configuration/storage.md @@ -0,0 +1,320 @@ +--- +title: Storage Configuration +description: Configure persistent storage for DocumentDB including storage classes, PVC sizing, volume expansion, reclaim policies, and disk encryption across AKS, EKS, and GKE. 
+tags: + - configuration + - storage + - encryption +--- + +# Storage Configuration + +This guide covers storage configuration for the DocumentDB Kubernetes Operator, including storage classes, PVC sizing, volume expansion, reclaim policies, security hardening, and disk encryption across cloud providers. + +## Overview + +DocumentDB uses Kubernetes PersistentVolumeClaims (PVCs) for database storage. The operator manages PersistentVolumes with security-hardened settings and provides flexible configuration for different environments. + +### Storage Fields + +```yaml +spec: + resource: + storage: + pvcSize: 100Gi # Required: storage size + storageClass: managed-csi-premium # Optional: StorageClass name + persistentVolumeReclaimPolicy: Retain # Optional: Retain or Delete +``` + +| Field | Type | Required | Default | Description | +|-------|------|----------|---------|-------------| +| `pvcSize` | string | Yes | — | Size of the PersistentVolumeClaim (for example, `10Gi`, `100Gi`, `1Ti`). | +| `storageClass` | string | No | Cluster default | Kubernetes StorageClass name. If omitted, the cluster's default StorageClass is used. | +| `persistentVolumeReclaimPolicy` | string | No | `Retain` | What happens to the PersistentVolume when the PVC is deleted. Options: `Retain`, `Delete`. | + +For the full auto-generated type reference, see [StorageConfiguration](../api-reference.md#storageconfiguration) in the API Reference. + +## Storage Classes + +### Recommended Storage Classes by Provider + +| Provider | StorageClass | Provisioner | Notes | +|----------|-------------|-------------|-------| +| **AKS** | `managed-csi-premium` | `disk.csi.azure.com` | Azure Premium SSD (`Premium_LRS`). Recommended for production. | +| **EKS** | `gp3` | `ebs.csi.aws.com` | AWS GP3 EBS volumes. Good balance of price and performance. | +| **GKE** | `premium-rwo` | `pd.csi.storage.gke.io` | Google SSD persistent disk. | +| **Kind** | `standard` (default) | `rancher.io/local-path` | Local path provisioner. 
Development only. | +| **Minikube** | `standard` (default) | `k8s.io/minikube-hostpath` | Host path provisioner. Development only. | + +### Using a Specific Storage Class + +=== "AKS" + + ```yaml title="documentdb-aks-storage.yaml" + apiVersion: documentdb.io/preview + kind: DocumentDB + metadata: + name: my-documentdb + namespace: default + spec: + nodeCount: 1 + instancesPerNode: 3 + environment: aks + resource: + storage: + pvcSize: 100Gi + storageClass: managed-csi-premium # (1)! + ``` + + 1. Azure Premium SSD (`Premium_LRS`) via `disk.csi.azure.com`. Recommended for production workloads. + +=== "EKS" + + ```yaml title="documentdb-eks-storage.yaml" + apiVersion: documentdb.io/preview + kind: DocumentDB + metadata: + name: my-documentdb + namespace: default + spec: + nodeCount: 1 + instancesPerNode: 3 + environment: eks + resource: + storage: + pvcSize: 100Gi + storageClass: gp3 # (1)! + ``` + + 1. AWS GP3 EBS volumes. Good balance of price and performance. + +=== "GKE" + + ```yaml title="documentdb-gke-storage.yaml" + apiVersion: documentdb.io/preview + kind: DocumentDB + metadata: + name: my-documentdb + namespace: default + spec: + nodeCount: 1 + instancesPerNode: 3 + environment: gke + resource: + storage: + pvcSize: 100Gi + storageClass: premium-rwo # (1)! + ``` + + 1. Google SSD persistent disk via `pd.csi.storage.gke.io`. + +### Verifying Available Storage Classes + +```bash +# List all storage classes +kubectl get storageclass + +# Check default storage class +kubectl get storageclass -o jsonpath='{range .items[?(@.metadata.annotations.storageclass\.kubernetes\.io/is-default-class=="true")]}{.metadata.name}{"\n"}{end}' + +# Check if a storage class supports volume expansion +kubectl get storageclass -o jsonpath='{.allowVolumeExpansion}' +``` + +## Benchmarking Storage + +!!! important + Before deploying DocumentDB to production, benchmark your storage to set clear performance expectations. Database workloads are highly sensitive to I/O latency and throughput. 
+ +We recommend a two-level benchmarking approach: + +1. **Storage-level**: Use [fio](https://fio.readthedocs.io/en/latest/fio_doc.html) to measure raw throughput for sequential reads, sequential writes, random reads, and random writes. +2. **Database-level**: Run representative workloads against your DocumentDB cluster to measure end-to-end performance. + +Know your storage characteristics — these baselines are invaluable during capacity planning and incident response. + +## Block Storage Considerations (Ceph/Longhorn) + +Most block storage solutions in Kubernetes, such as Longhorn and Ceph, recommend having multiple replicas of a volume to enhance resiliency. This works well for workloads without built-in replication. + +However, DocumentDB (via CloudNative-PG) provides its own replication through multiple instances. Combining storage-level replication with database-level replication can result in: + +- **Unnecessary I/O amplification** — Every write is replicated at both the storage and database layers +- **Increased latency** — Additional replication hops add latency to write operations +- **Higher cost** — Storage capacity is multiplied by the storage replica factor + +!!! tip + For DocumentDB clusters with `instancesPerNode: 3`, consider using storage with a single replica (or no replication) since the database already handles data redundancy. Consult your storage provider's documentation for configuration details. + +## PVC Sizing + +### Sizing Guidelines + +| Workload | Recommended Size | Notes | +|----------|-----------------|-------| +| Development / Testing | 10–20 Gi | Minimal data, local clusters | +| Small production | 50–100 Gi | Light workloads, < 50 GB data | +| Medium production | 100–500 Gi | Moderate workloads, 50–200 GB data | +| Large production | 500 Gi–2 Ti | Heavy workloads, > 200 GB data | + +!!! tip + Provision at least **2x** your expected data size to allow for WAL files, temporary files, and growth. 
Monitor disk usage and expand before reaching 80% capacity. + +## Volume Expansion + +You can increase PVC size without downtime if the StorageClass supports volume expansion. + +### Prerequisites + +Verify that your StorageClass has `allowVolumeExpansion: true`: + +```bash +kubectl get storageclass -o yaml | grep allowVolumeExpansion +``` + +### Expanding Storage + +Update the `pvcSize` field in your DocumentDB spec: + +```bash +kubectl patch documentdb my-documentdb -n default --type='json' \ + -p='[{"op": "replace", "path": "/spec/resource/storage/pvcSize", "value": "200Gi"}]' +``` + +Or edit the resource directly: + +```bash +kubectl edit documentdb my-documentdb -n default +``` + +!!! warning + Volume expansion is a one-way operation. You cannot shrink a PVC after expanding it. + +## Reclaim Policy + +The reclaim policy determines what happens to PersistentVolumes when the associated PVC is deleted. + +| Policy | Behavior | Use Case | +|--------|----------|----------| +| `Retain` (default) | PV is preserved after PVC deletion. Data remains on disk. | Production. Protects against accidental data loss. | +| `Delete` | PV and underlying storage are deleted with the PVC. | Development and testing. Automatic cleanup. | + +```yaml +spec: + resource: + storage: + pvcSize: 100Gi + persistentVolumeReclaimPolicy: Retain # Recommended for production +``` + +!!! note + The operator defaults to `Retain` to prevent accidental data loss. For production workloads, always use `Retain` and manually clean up PVs after confirming the data is no longer needed. 
+ +## PersistentVolume Security + +The operator automatically applies security-hardening mount options to all PersistentVolumes associated with DocumentDB clusters: + +| Mount Option | Description | +|-------------|-------------| +| `nodev` | Prevents device files from being interpreted on the filesystem | +| `nosuid` | Prevents setuid/setgid bits from taking effect | +| `noexec` | Prevents execution of binaries on the filesystem | + +These options are applied automatically by the PV controller and require no configuration. They are compatible with major cloud storage provisioners (Azure Disk, AWS EBS, GCE PD). + +!!! note + Local-path provisioners used by Kind (`rancher.io/local-path`) and Minikube (`k8s.io/minikube-hostpath`) do not support mount options. The operator auto-detects these provisioners and skips applying mount options. + +## Disk Encryption + +Encryption at rest protects sensitive database data stored on disk. Configuration varies by cloud provider. + +=== "AKS (Azure)" + + AKS encrypts all managed disks by default using Azure Storage Service Encryption (SSE) with platform-managed keys. **No additional configuration is required.** + + For customer-managed keys (CMK), create a StorageClass with a Disk Encryption Set: + + ```yaml title="storageclass-aks-encrypted.yaml" + apiVersion: storage.k8s.io/v1 + kind: StorageClass + metadata: + name: managed-csi-encrypted + provisioner: disk.csi.azure.com + parameters: + skuName: Premium_LRS + diskEncryptionSetID: /subscriptions//resourceGroups//providers/Microsoft.Compute/diskEncryptionSets/ + reclaimPolicy: Delete + volumeBindingMode: WaitForFirstConsumer + allowVolumeExpansion: true + ``` + +=== "GKE (Google Cloud)" + + GKE encrypts all persistent disks by default using Google-managed encryption keys. 
**No additional configuration is required.** + + For customer-managed encryption keys (CMEK): + + ```yaml title="storageclass-gke-encrypted.yaml" + apiVersion: storage.k8s.io/v1 + kind: StorageClass + metadata: + name: pd-ssd-encrypted + provisioner: pd.csi.storage.gke.io + parameters: + type: pd-ssd + disk-encryption-kms-key: projects//locations//keyRings//cryptoKeys/ + reclaimPolicy: Delete + volumeBindingMode: WaitForFirstConsumer + allowVolumeExpansion: true + ``` + +=== "EKS (AWS)" + + !!! warning + Unlike AKS and GKE, EBS volumes on EKS are **not encrypted by default**. You must explicitly enable encryption. + + ```yaml title="storageclass-eks-encrypted.yaml" + apiVersion: storage.k8s.io/v1 + kind: StorageClass + metadata: + name: ebs-sc-encrypted + provisioner: ebs.csi.aws.com + parameters: + type: gp3 + encrypted: "true" + # Optional: specify a KMS key for customer-managed encryption + # kmsKeyId: arn:aws:kms:::key/ + reclaimPolicy: Delete + volumeBindingMode: WaitForFirstConsumer + allowVolumeExpansion: true + ``` + + Then reference the encrypted StorageClass in your DocumentDB spec: + + ```yaml + apiVersion: documentdb.io/preview + kind: DocumentDB + metadata: + name: my-documentdb + namespace: default + spec: + environment: eks + nodeCount: 1 + instancesPerNode: 3 + resource: + storage: + pvcSize: 100Gi + storageClass: ebs-sc-encrypted + ``` + +### Encryption Summary + +| Provider | Default Encryption | Customer-Managed Keys | +|----------|-------------------|----------------------| +| **AKS** | ✅ Enabled (SSE with platform keys) | Optional via DiskEncryptionSet | +| **GKE** | ✅ Enabled (Google-managed keys) | Optional via CMEK | +| **EKS** | ❌ **Not enabled** | Required: set `encrypted: "true"` in StorageClass | + +!!! tip + For production deployments on EKS, always create a StorageClass with `encrypted: "true"` to ensure data at rest is protected. 
diff --git a/docs/operator-public-documentation/preview/configuration/tls.md b/docs/operator-public-documentation/preview/configuration/tls.md new file mode 100644 index 00000000..37cf709e --- /dev/null +++ b/docs/operator-public-documentation/preview/configuration/tls.md @@ -0,0 +1,338 @@ +--- +title: TLS Configuration +description: Configure TLS encryption for DocumentDB gateway connections with SelfSigned, Provided, and CertManager modes, certificate rotation, and troubleshooting. +tags: + - configuration + - tls + - security +--- + +# TLS Configuration + +This guide covers TLS configuration for the DocumentDB Kubernetes Operator, including all supported modes, certificate management, rotation, monitoring, and troubleshooting. + +## Overview + +The DocumentDB operator supports TLS encryption for gateway connections via the `spec.tls` configuration. TLS protects data in transit between clients and the DocumentDB gateway. + +### Supported Modes + +| Mode | Description | Best For | +|------|-------------|----------| +| `Disabled` (default) | No TLS encryption | Development and testing only | +| `SelfSigned` | Automatic certificates via cert-manager with a self-signed CA | Development, testing, and environments without external PKI (Public Key Infrastructure) | +| `Provided` | Bring your own certificates (for example, from Azure Key Vault) | Production with centralized certificate management | +| `CertManager` | Custom cert-manager issuers (for example, Let's Encrypt, corporate CA) | Production with existing cert-manager infrastructure | + +### Prerequisites + +- **SelfSigned mode**: [cert-manager](https://cert-manager.io/) must be installed in the cluster +- **CertManager mode**: [cert-manager](https://cert-manager.io/) installed, plus a configured Issuer or ClusterIssuer +- **Provided mode**: A Kubernetes TLS Secret containing `tls.crt`, `tls.key`, and `ca.crt` + +## Configuration + +Select your TLS mode below. 
Each tab shows the complete YAML configuration and connection instructions. + +=== "Disabled" + + !!! danger "Not recommended for production" + + ```yaml title="documentdb-tls-disabled.yaml" + apiVersion: documentdb.io/preview + kind: DocumentDB + metadata: + name: my-documentdb + namespace: default + spec: + nodeCount: 1 + instancesPerNode: 1 + resource: + storage: + pvcSize: 10Gi + tls: + gateway: + mode: Disabled + ``` + + Connect without TLS: + + ```bash + mongosh "mongodb://:@:10260/?directConnection=true" + ``` + +=== "SelfSigned" + + SelfSigned mode uses cert-manager to automatically generate and manage a self-signed CA and server certificate. No additional configuration is needed beyond setting the mode. + + ```yaml title="documentdb-tls-selfsigned.yaml" + apiVersion: documentdb.io/preview + kind: DocumentDB + metadata: + name: my-documentdb + namespace: default + spec: + nodeCount: 1 + instancesPerNode: 3 + resource: + storage: + pvcSize: 10Gi + tls: + gateway: + mode: SelfSigned # (1)! + ``` + + 1. Requires [cert-manager](https://cert-manager.io/) installed in the cluster. The operator handles CA and certificate generation automatically. + + The operator will: + + 1. Create a self-signed CA Issuer + 2. Generate a CA certificate + 3. Create a server certificate signed by the CA + 4. Mount the certificate in the gateway pod + + Connect with TLS using the CA certificate: + + ```bash + # Extract the CA certificate + kubectl get secret documentdb-gateway-cert-tls -n default \ + -o jsonpath='{.data.ca\.crt}' | base64 -d > ca.crt + + # Connect with mongosh + mongosh "mongodb://:@:10260/?tls=true&directConnection=true" \ + --tls --tlsCAFile ca.crt + ``` + +=== "CertManager" + + CertManager mode uses a custom cert-manager Issuer or ClusterIssuer to issue certificates. This is ideal for production environments with existing PKI infrastructure. 
+ + ```yaml title="documentdb-tls-certmanager.yaml" + apiVersion: documentdb.io/preview + kind: DocumentDB + metadata: + name: my-documentdb + namespace: default + spec: + nodeCount: 1 + instancesPerNode: 3 + resource: + storage: + pvcSize: 100Gi + tls: + gateway: + mode: CertManager + certManager: + issuerRef: + name: letsencrypt-prod # (1)! + kind: ClusterIssuer # (2)! + dnsNames: # (3)! + - documentdb.example.com + - "*.documentdb.example.com" + secretName: my-documentdb-tls + ``` + + 1. Name of your cert-manager Issuer or ClusterIssuer resource. + 2. Use `ClusterIssuer` for cluster-scoped issuers, or `Issuer` for namespace-scoped. + 3. Subject Alternative Names — add all DNS names clients will use to connect. + + #### CertManager Field Reference + + | Field | Type | Required | Default | Description | + |-------|------|----------|---------|-------------| + | `issuerRef.name` | string | Yes | — | Name of the cert-manager Issuer or ClusterIssuer. | + | `issuerRef.kind` | string | No | `Issuer` | Kind of the issuer: `Issuer` (namespace-scoped) or `ClusterIssuer` (cluster-scoped). | + | `issuerRef.group` | string | No | `cert-manager.io` | API group of the issuer. | + | `dnsNames` | []string | No | — | Subject Alternative Names for the certificate. | + | `secretName` | string | No | Auto-generated | Name of the Kubernetes Secret to store the issued certificate. | + +=== "Provided" + + Provided mode lets you supply your own TLS certificates. This is ideal when certificates are managed externally (for example, from Azure Key Vault, HashiCorp Vault, or a corporate CA). 
+ + First, create a Kubernetes TLS Secret with your certificates: + + ```bash title="Create TLS secret" + kubectl create secret generic my-documentdb-tls -n default \ + --from-file=tls.crt=server.crt \ + --from-file=tls.key=server.key \ + --from-file=ca.crt=ca.crt + ``` + + Then reference the secret in your DocumentDB configuration: + + ```yaml title="documentdb-tls-provided.yaml" + apiVersion: documentdb.io/preview + kind: DocumentDB + metadata: + name: my-documentdb + namespace: default + spec: + nodeCount: 1 + instancesPerNode: 3 + resource: + storage: + pvcSize: 100Gi + tls: + gateway: + mode: Provided + provided: + secretName: my-documentdb-tls # (1)! + ``` + + 1. The Secret must contain three keys: `tls.crt` (server certificate), `tls.key` (private key), and `ca.crt` (CA certificate). + + #### Secret Requirements + + The TLS Secret must contain these keys: + + | Key | Description | + |-----|-------------| + | `tls.crt` | Server certificate (PEM-encoded). | + | `tls.key` | Private key for the server certificate (PEM-encoded). | + | `ca.crt` | Certificate Authority certificate used to sign the server certificate (PEM-encoded). | + +#### Example: Let's Encrypt with ClusterIssuer + +First, create a ClusterIssuer: + +```yaml title="letsencrypt-clusterissuer.yaml" +apiVersion: cert-manager.io/v1 +kind: ClusterIssuer +metadata: + name: letsencrypt-prod +spec: + acme: + server: https://acme-v02.api.letsencrypt.org/directory + email: admin@example.com + privateKeySecretRef: + name: letsencrypt-prod-key + solvers: + - http01: + ingress: + class: nginx +``` + +Then reference it in your DocumentDB resource using the CertManager tab above. + +### Azure Key Vault Integration + +For production Azure deployments, use the Secrets Store CSI driver to sync certificates from Azure Key Vault: + +1. Enable the CSI driver on your AKS cluster +2. Create a `SecretProviderClass` that references your Key Vault certificate +3. 
The CSI driver syncs the certificate as a Kubernetes Secret +4. Reference the synced Secret name in `provided.secretName` + +For a complete walkthrough, see the [Manual Provided Mode Setup Guide](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/MANUAL-PROVIDED-MODE-SETUP.md). + +## Certificate Rotation + +### Automatic Rotation + +- **SelfSigned and CertManager modes**: cert-manager automatically rotates certificates before expiration. The operator detects the updated Secret and reloads the gateway. +- **Provided mode**: Update the external Secret (or trigger a CSI driver sync). The operator picks up changes automatically. + +### Monitoring Certificate Expiration + +```bash +# Check certificate status via cert-manager +kubectl get certificate -n + +# Check expiration date +kubectl get secret -n \ + -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -dates + +# Check DocumentDB TLS status +kubectl get documentdb -n \ + -o jsonpath='{.status.tls}' | jq +``` + +Example TLS status output: + +```json +{ + "ready": true, + "secretName": "documentdb-gateway-cert-tls", + "message": "" +} +``` + +## Troubleshooting + +### Certificate Not Ready + +**Symptoms**: `tls.ready` is `false`, pods may not start. + +```bash +# Check cert-manager certificate status +kubectl describe certificate -n + +# Check cert-manager logs +kubectl logs -n cert-manager deployment/cert-manager + +# Check for pending CertificateRequests +kubectl get certificaterequest -n +``` + +**Common causes**: + +- cert-manager is not installed or not running +- The Issuer or ClusterIssuer does not exist or is not ready +- DNS validation is failing (for ACME/Let's Encrypt) + +### TLS Connection Failures + +**Symptoms**: Clients cannot connect with TLS enabled. 
+ +```bash +# Test TLS handshake directly +EXTERNAL_IP=$(kubectl get svc -n \ + -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}') +openssl s_client -connect $EXTERNAL_IP:10260 + +# Check gateway logs +kubectl logs -n -c gateway +``` + +**Common causes**: + +- Client is not using the correct CA certificate +- Certificate SANs do not match the connection hostname +- The Secret is missing required keys (`tls.crt`, `tls.key`, `ca.crt`) + +### Azure Key Vault Access Denied (Provided Mode) + +**Symptoms**: Secret is not synced from Azure Key Vault. + +```bash +# Check SecretProviderClass status +kubectl describe secretproviderclass -n + +# Check CSI driver pods +kubectl get pods -n kube-system -l app=secrets-store-csi-driver +``` + +**Common causes**: + +- Managed identity does not have `Key Vault Secrets User` role on the Key Vault +- The Key Vault firewall is blocking access from the AKS cluster +- The CSI driver addon is not enabled on the cluster + +## Security Context + +The DocumentDB gateway runs with a hardened security context: + +- **Non-root execution**: All containers run as non-root users +- **No privilege escalation**: `allowPrivilegeEscalation: false` +- **Read-only root filesystem**: Where applicable + +TLS certificates are mounted as read-only volumes into the gateway container. The operator manages certificate lifecycle without requiring elevated privileges. 
+ +## Additional Resources + +- [Complete TLS Setup Guide](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/README.md) — Automated scripts for TLS setup +- [E2E Testing Guide](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/E2E-TESTING.md) — Automated TLS testing +- [Manual Provided Mode Setup](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/MANUAL-PROVIDED-MODE-SETUP.md) — Step-by-step Azure Key Vault integration +- [API Reference — TLS Types](../api-reference.md#tlsconfiguration) — Auto-generated reference for TLSConfiguration, GatewayTLS, and related types +- [cert-manager Documentation](https://cert-manager.io/docs/) diff --git a/docs/operator-public-documentation/preview/faq.md b/docs/operator-public-documentation/preview/faq.md index 22213fca..f08d54be 100644 --- a/docs/operator-public-documentation/preview/faq.md +++ b/docs/operator-public-documentation/preview/faq.md @@ -23,6 +23,10 @@ The operator works on any conformant Kubernetes distribution (version 1.30 or la The operator is under active development and currently in **preview**. We don't yet recommend it for production workloads. We welcome feedback and contributions as we work toward general availability. +### Where can I find the full CRD field reference? + +See the [API Reference](api-reference.md) for auto-generated documentation of all DocumentDB, Backup, and ScheduledBackup CRD fields with types, defaults, and validation rules. + ## Installation ### Do I need to install CloudNativePG separately? 
diff --git a/docs/operator-public-documentation/preview/index.md b/docs/operator-public-documentation/preview/index.md index f875fd7a..f4afc39b 100644 --- a/docs/operator-public-documentation/preview/index.md +++ b/docs/operator-public-documentation/preview/index.md @@ -374,44 +374,31 @@ For details, see [Sidecar Injector Plugin Configuration](https://github.com/docu ### Local high-availability (HA) -Deploy multiple DocumentDB instances with automatic failover by setting `instancesPerNode` to a value greater than 1. +Deploy multiple DocumentDB instances with automatic failover by setting `instancesPerNode` to a value greater than 1 (up to 3). This creates one primary instance and two replicas for read scalability and automatic failover. -#### Enable local HA - -```bash -cat < +```yaml spec: - nodeCount: 1 - instancesPerNode: 3 - documentDbCredentialSecret: documentdb-credentials - resource: - storage: - pvcSize: 10Gi - exposeViaService: - serviceType: LoadBalancer -EOF + instancesPerNode: 3 # 1 primary + 2 replicas ``` -This configuration creates: - -- **1 primary instance** — handles all write operations -- **2 replica instances** — provide read scalability and automatic failover +For sizing guidelines and workload profiles, see [Resource Management](configuration/resource-management.md). For the full field reference, see the [API Reference](api-reference.md#documentdbspec). ### Multi-cloud deployment The operator supports deployment across multiple cloud environments and Kubernetes distributions. For guidance, see the [Multi-Cloud Deployment Guide](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/README.md). -### TLS setup +### TLS + +The operator supports four TLS modes: Disabled, SelfSigned, CertManager, and Provided. See [TLS Configuration](configuration/tls.md) for setup instructions, certificate rotation, and troubleshooting. 
+ +For automated TLS testing scripts, see the [TLS Playground](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/README.md). + -For advanced TLS configuration and testing: +### Further reading -- [TLS Setup Guide](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/README.md) — Complete TLS configuration guide -- [E2E Testing](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/E2E-TESTING.md) — Comprehensive testing procedures +- [API Reference](api-reference.md) — Auto-generated CRD type reference +- [Backup and Restore](backup-and-restore.md) — On-demand and scheduled backups +- [kubectl Plugin](kubectl-plugin.md) — CLI tooling for day-two operations ## Clean up diff --git a/mkdocs.yml b/mkdocs.yml index b9af440f..de3f69b1 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -8,6 +8,13 @@ theme: name: material features: - content.code.copy + - content.code.annotate + - content.tabs.link + - navigation.tabs + - navigation.sections + - navigation.top + - search.suggest + - search.highlight # mike (https://github.com/jimporter/mike) enables versioned docs on GitHub Pages. 
# Each release gets a frozen snapshot (e.g., /0.2.0/), and "latest" always points @@ -21,6 +28,11 @@ extra: nav: - Preview: - Get Started: preview/index.md + - Configuration: + - TLS: preview/configuration/tls.md + - Storage: preview/configuration/storage.md + - Networking: preview/configuration/networking.md + - Resource Management: preview/configuration/resource-management.md - Advanced Configuration: preview/advanced-configuration/README.md - Backup and Restore: preview/backup-and-restore.md - API Reference: preview/api-reference.md @@ -33,5 +45,13 @@ plugins: markdown_extensions: - admonition + - attr_list + - md_in_html + - pymdownx.superfences + - pymdownx.tabbed: + alternate_style: true + - pymdownx.highlight: + anchor_linenums: true + - pymdownx.details - toc: permalink: true From dc97df301617f85197ad2146ec0108ce292aa582 Mon Sep 17 00:00:00 2001 From: Wenting Wu Date: Fri, 6 Mar 2026 16:13:38 -0500 Subject: [PATCH 2/9] docs: improve TLS configuration docs and add documentation testing guide - Reorganize TLS docs: move supported modes and prerequisites into each mode tab - Add description, best-for, and prerequisite notes to each TLS mode - Simplify CertManager tab with clear workflow and external links - Replace inline field reference tables with API Reference links - Replace Azure Key Vault section with link to setup guide - Add cert-manager install links pointing to index.md - Add documentation testing instructions to development-environment.md - Add cross-reference from CONTRIBUTING.md to dev environment guide Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Wenting Wu --- CONTRIBUTING.md | 5 +- .../development-environment.md | 34 +++++- .../preview/configuration/tls.md | 109 +++++------------- 3 files changed, 69 insertions(+), 79 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index c282e9a1..2739912c 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -11,4 +11,7 @@ instructions provided by the bot. 
You will only need to do this once across all This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) -or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments. \ No newline at end of file +or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments. + +For development environment setup—including how to test public documentation locally—see the +[Development Environment Guide](docs/developer-guides/development-environment.md). \ No newline at end of file diff --git a/docs/developer-guides/development-environment.md b/docs/developer-guides/development-environment.md index 2e2eac92..c1f8ce82 100644 --- a/docs/developer-guides/development-environment.md +++ b/docs/developer-guides/development-environment.md @@ -186,7 +186,39 @@ operator through Helm. --- -## 4. Contributing guidelines +## 4. Testing public documentation locally + +To build the documentation and run a development server for live previewing: + +1. Create a Python virtual environment: + + ```bash + python3 -m venv documentdb-k8s-docs-venv + ``` + +2. Activate the virtual environment: + + ```bash + source documentdb-k8s-docs-venv/bin/activate + ``` + +3. Install MkDocs: + + ```bash + pip install mkdocs mkdocs-material + ``` + +4. Run the local MkDocs server for testing: + + ```bash + mkdocs serve + ``` + + This starts a local server (typically at `http://127.0.0.1:8000`) where you can preview documentation changes in real time. + +--- + +## 5. 
Contributing guidelines Before opening a pull request, review the repository-wide [CONTRIBUTING.md](../../CONTRIBUTING.md) for the CLA process, style guidance, diff --git a/docs/operator-public-documentation/preview/configuration/tls.md b/docs/operator-public-documentation/preview/configuration/tls.md index 37cf709e..2fe99070 100644 --- a/docs/operator-public-documentation/preview/configuration/tls.md +++ b/docs/operator-public-documentation/preview/configuration/tls.md @@ -15,28 +15,19 @@ This guide covers TLS configuration for the DocumentDB Kubernetes Operator, incl The DocumentDB operator supports TLS encryption for gateway connections via the `spec.tls` configuration. TLS protects data in transit between clients and the DocumentDB gateway. -### Supported Modes - -| Mode | Description | Best For | -|------|-------------|----------| -| `Disabled` (default) | No TLS encryption | Development and testing only | -| `SelfSigned` | Automatic certificates via cert-manager with a self-signed CA | Development, testing, and environments without external PKI (Public Key Infrastructure) | -| `Provided` | Bring your own certificates (for example, from Azure Key Vault) | Production with centralized certificate management | -| `CertManager` | Custom cert-manager issuers (for example, Let's Encrypt, corporate CA) | Production with existing cert-manager infrastructure | +## Configuration -### Prerequisites +Select your TLS mode below. Each tab shows prerequisites, the complete YAML configuration, and connection instructions. -- **SelfSigned mode**: [cert-manager](https://cert-manager.io/) must be installed in the cluster -- **CertManager mode**: [cert-manager](https://cert-manager.io/) installed, plus a configured Issuer or ClusterIssuer -- **Provided mode**: A Kubernetes TLS Secret containing `tls.crt`, `tls.key`, and `ca.crt` +=== "Disabled (default)" -## Configuration + **Best for:** Development and testing only -Select your TLS mode below. 
Each tab shows the complete YAML configuration and connection instructions. + !!! danger "Not recommended for production" -=== "Disabled" + **Prerequisites:** None - !!! danger "Not recommended for production" + Disabled mode runs the gateway without TLS encryption. All traffic between clients and the gateway is unencrypted. ```yaml title="documentdb-tls-disabled.yaml" apiVersion: documentdb.io/preview @@ -63,6 +54,11 @@ Select your TLS mode below. Each tab shows the complete YAML configuration and c === "SelfSigned" + **Best for:** Development, testing, and environments without external PKI (Public Key Infrastructure) + + !!! note "Prerequisites" + [cert-manager](https://cert-manager.io/) must be installed in the cluster. See [Install cert-manager](../index.md#install-cert-manager) for setup instructions. + SelfSigned mode uses cert-manager to automatically generate and manage a self-signed CA and server certificate. No additional configuration is needed beyond setting the mode. ```yaml title="documentdb-tls-selfsigned.yaml" @@ -82,10 +78,7 @@ Select your TLS mode below. Each tab shows the complete YAML configuration and c mode: SelfSigned # (1)! ``` - 1. Requires [cert-manager](https://cert-manager.io/) installed in the cluster. The operator handles CA and certificate generation automatically. - - The operator will: - + The operator handles CA and certificate generation automatically: 1. Create a self-signed CA Issuer 2. Generate a CA certificate 3. Create a server certificate signed by the CA @@ -105,7 +98,13 @@ Select your TLS mode below. Each tab shows the complete YAML configuration and c === "CertManager" - CertManager mode uses a custom cert-manager Issuer or ClusterIssuer to issue certificates. This is ideal for production environments with existing PKI infrastructure. + **Best for:** Production with existing cert-manager infrastructure + + !!! 
note "Prerequisites" + [cert-manager](https://cert-manager.io/) must be installed (see [Install cert-manager](../index.md#install-cert-manager)), plus a configured [Issuer or ClusterIssuer](https://cert-manager.io/docs/concepts/issuer/). + + CertManager mode lets you use your own cert-manager Issuer (namespace-scoped) or ClusterIssuer (cluster-scoped) to issue TLS certificates for the DocumentDB gateway. This is ideal for production environments that already have a PKI in place (for example, [Let's Encrypt](https://letsencrypt.org/) or a corporate CA). + ```yaml title="documentdb-tls-certmanager.yaml" apiVersion: documentdb.io/preview @@ -129,25 +128,23 @@ Select your TLS mode below. Each tab shows the complete YAML configuration and c dnsNames: # (3)! - documentdb.example.com - "*.documentdb.example.com" - secretName: my-documentdb-tls + secretName: my-documentdb-tls # (4)! ``` - 1. Name of your cert-manager Issuer or ClusterIssuer resource. - 2. Use `ClusterIssuer` for cluster-scoped issuers, or `Issuer` for namespace-scoped. - 3. Subject Alternative Names — add all DNS names clients will use to connect. - - #### CertManager Field Reference + 1. Must match the `metadata.name` of your Issuer or ClusterIssuer. + 2. Use [`ClusterIssuer`](https://cert-manager.io/docs/concepts/issuer/#cluster-resource) for cluster-scoped issuers, or [`Issuer`](https://cert-manager.io/docs/concepts/issuer/#namespaces) for namespace-scoped. + 3. [Subject Alternative Names](https://en.wikipedia.org/wiki/Subject_Alternative_Name) — add all DNS names clients will use to connect. + 4. The Kubernetes Secret where cert-manager will store the issued certificate. - | Field | Type | Required | Default | Description | - |-------|------|----------|---------|-------------| - | `issuerRef.name` | string | Yes | — | Name of the cert-manager Issuer or ClusterIssuer. 
| - | `issuerRef.kind` | string | No | `Issuer` | Kind of the issuer: `Issuer` (namespace-scoped) or `ClusterIssuer` (cluster-scoped). | - | `issuerRef.group` | string | No | `cert-manager.io` | API group of the issuer. | - | `dnsNames` | []string | No | — | Subject Alternative Names for the certificate. | - | `secretName` | string | No | Auto-generated | Name of the Kubernetes Secret to store the issued certificate. | + For a complete list of CertManager fields, see the [API Reference — TLS Types](../api-reference.md#tlsconfiguration). === "Provided" + **Best for:** Production with centralized certificate management + + !!! note "Prerequisites" + A Kubernetes TLS Secret containing `tls.crt`, `tls.key`, and `ca.crt`. + Provided mode lets you supply your own TLS certificates. This is ideal when certificates are managed externally (for example, from Azure Key Vault, HashiCorp Vault, or a corporate CA). First, create a Kubernetes TLS Secret with your certificates: @@ -182,49 +179,7 @@ Select your TLS mode below. Each tab shows the complete YAML configuration and c 1. The Secret must contain three keys: `tls.crt` (server certificate), `tls.key` (private key), and `ca.crt` (CA certificate). - #### Secret Requirements - - The TLS Secret must contain these keys: - - | Key | Description | - |-----|-------------| - | `tls.crt` | Server certificate (PEM-encoded). | - | `tls.key` | Private key for the server certificate (PEM-encoded). | - | `ca.crt` | Certificate Authority certificate used to sign the server certificate (PEM-encoded). 
| - -#### Example: Let's Encrypt with ClusterIssuer - -First, create a ClusterIssuer: - -```yaml title="letsencrypt-clusterissuer.yaml" -apiVersion: cert-manager.io/v1 -kind: ClusterIssuer -metadata: - name: letsencrypt-prod -spec: - acme: - server: https://acme-v02.api.letsencrypt.org/directory - email: admin@example.com - privateKeySecretRef: - name: letsencrypt-prod-key - solvers: - - http01: - ingress: - class: nginx -``` - -Then reference it in your DocumentDB resource using the CertManager tab above. - -### Azure Key Vault Integration - -For production Azure deployments, use the Secrets Store CSI driver to sync certificates from Azure Key Vault: - -1. Enable the CSI driver on your AKS cluster -2. Create a `SecretProviderClass` that references your Key Vault certificate -3. The CSI driver syncs the certificate as a Kubernetes Secret -4. Reference the synced Secret name in `provided.secretName` - -For a complete walkthrough, see the [Manual Provided Mode Setup Guide](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/MANUAL-PROVIDED-MODE-SETUP.md). + For Azure Key Vault integration, see the [Manual Provided Mode Setup Guide](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/MANUAL-PROVIDED-MODE-SETUP.md). 
## Certificate Rotation From 8fe0ff7b8c9fec017a597abf7247de52743468d9 Mon Sep 17 00:00:00 2001 From: Wenting Wu Date: Fri, 6 Mar 2026 21:51:01 -0500 Subject: [PATCH 3/9] docs: streamline configuration documentation - Remove redundant content, internal implementation details, and generic Kubernetes info - Networking: consolidate into nested tabs (Internal/External with cloud sub-tabs) - Resource management: remove QoS deep-dive, internal operator resources, monitoring - Storage: simplify cloud tabs, remove PV security and block storage sections - TLS: fix ca.crt requirement (optional per source code), trim troubleshooting - All pages: replace inline field tables with API Reference links Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Wenting Wu --- .../preview/configuration/networking.md | 399 ++++-------------- .../configuration/resource-management.md | 104 +---- .../preview/configuration/storage.md | 296 ++----------- .../preview/configuration/tls.md | 49 +-- 4 files changed, 124 insertions(+), 724 deletions(-) diff --git a/docs/operator-public-documentation/preview/configuration/networking.md b/docs/operator-public-documentation/preview/configuration/networking.md index 188ac0e1..2b422148 100644 --- a/docs/operator-public-documentation/preview/configuration/networking.md +++ b/docs/operator-public-documentation/preview/configuration/networking.md @@ -9,375 +9,118 @@ tags: # Networking Configuration -This guide covers networking configuration for the DocumentDB Kubernetes Operator, including service types, external access, DNS configuration, and cloud-specific load balancer annotations. +Configure how clients connect to your DocumentDB cluster. ## Overview -DocumentDB exposes connectivity through Kubernetes Services. The operator creates and manages a service named `documentdb-service-` that routes traffic to the primary database instance's gateway. 
+DocumentDB exposes connectivity through a Kubernetes Service named `documentdb-service-`. The gateway listens on port **10260** (MongoDB-compatible wire protocol). -### Key Networking Fields - -```yaml -spec: - environment: aks # Cloud environment: aks, eks, gke - exposeViaService: - serviceType: LoadBalancer # LoadBalancer or ClusterIP -``` - -| Field | Type | Required | Default | Description | -|-------|------|----------|---------|-------------| -| `environment` | string | No | — | Cloud environment identifier. Determines load balancer annotations. Options: `aks`, `eks`, `gke`. | -| `exposeViaService.serviceType` | string | No | — | Kubernetes Service type. Options: `LoadBalancer`, `ClusterIP`. | - -For the full auto-generated type reference, see [ExposeViaService](../api-reference.md#exposeviaservice) in the API Reference. - -### Default Port - -DocumentDB gateway listens on port **10260** (MongoDB-compatible wire protocol over this port). +For the full field reference, see [ExposeViaService](../api-reference.md#exposeviaservice) in the API Reference. ## Service Types -### ClusterIP (Internal Access) - -ClusterIP exposes the service only within the Kubernetes cluster. Use this for applications running in the same cluster. 
- -```yaml -apiVersion: documentdb.io/preview -kind: DocumentDB -metadata: - name: my-documentdb - namespace: default -spec: - nodeCount: 1 - instancesPerNode: 3 - resource: - storage: - pvcSize: 100Gi - exposeViaService: - serviceType: ClusterIP -``` - -Connect from within the cluster: - -```bash -mongosh "mongodb://:@documentdb-service-my-documentdb.default.svc.cluster.local:10260/?directConnection=true" -``` - -For local development, use port-forwarding: - -```bash -kubectl port-forward svc/documentdb-service-my-documentdb -n default 10260:10260 - -# In another terminal -mongosh "mongodb://:@localhost:10260/?directConnection=true" -``` - -### LoadBalancer (External Access) +=== "ClusterIP (Internal)" -LoadBalancer provisions a cloud load balancer for external access. The operator automatically applies cloud-specific annotations based on the `environment` field. - -```yaml -apiVersion: documentdb.io/preview -kind: DocumentDB -metadata: - name: my-documentdb - namespace: default -spec: - nodeCount: 1 - instancesPerNode: 3 - environment: aks - resource: - storage: - pvcSize: 100Gi - exposeViaService: - serviceType: LoadBalancer -``` - -Get the external IP: - -```bash -kubectl get svc documentdb-service-my-documentdb -n default \ - -o jsonpath='{.status.loadBalancer.ingress[0].ip}' -``` - -Connect externally: - -```bash -EXTERNAL_IP=$(kubectl get svc documentdb-service-my-documentdb -n default \ - -o jsonpath='{.status.loadBalancer.ingress[0].ip}') -mongosh "mongodb://:@$EXTERNAL_IP:10260/?directConnection=true" -``` - -## Cloud-Specific Configuration - -The operator automatically applies cloud-optimized annotations based on the `environment` field. Use content tabs below to see configuration for your cloud provider. - -=== "AKS (Azure)" - - Set `environment: aks` to apply Azure-specific load balancer annotations. + Exposes the service only within the Kubernetes cluster. 
```yaml spec: - environment: aks exposeViaService: - serviceType: LoadBalancer + serviceType: ClusterIP ``` - **Internal Load Balancer (private VNet only):** + Connect from within the cluster: ```bash - kubectl annotate svc documentdb-service-my-documentdb -n default \ - service.beta.kubernetes.io/azure-load-balancer-internal="true" + mongosh "mongodb://:@documentdb-service-my-documentdb.default.svc.cluster.local:10260/?directConnection=true" ``` -=== "EKS (AWS)" - - Set `environment: eks` to apply AWS-specific load balancer annotations. The operator configures an AWS Network Load Balancer (NLB). - - ```yaml - spec: - environment: eks - exposeViaService: - serviceType: LoadBalancer - ``` - - !!! note - On EKS, the external endpoint may be a hostname (DNS name) rather than an IP address. Use the hostname directly in your connection string. + For local development, use port-forwarding: ```bash - # Get the external hostname - kubectl get svc documentdb-service-my-documentdb -n default \ - -o jsonpath='{.status.loadBalancer.ingress[0].hostname}' - ``` - -=== "GKE (Google Cloud)" - - Set `environment: gke` to apply GCP-specific load balancer annotations. - - ```yaml - spec: - environment: gke - exposeViaService: - serviceType: LoadBalancer + kubectl port-forward svc/documentdb-service-my-documentdb -n default 10260:10260 + mongosh "mongodb://:@localhost:10260/?directConnection=true" ``` -## DNS Configuration - -### In-Cluster DNS +=== "LoadBalancer (External)" -Kubernetes automatically creates DNS records for services. The format is: - -``` -..svc.cluster.local -``` - -For DocumentDB: - -``` -documentdb-service-..svc.cluster.local -``` - -Example connection string using DNS: - -``` -mongodb://:@documentdb-service-my-documentdb.default.svc.cluster.local:10260/?directConnection=true -``` - -### External DNS - -For LoadBalancer services, you can set up external DNS records pointing to the load balancer's external IP or hostname. - -#### Manual DNS Setup - -1. 
Get the external IP/hostname: - - ```bash - kubectl get svc documentdb-service-my-documentdb -n default - ``` + Provisions a cloud load balancer for external access. Set the `environment` field to get cloud-optimized annotations. -2. Create a DNS A record (for IP) or CNAME record (for hostname) pointing to the external address. + === "AKS (Azure)" -#### Using ExternalDNS + ```yaml + spec: + environment: aks + exposeViaService: + serviceType: LoadBalancer + ``` -[ExternalDNS](https://github.com/kubernetes-sigs/external-dns) can automatically manage DNS records. Annotate the service: + Get the external IP and connect: -```bash -kubectl annotate svc documentdb-service-my-documentdb -n default \ - external-dns.alpha.kubernetes.io/hostname="documentdb.example.com" -``` + ```bash + EXTERNAL_IP=$(kubectl get svc documentdb-service-my-documentdb -n default \ + -o jsonpath='{.status.loadBalancer.ingress[0].ip}') + mongosh "mongodb://:@$EXTERNAL_IP:10260/?directConnection=true" + ``` -## Service Routing + For an internal load balancer (private VNet only): -The operator configures the service selector to route traffic to the CNPG primary instance: + ```bash + kubectl annotate svc documentdb-service-my-documentdb -n default \ + service.beta.kubernetes.io/azure-load-balancer-internal="true" + ``` -- **When endpoints are enabled**: The service selector targets pods with the label `cnpg.io/instanceRole: primary`, ensuring traffic always reaches the current primary. -- **During failover**: CNPG promotes a replica to primary and updates the pod labels. The service automatically routes to the new primary. + === "EKS (AWS)" -## Connection Strings + ```yaml + spec: + environment: eks + exposeViaService: + serviceType: LoadBalancer + ``` -### Standard Connection String Format + !!! note + On EKS, the external endpoint is a hostname rather than an IP. Use it directly in your connection string. 
-``` -mongodb://:@:/?directConnection=true -``` + ```bash + HOSTNAME=$(kubectl get svc documentdb-service-my-documentdb -n default \ + -o jsonpath='{.status.loadBalancer.ingress[0].hostname}') + mongosh "mongodb://:@$HOSTNAME:10260/?directConnection=true" + ``` -### With TLS + === "GKE (Google Cloud)" -``` -mongodb://:@:/?tls=true&directConnection=true -``` + ```yaml + spec: + environment: gke + exposeViaService: + serviceType: LoadBalancer + ``` -### Retrieving Credentials + Get the external IP and connect: -```bash -# Get username -kubectl get secret documentdb-credentials -n default \ - -o jsonpath='{.data.username}' | base64 -d + ```bash + EXTERNAL_IP=$(kubectl get svc documentdb-service-my-documentdb -n default \ + -o jsonpath='{.status.loadBalancer.ingress[0].ip}') + mongosh "mongodb://:@$EXTERNAL_IP:10260/?directConnection=true" + ``` -# Get password -kubectl get secret documentdb-credentials -n default \ - -o jsonpath='{.data.password}' | base64 -d -``` +## Network Policies -### Connection String from Status +If your cluster uses restrictive [NetworkPolicies](https://kubernetes.io/docs/concepts/services-networking/network-policies/), ensure the following traffic is allowed: -The operator populates the connection string in the DocumentDB status: +| Traffic | From | To | Port | +|---------|------|----|------| +| Operator → Database | `documentdb-operator` namespace | DocumentDB pods | 8000, 5432 | +| Application → Gateway | Application namespace | DocumentDB pods | 10260 | +| Database replication | DocumentDB pods | DocumentDB pods | 5432 | -```bash -kubectl get documentdb my-documentdb -n default \ - -o jsonpath='{.status.connectionString}' -``` +See the [Kubernetes NetworkPolicy documentation](https://kubernetes.io/docs/concepts/services-networking/network-policies/) for examples. 
## Troubleshooting -### Network Policies - -If your cluster has restrictive [NetworkPolicies](https://kubernetes.io/docs/concepts/services-networking/network-policies/), you must ensure the DocumentDB operator can reach the database pods, and that pods can communicate with each other. - -#### Allow Operator-to-Cluster Communication - -If you have a default-deny ingress policy, create an explicit policy to allow the operator: - -```yaml -apiVersion: networking.k8s.io/v1 -kind: NetworkPolicy -metadata: - name: allow-documentdb-operator - namespace: default # Namespace where DocumentDB runs -spec: - podSelector: - matchLabels: - app.kubernetes.io/name: documentdb - policyTypes: - - Ingress - ingress: - - from: - - namespaceSelector: - matchLabels: - kubernetes.io/metadata.name: documentdb-operator - ports: - - protocol: TCP - port: 8000 - - protocol: TCP - port: 5432 -``` - -#### Allow Application Access to DocumentDB - -Restrict database access to specific application namespaces: - -```yaml -apiVersion: networking.k8s.io/v1 -kind: NetworkPolicy -metadata: - name: allow-app-to-documentdb - namespace: default -spec: - podSelector: - matchLabels: - app.kubernetes.io/name: documentdb - policyTypes: - - Ingress - ingress: - - from: - - namespaceSelector: - matchLabels: - name: app-namespace - ports: - - protocol: TCP - port: 10260 -``` - -#### Allow Inter-Pod Communication - -DocumentDB instances must communicate with each other for replication: - -```yaml -apiVersion: networking.k8s.io/v1 -kind: NetworkPolicy -metadata: - name: allow-documentdb-internal - namespace: default -spec: - podSelector: - matchLabels: - app.kubernetes.io/name: documentdb - policyTypes: - - Ingress - ingress: - - from: - - podSelector: - matchLabels: - app.kubernetes.io/name: documentdb - ports: - - protocol: TCP - port: 5432 -``` - -### LoadBalancer Pending - -**Symptoms**: Service stays in `Pending` state, no external IP assigned. 
- -```bash -kubectl describe svc documentdb-service-my-documentdb -n default -``` - -**Common causes**: - -- Cloud provider quota exceeded for load balancers -- Missing permissions for the cloud controller manager -- Network configuration issues (subnet, security groups) - -### Connection Timeout - -**Symptoms**: Cannot connect to the external IP. - -```bash -# Verify the service has an endpoint -kubectl get endpoints documentdb-service-my-documentdb -n default - -# Check if the pod is running and ready -kubectl get pods -n default -l cnpg.io/instanceRole=primary -``` - -**Common causes**: - -- Firewall or network security group blocking port 10260 -- Pod is not ready (check pod events and logs) -- Service selector does not match any pods - -### DNS Resolution Failure - -**Symptoms**: In-cluster DNS name does not resolve. - -```bash -# Test DNS from within the cluster -kubectl run dns-test --image=busybox --rm -it -- nslookup \ - documentdb-service-my-documentdb.default.svc.cluster.local -``` - -**Common causes**: - -- CoreDNS is not running or misconfigured -- Incorrect namespace in the DNS name -- Service does not exist +| Problem | Common Causes | +|---------|---------------| +| **LoadBalancer stuck in Pending** | Cloud provider quota exceeded; missing cloud controller permissions; subnet/security group misconfiguration | +| **Connection timeout to external IP** | Firewall blocking port 10260; pod not ready; service selector mismatch | +| **In-cluster DNS not resolving** | CoreDNS not running; wrong namespace in DNS name; service does not exist | diff --git a/docs/operator-public-documentation/preview/configuration/resource-management.md b/docs/operator-public-documentation/preview/configuration/resource-management.md index 913e7189..8dc8499c 100644 --- a/docs/operator-public-documentation/preview/configuration/resource-management.md +++ b/docs/operator-public-documentation/preview/configuration/resource-management.md @@ -9,29 +9,14 @@ tags: # Resource 
Management -This guide covers CPU and memory sizing guidelines for DocumentDB deployments, including recommendations for different workload profiles, Kubernetes Quality of Service classes, and internal operator resource allocations. +CPU and memory sizing guidelines for DocumentDB deployments. ## Overview -DocumentDB runs on Kubernetes and leverages the underlying CloudNative-PG operator for resource management. Proper resource allocation ensures stable performance, prevents out-of-memory kills, and optimizes cost. +Proper resource allocation ensures stable performance and prevents pod eviction. For production, always configure explicit resource requests and limits. -!!! important - For production database workloads, always configure explicit resource requests and limits. Running without resource constraints risks pod eviction, OOM kills, and CPU throttling during peak load. - -## Kubernetes Quality of Service (QoS) - -Kubernetes assigns a QoS class to each pod based on its resource configuration. For database workloads, we recommend **Guaranteed** QoS: - -| QoS Class | Condition | Priority | Recommendation | -|-----------|-----------|----------|----------------| -| **Guaranteed** | Requests = Limits for all containers | Highest | **Recommended for production** | -| Burstable | Requests < Limits | Medium | Acceptable for development | -| Best-Effort | No requests or limits set | Lowest (evicted first) | Not recommended | - -To achieve **Guaranteed** QoS, set requests and limits to the same value for both CPU and memory. This ensures that your DocumentDB pods are the last to be evicted under memory pressure. - -!!! note - When QoS is set to Guaranteed, CloudNative-PG configures the PostgreSQL `postmaster` process with an OOM adjustment value of `0`, keeping its low OOM score of `-997`. If the OOM killer is triggered, child processes are terminated before the `postmaster`, allowing for a clean shutdown. 
This behavior helps keep the database instance alive as long as possible. +!!! tip "Guaranteed QoS" + Set resource requests equal to limits to achieve [Guaranteed QoS](https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/#guaranteed). This gives your database pods the strongest protection against eviction: they are the last to be evicted under memory pressure. ## Sizing Guidelines @@ -39,8 +24,6 @@ To achieve **Guaranteed** QoS, set requests and limits to the same value for bot === "Development" - A minimal cluster for development and testing: - | Setting | Value | |---------|-------| | Instances | 1 | @@ -64,8 +47,6 @@ To achieve **Guaranteed** QoS, set requests and limits to the same value for bot === "Production" - A production-ready cluster with high availability: - | Setting | Value | |---------|-------| | Instances | 3 | @@ -81,28 +62,16 @@ To achieve **Guaranteed** QoS, set requests and limits to the same value for bot namespace: default spec: nodeCount: 1 - instancesPerNode: 3 # (1)! - environment: aks + instancesPerNode: 3 resource: storage: pvcSize: 100Gi storageClass: managed-csi-premium persistentVolumeReclaimPolicy: Retain - tls: - gateway: - mode: SelfSigned - exposeViaService: - serviceType: LoadBalancer - backup: - retentionDays: 30 ``` - 1. Three instances provide Guaranteed QoS with one primary and two replicas for automatic failover. 
- === "High-Load" - For workloads requiring maximum throughput: - | Setting | Value | |---------|-------| | Instances | 3 | @@ -119,70 +88,17 @@ To achieve **Guaranteed** QoS, set requests and limits to the same value for bot spec: nodeCount: 1 instancesPerNode: 3 - environment: aks resource: storage: pvcSize: 1Ti storageClass: managed-csi-premium persistentVolumeReclaimPolicy: Retain - tls: - gateway: - mode: SelfSigned - exposeViaService: - serviceType: LoadBalancer - backup: - retentionDays: 30 ``` -## Internal Operator Resources - -The operator allocates resources for internal processes such as SQL jobs (schema migrations, extension upgrades). These are pre-configured and not user-configurable. - -### SQL Job Resources - -| Resource | Request | Limit | -|----------|---------|-------| -| Memory | 32 Mi | 64 Mi | -| CPU | 10m | 50m | - -SQL jobs run as non-root (UID 1000) with privilege escalation disabled. - -## Monitoring Resource Usage - -### Pod-Level Metrics - -```bash -# View resource usage for DocumentDB pods -kubectl top pods -n default -l app.kubernetes.io/name=documentdb - -# View detailed resource requests and limits -kubectl describe pod -n default | grep -A 5 "Requests\|Limits" -``` - -### Cluster-Level Monitoring - -For comprehensive monitoring, consider setting up: - -- **Prometheus + Grafana**: See the [Telemetry Guide](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/telemetry/README.md) for setup instructions -- **Kubernetes Metrics Server**: Required for `kubectl top` commands -- **Cloud provider monitoring**: Azure Monitor, CloudWatch, or Google Cloud Monitoring - ## Best Practices -1. **Always use 3 instances for production** — Set `instancesPerNode: 3` for automatic failover and read scalability. - -2. **Use Guaranteed QoS for production** — Set resource requests equal to limits to prevent pod eviction under memory pressure. - -3. 
**Use premium storage** — SSDs provide significantly better I/O performance for database workloads. Benchmark storage before going to production. - -4. **Set the `Retain` reclaim policy** — Prevents accidental data loss. Clean up PVs manually after confirming data is safely backed up. - -5. **Monitor disk usage** — Expand storage proactively before reaching 80% capacity. See [Storage Configuration](storage.md) for volume expansion instructions. - -6. **Configure backups** — Set `spec.backup.retentionDays` and create a `ScheduledBackup` resource. See [API Reference](../api-reference.md) for details. - -7. **Enable TLS in production** — Use `SelfSigned` mode at minimum. See [TLS Configuration](tls.md) for options. - -8. **Set the environment field** — Specify `environment: aks`, `eks`, or `gke` to get cloud-optimized service annotations. See [Networking Configuration](networking.md). - -9. **Use dedicated nodes for databases** — Use `spec.affinity` with `nodeSelector` to schedule database pods on dedicated nodes, isolating them from other workloads. See the [CloudNative-PG scheduling documentation](https://cloudnative-pg.io/docs/1.28/scheduling/) for details. +1. **Use 3 instances for production** — `instancesPerNode: 3` enables automatic failover with one primary and two replicas. +2. **Use Guaranteed QoS** — Set resource requests equal to limits to prevent pod eviction under memory pressure. +3. **Use premium storage** — SSDs provide significantly better I/O for database workloads. +4. **Set `Retain` reclaim policy** — Prevents accidental data loss. See [Storage Configuration](storage.md). +5. **Monitor disk usage** — Expand storage proactively before reaching 80% capacity. 
diff --git a/docs/operator-public-documentation/preview/configuration/storage.md b/docs/operator-public-documentation/preview/configuration/storage.md index c029814f..0cf7f994 100644 --- a/docs/operator-public-documentation/preview/configuration/storage.md +++ b/docs/operator-public-documentation/preview/configuration/storage.md @@ -9,13 +9,11 @@ tags: # Storage Configuration -This guide covers storage configuration for the DocumentDB Kubernetes Operator, including storage classes, PVC sizing, volume expansion, reclaim policies, security hardening, and disk encryption across cloud providers. +Configure persistent storage for DocumentDB clusters. ## Overview -DocumentDB uses Kubernetes PersistentVolumeClaims (PVCs) for database storage. The operator manages PersistentVolumes with security-hardened settings and provides flexible configuration for different environments. - -### Storage Fields +DocumentDB uses Kubernetes PersistentVolumeClaims (PVCs) for database storage. The operator manages volumes with security-hardened settings. ```yaml spec: @@ -26,295 +24,67 @@ spec: persistentVolumeReclaimPolicy: Retain # Optional: Retain or Delete ``` -| Field | Type | Required | Default | Description | -|-------|------|----------|---------|-------------| -| `pvcSize` | string | Yes | — | Size of the PersistentVolumeClaim (for example, `10Gi`, `100Gi`, `1Ti`). | -| `storageClass` | string | No | Cluster default | Kubernetes StorageClass name. If omitted, the cluster's default StorageClass is used. | -| `persistentVolumeReclaimPolicy` | string | No | `Retain` | What happens to the PersistentVolume when the PVC is deleted. Options: `Retain`, `Delete`. | - -For the full auto-generated type reference, see [StorageConfiguration](../api-reference.md#storageconfiguration) in the API Reference. +For the full field reference, see [StorageConfiguration](../api-reference.md#storageconfiguration) in the API Reference. 
## Storage Classes ### Recommended Storage Classes by Provider -| Provider | StorageClass | Provisioner | Notes | -|----------|-------------|-------------|-------| -| **AKS** | `managed-csi-premium` | `disk.csi.azure.com` | Azure Premium SSD v2. Recommended for production. | -| **EKS** | `gp3` | `ebs.csi.aws.com` | AWS GP3 EBS volumes. Good balance of price and performance. | -| **GKE** | `premium-rwo` | `pd.csi.storage.gke.io` | Google SSD persistent disk. | -| **Kind** | `standard` (default) | `rancher.io/local-path` | Local path provisioner. Development only. | -| **Minikube** | `standard` (default) | `k8s.io/minikube-hostpath` | Host path provisioner. Development only. | - -### Using a Specific Storage Class - -=== "AKS" - - ```yaml title="documentdb-aks-storage.yaml" - apiVersion: documentdb.io/preview - kind: DocumentDB - metadata: - name: my-documentdb - namespace: default - spec: - nodeCount: 1 - instancesPerNode: 3 - environment: aks - resource: - storage: - pvcSize: 100Gi - storageClass: managed-csi-premium # (1)! - ``` - - 1. Azure Premium SSD v2 via `disk.csi.azure.com`. Recommended for production workloads. - -=== "EKS" - - ```yaml title="documentdb-eks-storage.yaml" - apiVersion: documentdb.io/preview - kind: DocumentDB - metadata: - name: my-documentdb - namespace: default - spec: - nodeCount: 1 - instancesPerNode: 3 - environment: eks - resource: - storage: - pvcSize: 100Gi - storageClass: gp3 # (1)! - ``` - - 1. AWS GP3 EBS volumes. Good balance of price and performance. - -=== "GKE" +| Provider | StorageClass | Notes | +|----------|-------------|-------| +| **AKS** | `managed-csi-premium` | Azure Premium SSD (locally redundant). Recommended for production. | +| **EKS** | `gp3` | AWS GP3 EBS. Good balance of price and performance. | +| **GKE** | `premium-rwo` | Google SSD persistent disk. | +| **Kind / Minikube** | `standard` (default) | Development only. 
| - ```yaml title="documentdb-gke-storage.yaml" - apiVersion: documentdb.io/preview - kind: DocumentDB - metadata: - name: my-documentdb - namespace: default - spec: - nodeCount: 1 - instancesPerNode: 3 - environment: gke - resource: - storage: - pvcSize: 100Gi - storageClass: premium-rwo # (1)! - ``` +### Example - 1. Google SSD persistent disk via `pd.csi.storage.gke.io`. - -### Verifying Available Storage Classes - -```bash -# List all storage classes -kubectl get storageclass - -# Check default storage class -kubectl get storageclass -o jsonpath='{range .items[?(@.metadata.annotations.storageclass\.kubernetes\.io/is-default-class=="true")]}{.metadata.name}{"\n"}{end}' - -# Check if a storage class supports volume expansion -kubectl get storageclass -o jsonpath='{.allowVolumeExpansion}' +```yaml +spec: + resource: + storage: + pvcSize: 100Gi + storageClass: managed-csi-premium # Replace with your provider's class ``` -## Benchmarking Storage - -!!! important - Before deploying DocumentDB to production, benchmark your storage to set clear performance expectations. Database workloads are highly sensitive to I/O latency and throughput. - -We recommend a two-level benchmarking approach: - -1. **Storage-level**: Use [fio](https://fio.readthedocs.io/en/latest/fio_doc.html) to measure raw throughput for sequential reads, sequential writes, random reads, and random writes. -2. **Database-level**: Run representative workloads against your DocumentDB cluster to measure end-to-end performance. - -Know your storage characteristics — these baselines are invaluable during capacity planning and incident response. - -## Block Storage Considerations (Ceph/Longhorn) - -Most block storage solutions in Kubernetes, such as Longhorn and Ceph, recommend having multiple replicas of a volume to enhance resiliency. This works well for workloads without built-in replication. - -However, DocumentDB (via CloudNative-PG) provides its own replication through multiple instances. 
Combining storage-level replication with database-level replication can result in: - -- **Unnecessary I/O amplification** — Every write is replicated at both the storage and database layers -- **Increased latency** — Additional replication hops add latency to write operations -- **Higher cost** — Storage capacity is multiplied by the storage replica factor - -!!! tip - For DocumentDB clusters with `instancesPerNode: 3`, consider using storage with a single replica (or no replication) since the database already handles data redundancy. Consult your storage provider's documentation for configuration details. - ## PVC Sizing -### Sizing Guidelines - -| Workload | Recommended Size | Notes | -|----------|-----------------|-------| -| Development / Testing | 10–20 Gi | Minimal data, local clusters | -| Small production | 50–100 Gi | Light workloads, < 50 GB data | -| Medium production | 100–500 Gi | Moderate workloads, 50–200 GB data | -| Large production | 500 Gi–2 Ti | Heavy workloads, > 200 GB data | +| Workload | Recommended Size | +|----------|-----------------| +| Development / Testing | 10–20 Gi | +| Small production | 50–100 Gi | +| Medium production | 100–500 Gi | +| Large production | 500 Gi–2 Ti | !!! tip - Provision at least **2x** your expected data size to allow for WAL files, temporary files, and growth. Monitor disk usage and expand before reaching 80% capacity. + Provision at least **2x** your expected data size to allow for WAL files, temporary files, and growth. ## Volume Expansion -You can increase PVC size without downtime if the StorageClass supports volume expansion. - -### Prerequisites - -Verify that your StorageClass has `allowVolumeExpansion: true`: - -```bash -kubectl get storageclass -o yaml | grep allowVolumeExpansion -``` - -### Expanding Storage - -Update the `pvcSize` field in your DocumentDB spec: +You can increase PVC size without downtime if the StorageClass supports volume expansion (`allowVolumeExpansion: true`). 
```bash kubectl patch documentdb my-documentdb -n default --type='json' \ -p='[{"op": "replace", "path": "/spec/resource/storage/pvcSize", "value": "200Gi"}]' ``` -Or edit the resource directly: - -```bash -kubectl edit documentdb my-documentdb -n default -``` - !!! warning Volume expansion is a one-way operation. You cannot shrink a PVC after expanding it. ## Reclaim Policy -The reclaim policy determines what happens to PersistentVolumes when the associated PVC is deleted. - -| Policy | Behavior | Use Case | -|--------|----------|----------| -| `Retain` (default) | PV is preserved after PVC deletion. Data remains on disk. | Production. Protects against accidental data loss. | -| `Delete` | PV and underlying storage are deleted with the PVC. | Development and testing. Automatic cleanup. | - -```yaml -spec: - resource: - storage: - pvcSize: 100Gi - persistentVolumeReclaimPolicy: Retain # Recommended for production -``` - -!!! note - The operator defaults to `Retain` to prevent accidental data loss. For production workloads, always use `Retain` and manually clean up PVs after confirming the data is no longer needed. - -## PersistentVolume Security - -The operator automatically applies security-hardening mount options to all PersistentVolumes associated with DocumentDB clusters: - -| Mount Option | Description | -|-------------|-------------| -| `nodev` | Prevents device files from being interpreted on the filesystem | -| `nosuid` | Prevents setuid/setgid bits from taking effect | -| `noexec` | Prevents execution of binaries on the filesystem | - -These options are applied automatically by the PV controller and require no configuration. They are compatible with major cloud storage provisioners (Azure Disk, AWS EBS, GCE PD). - -!!! note - Local-path provisioners used by Kind (`rancher.io/local-path`) and Minikube (`k8s.io/minikube-hostpath`) do not support mount options. The operator auto-detects these provisioners and skips applying mount options. 
+| Policy | Behavior | +|--------|----------| +| `Retain` (default) | PV is preserved after PVC deletion. **Recommended for production.** | +| `Delete` | PV and underlying storage are deleted with the PVC. Suitable for development. | ## Disk Encryption -Encryption at rest protects sensitive database data stored on disk. Configuration varies by cloud provider. - -=== "AKS (Azure)" - - AKS encrypts all managed disks by default using Azure Storage Service Encryption (SSE) with platform-managed keys. **No additional configuration is required.** - - For customer-managed keys (CMK), create a StorageClass with a Disk Encryption Set: - - ```yaml title="storageclass-aks-encrypted.yaml" - apiVersion: storage.k8s.io/v1 - kind: StorageClass - metadata: - name: managed-csi-encrypted - provisioner: disk.csi.azure.com - parameters: - skuName: Premium_LRS - diskEncryptionSetID: /subscriptions//resourceGroups//providers/Microsoft.Compute/diskEncryptionSets/ - reclaimPolicy: Delete - volumeBindingMode: WaitForFirstConsumer - allowVolumeExpansion: true - ``` - -=== "GKE (Google Cloud)" - - GKE encrypts all persistent disks by default using Google-managed encryption keys. **No additional configuration is required.** - - For customer-managed encryption keys (CMEK): - - ```yaml title="storageclass-gke-encrypted.yaml" - apiVersion: storage.k8s.io/v1 - kind: StorageClass - metadata: - name: pd-ssd-encrypted - provisioner: pd.csi.storage.gke.io - parameters: - type: pd-ssd - disk-encryption-kms-key: projects//locations//keyRings//cryptoKeys/ - reclaimPolicy: Delete - volumeBindingMode: WaitForFirstConsumer - allowVolumeExpansion: true - ``` - -=== "EKS (AWS)" - - !!! warning - Unlike AKS and GKE, EBS volumes on EKS are **not encrypted by default**. You must explicitly enable encryption. 
- - ```yaml title="storageclass-eks-encrypted.yaml" - apiVersion: storage.k8s.io/v1 - kind: StorageClass - metadata: - name: ebs-sc-encrypted - provisioner: ebs.csi.aws.com - parameters: - type: gp3 - encrypted: "true" - # Optional: specify a KMS key for customer-managed encryption - # kmsKeyId: arn:aws:kms:::key/ - reclaimPolicy: Delete - volumeBindingMode: WaitForFirstConsumer - allowVolumeExpansion: true - ``` - - Then reference the encrypted StorageClass in your DocumentDB spec: - - ```yaml - apiVersion: documentdb.io/preview - kind: DocumentDB - metadata: - name: my-documentdb - namespace: default - spec: - environment: eks - nodeCount: 1 - instancesPerNode: 3 - resource: - storage: - pvcSize: 100Gi - storageClass: ebs-sc-encrypted - ``` - -### Encryption Summary - | Provider | Default Encryption | Customer-Managed Keys | |----------|-------------------|----------------------| -| **AKS** | ✅ Enabled (SSE with platform keys) | Optional via DiskEncryptionSet | -| **GKE** | ✅ Enabled (Google-managed keys) | Optional via CMEK | -| **EKS** | ❌ **Not enabled** | Required: set `encrypted: "true"` in StorageClass | +| **AKS** | ✅ Enabled (platform-managed keys) | Optional via [DiskEncryptionSet](https://learn.microsoft.com/azure/aks/azure-disk-customer-managed-keys) | +| **GKE** | ✅ Enabled (Google-managed keys) | Optional via [CMEK](https://cloud.google.com/kubernetes-engine/docs/how-to/using-cmek) | +| **EKS** | ❌ **Not enabled by default** | Set `encrypted: "true"` in [StorageClass](https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html) | -!!! tip - For production deployments on EKS, always create a StorageClass with `encrypted: "true"` to ensure data at rest is protected. +!!! warning + For production on EKS, always create a StorageClass with `encrypted: "true"` to ensure data at rest is protected. 
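Reviewer sketch for the EKS warning that closes the storage.md diff above: the slimmed page tells readers to set `encrypted: "true"` but no longer shows a manifest. The YAML below restates the StorageClass and the DocumentDB reference from the removed lines (`ebs-sc-encrypted`, `ebs.csi.aws.com`, `gp3`, `my-documentdb`). Treat it as an illustration under those assumptions, not a tested deployment. The optional `kmsKeyId` parameter is left out because its value is account-specific.

```yaml
# Encrypted gp3 StorageClass for EKS; values copied from the lines this patch removes.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc-encrypted
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"   # explicit opt-in; the page states EKS does not encrypt EBS volumes by default
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
---
# DocumentDB cluster that requests its PVCs from the encrypted class above.
apiVersion: documentdb.io/preview
kind: DocumentDB
metadata:
  name: my-documentdb
  namespace: default
spec:
  environment: eks
  nodeCount: 1
  instancesPerNode: 3
  resource:
    storage:
      pvcSize: 100Gi
      storageClass: ebs-sc-encrypted   # must match the StorageClass name above
```

Keeping both objects in one file makes the name dependency visible: if the StorageClass is renamed, the `storageClass` field in the DocumentDB spec has to change with it.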
diff --git a/docs/operator-public-documentation/preview/configuration/tls.md b/docs/operator-public-documentation/preview/configuration/tls.md index 2fe99070..5fab07e8 100644 --- a/docs/operator-public-documentation/preview/configuration/tls.md +++ b/docs/operator-public-documentation/preview/configuration/tls.md @@ -75,14 +75,10 @@ Select your TLS mode below. Each tab shows prerequisites, the complete YAML conf pvcSize: 10Gi tls: gateway: - mode: SelfSigned # (1)! + mode: SelfSigned ``` - The operator handles CA and certificate generation automatically: - 1. Create a self-signed CA Issuer - 2. Generate a CA certificate - 3. Create a server certificate signed by the CA - 4. Mount the certificate in the gateway pod + The operator automatically creates a self-signed CA, generates a server certificate, and mounts it in the gateway pod. Connect with TLS using the CA certificate: @@ -103,8 +99,9 @@ Select your TLS mode below. Each tab shows prerequisites, the complete YAML conf !!! note "Prerequisites" [cert-manager](https://cert-manager.io/) must be installed (see [Install cert-manager](../index.md#install-cert-manager)), plus a configured [Issuer or ClusterIssuer](https://cert-manager.io/docs/concepts/issuer/). - CertManager mode lets you use your own cert-manager Issuer(namespace-scoped) or ClusterIssuer (cluster-scoped) to issue TLS certificates for the DocumentDB gateway. This is ideal for production environments that already have PKI infrastructure (for example, [Let's Encrypt](https://letsencrypt.org/), or a corporate CA). + CertManager mode lets you use your own cert-manager [Issuer](https://cert-manager.io/docs/concepts/issuer/#namespaces) (namespace-scoped) or [ClusterIssuer](https://cert-manager.io/docs/concepts/issuer/) (cluster-scoped) to issue TLS certificates for the DocumentDB gateway. This is ideal for production environments that already have PKI infrastructure (for example, a corporate CA). 
+ Set `issuerRef.name` and `issuerRef.kind` to match your Issuer or ClusterIssuer. The operator will automatically request a certificate and mount it in the gateway. ```yaml title="documentdb-tls-certmanager.yaml" apiVersion: documentdb.io/preview @@ -143,7 +140,7 @@ Select your TLS mode below. Each tab shows prerequisites, the complete YAML conf **Best for:** Production with centralized certificate management !!! note "Prerequisites" - A Kubernetes TLS Secret containing `tls.crt`, `tls.key`, and `ca.crt`. + A Kubernetes [TLS Secret](https://kubernetes.io/docs/concepts/configuration/secret/#tls-secrets) containing `tls.crt` and `tls.key` (and optionally `ca.crt`). Provided mode lets you supply your own TLS certificates. This is ideal when certificates are managed externally (for example, from Azure Key Vault, HashiCorp Vault, or a corporate CA). @@ -153,7 +150,7 @@ Select your TLS mode below. Each tab shows prerequisites, the complete YAML conf kubectl create secret generic my-documentdb-tls -n default \ --from-file=tls.crt=server.crt \ --from-file=tls.key=server.key \ - --from-file=ca.crt=ca.crt + --from-file=ca.crt=ca.crt # optional: include if clients need CA verification ``` Then reference the secret in your DocumentDB configuration: @@ -177,7 +174,7 @@ Select your TLS mode below. Each tab shows prerequisites, the complete YAML conf secretName: my-documentdb-tls # (1)! ``` - 1. The Secret must contain three keys: `tls.crt` (server certificate), `tls.key` (private key), and `ca.crt` (CA certificate). + 1. The Secret must contain `tls.crt` (server certificate) and `tls.key` (private key). Optionally include `ca.crt` (CA certificate) if clients need to verify the server. For Azure Key Vault integration, see the [Manual Provided Mode Setup Guide](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/MANUAL-PROVIDED-MODE-SETUP.md). 
@@ -258,36 +255,10 @@ kubectl logs -n -c gateway ### Azure Key Vault Access Denied (Provided Mode) -**Symptoms**: Secret is not synced from Azure Key Vault. - -```bash -# Check SecretProviderClass status -kubectl describe secretproviderclass -n - -# Check CSI driver pods -kubectl get pods -n kube-system -l app=secrets-store-csi-driver -``` - -**Common causes**: - -- Managed identity does not have `Key Vault Secrets User` role on the Key Vault -- The Key Vault firewall is blocking access from the AKS cluster -- The CSI driver addon is not enabled on the cluster - -## Security Context - -The DocumentDB gateway runs with a hardened security context: - -- **Non-root execution**: All containers run as non-root users -- **No privilege escalation**: `allowPrivilegeEscalation: false` -- **Read-only root filesystem**: Where applicable - -TLS certificates are mounted as read-only volumes into the gateway container. The operator manages certificate lifecycle without requiring elevated privileges. +**Symptoms**: Secret is not synced from Azure Key Vault. See the [Manual Provided Mode Setup Guide](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/MANUAL-PROVIDED-MODE-SETUP.md) for troubleshooting. 
## Additional Resources -- [Complete TLS Setup Guide](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/README.md) — Automated scripts for TLS setup -- [E2E Testing Guide](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/E2E-TESTING.md) — Automated TLS testing -- [Manual Provided Mode Setup](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/MANUAL-PROVIDED-MODE-SETUP.md) — Step-by-step Azure Key Vault integration -- [API Reference — TLS Types](../api-reference.md#tlsconfiguration) — Auto-generated reference for TLSConfiguration, GatewayTLS, and related types +- [API Reference — TLS Types](../api-reference.md#tlsconfiguration) — Full field reference for TLSConfiguration and GatewayTLS +- [TLS Setup Scripts](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/README.md) — Automated setup and E2E testing - [cert-manager Documentation](https://cert-manager.io/docs/) From f60bbb9c157c8d31a327026345ac68928e148519 Mon Sep 17 00:00:00 2001 From: Wenting Wu Date: Mon, 9 Mar 2026 14:52:47 -0400 Subject: [PATCH 4/9] delete resource-management.md Signed-off-by: Wenting Wu --- .../preview/advanced-configuration/README.md | 2 - .../configuration/resource-management.md | 104 ------------------ .../preview/index.md | 2 +- mkdocs.yml | 1 - 4 files changed, 1 insertion(+), 108 deletions(-) delete mode 100644 docs/operator-public-documentation/preview/configuration/resource-management.md diff --git a/docs/operator-public-documentation/preview/advanced-configuration/README.md b/docs/operator-public-documentation/preview/advanced-configuration/README.md index fe08f891..533fde6f 100644 --- a/docs/operator-public-documentation/preview/advanced-configuration/README.md +++ b/docs/operator-public-documentation/preview/advanced-configuration/README.md @@ -8,8 +8,6 @@ For core configuration topics, see the 
[Configuration](../configuration/tls.md) - [TLS](../configuration/tls.md) — TLS modes, certificate rotation, and troubleshooting - [Storage](../configuration/storage.md) — Storage classes, PVC sizing, encryption - [Networking](../configuration/networking.md) — Service types, load balancers, Network Policies -- [Resource Management](../configuration/resource-management.md) — CPU and memory sizing - ## Table of Contents - [High Availability](#high-availability) diff --git a/docs/operator-public-documentation/preview/configuration/resource-management.md b/docs/operator-public-documentation/preview/configuration/resource-management.md deleted file mode 100644 index 8dc8499c..00000000 --- a/docs/operator-public-documentation/preview/configuration/resource-management.md +++ /dev/null @@ -1,104 +0,0 @@ ---- -title: Resource Management -description: CPU and memory sizing guidelines for DocumentDB deployments including Kubernetes QoS classes, workload profiles, and monitoring recommendations. -tags: - - configuration - - resources - - performance ---- - -# Resource Management - -CPU and memory sizing guidelines for DocumentDB deployments. - -## Overview - -Proper resource allocation ensures stable performance and prevents pod eviction. For production, always configure explicit resource requests and limits. - -!!! tip "Guaranteed QoS" - Set resource requests equal to limits to achieve [Guaranteed QoS](https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/#guaranteed). This gives your database pods the highest eviction priority — they are the last to be evicted under memory pressure. 
- -## Sizing Guidelines - -### Workload Profiles - -=== "Development" - - | Setting | Value | - |---------|-------| - | Instances | 1 | - | CPU | 1 (default) | - | Memory | 2 Gi (default) | - | Storage | 10–20 Gi | - - ```yaml title="documentdb-dev.yaml" - apiVersion: documentdb.io/preview - kind: DocumentDB - metadata: - name: dev-documentdb - namespace: default - spec: - nodeCount: 1 - instancesPerNode: 1 - resource: - storage: - pvcSize: 10Gi - ``` - -=== "Production" - - | Setting | Value | - |---------|-------| - | Instances | 3 | - | CPU | 2–4 CPUs | - | Memory | 4–8 Gi | - | Storage | 100–500 Gi | - - ```yaml title="documentdb-prod.yaml" - apiVersion: documentdb.io/preview - kind: DocumentDB - metadata: - name: prod-documentdb - namespace: default - spec: - nodeCount: 1 - instancesPerNode: 3 - resource: - storage: - pvcSize: 100Gi - storageClass: managed-csi-premium - persistentVolumeReclaimPolicy: Retain - ``` - -=== "High-Load" - - | Setting | Value | - |---------|-------| - | Instances | 3 | - | CPU | 4–8 CPUs | - | Memory | 8–16 Gi | - | Storage | 500 Gi–2 Ti | - - ```yaml title="documentdb-highload.yaml" - apiVersion: documentdb.io/preview - kind: DocumentDB - metadata: - name: highload-documentdb - namespace: default - spec: - nodeCount: 1 - instancesPerNode: 3 - resource: - storage: - pvcSize: 1Ti - storageClass: managed-csi-premium - persistentVolumeReclaimPolicy: Retain - ``` - -## Best Practices - -1. **Use 3 instances for production** — `instancesPerNode: 3` enables automatic failover with one primary and two replicas. -2. **Use Guaranteed QoS** — Set resource requests equal to limits to prevent pod eviction under memory pressure. -3. **Use premium storage** — SSDs provide significantly better I/O for database workloads. -4. **Set `Retain` reclaim policy** — Prevents accidental data loss. See [Storage Configuration](storage.md). -5. **Monitor disk usage** — Expand storage proactively before reaching 80% capacity. 
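Reviewer sketch for the "Guaranteed QoS" tip in the page removed above: the tip says to set requests equal to limits but gives no manifest. The example below is a plain Kubernetes Pod, not a DocumentDB resource, because this patch does not show which CRD fields carry CPU and memory settings. The Pod name, container name, and image are hypothetical stand-ins, and the CPU and memory values are taken from the low end of the removed "Production" profile.

```yaml
# Plain Kubernetes Pod used only to illustrate the QoS rule; all names are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed-demo
spec:
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9   # stand-in image, does no real work
      resources:
        requests:
          cpu: "2"
          memory: 4Gi
        limits:
          cpu: "2"       # equal to the CPU request
          memory: 4Gi    # equal to the memory request
```

Kubernetes assigns the Guaranteed class only when every container in the Pod sets both CPU and memory requests equal to their limits. The assigned class is recorded in the Pod's `status.qosClass` field.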
diff --git a/docs/operator-public-documentation/preview/index.md b/docs/operator-public-documentation/preview/index.md index f4afc39b..6461cc66 100644 --- a/docs/operator-public-documentation/preview/index.md +++ b/docs/operator-public-documentation/preview/index.md @@ -381,7 +381,7 @@ spec: instancesPerNode: 3 # 1 primary + 2 replicas ``` -For sizing guidelines and workload profiles, see [Resource Management](configuration/resource-management.md). For the full field reference, see the [API Reference](api-reference.md#documentdbspec). +For the full field reference, see the [API Reference](api-reference.md#documentdbspec). ### Multi-cloud deployment diff --git a/mkdocs.yml b/mkdocs.yml index de3f69b1..e9d13815 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -32,7 +32,6 @@ nav: - TLS: preview/configuration/tls.md - Storage: preview/configuration/storage.md - Networking: preview/configuration/networking.md - - Resource Management: preview/configuration/resource-management.md - Advanced Configuration: preview/advanced-configuration/README.md - Backup and Restore: preview/backup-and-restore.md - API Reference: preview/api-reference.md From 3214b93f7a74f3c8f7d3fda745504bd545f328cf Mon Sep 17 00:00:00 2001 From: Wenting Wu Date: Mon, 9 Mar 2026 17:42:32 -0400 Subject: [PATCH 5/9] docs: standardize configuration docs and fix storage accuracy - Standardize all config docs: merge intro into Overview, add spec quick-look with kind: DocumentDB, link to API reference - Storage: reorder sections (PVC Sizing > Reclaim Policy > Storage Classes > Disk Encryption), add PV/PVC concept links, link to backup-and-restore for retained PV recovery - Storage: clarify PVC resize not yet supported (see #298), remove unverified resize-without-downtime claim and generic sizing table - Storage: explain StorageClass concept with link, show kubectl command to find default, clarify default behavior - TLS: add spec quick-look in Overview - Networking: add spec quick-look in Overview Co-authored-by: 
Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Wenting Wu --- .../preview/configuration/networking.md | 13 +++- .../preview/configuration/storage.md | 71 ++++++++----------- .../preview/configuration/tls.md | 17 ++++- 3 files changed, 55 insertions(+), 46 deletions(-) diff --git a/docs/operator-public-documentation/preview/configuration/networking.md b/docs/operator-public-documentation/preview/configuration/networking.md index 2b422148..8d42a213 100644 --- a/docs/operator-public-documentation/preview/configuration/networking.md +++ b/docs/operator-public-documentation/preview/configuration/networking.md @@ -9,12 +9,21 @@ tags: # Networking Configuration -Configure how clients connect to your DocumentDB cluster. - ## Overview DocumentDB exposes connectivity through a Kubernetes Service named `documentdb-service-`. The gateway listens on port **10260** (MongoDB-compatible wire protocol). +```yaml +apiVersion: documentdb.io/preview +kind: DocumentDB +metadata: + name: my-documentdb +spec: + environment: aks # Optional: aks | eks | gke + exposeViaService: + serviceType: LoadBalancer # ClusterIP (default) | LoadBalancer +``` + For the full field reference, see [ExposeViaService](../api-reference.md#exposeviaservice) in the API Reference. ## Service Types diff --git a/docs/operator-public-documentation/preview/configuration/storage.md b/docs/operator-public-documentation/preview/configuration/storage.md index 0cf7f994..381d661b 100644 --- a/docs/operator-public-documentation/preview/configuration/storage.md +++ b/docs/operator-public-documentation/preview/configuration/storage.md @@ -9,74 +9,63 @@ tags: # Storage Configuration -Configure persistent storage for DocumentDB clusters. - ## Overview -DocumentDB uses Kubernetes PersistentVolumeClaims (PVCs) for database storage. The operator manages volumes with security-hardened settings. 
+DocumentDB uses Kubernetes [PersistentVolumeClaims (PVCs)](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims) to request storage, which are backed by [PersistentVolumes (PVs)](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) provisioned by your cloud provider. ```yaml +apiVersion: documentdb.io/preview +kind: DocumentDB +metadata: + name: my-documentdb spec: resource: storage: pvcSize: 100Gi # Required: storage size - storageClass: managed-csi-premium # Optional: StorageClass name + storageClass: managed-csi-premium # Optional: defaults to Kubernetes default StorageClass persistentVolumeReclaimPolicy: Retain # Optional: Retain or Delete ``` For the full field reference, see [StorageConfiguration](../api-reference.md#storageconfiguration) in the API Reference. -## Storage Classes - -### Recommended Storage Classes by Provider +## PVC Sizing -| Provider | StorageClass | Notes | -|----------|-------------|-------| -| **AKS** | `managed-csi-premium` | Azure Premium SSD v2. Recommended for production. | -| **EKS** | `gp3` | AWS GP3 EBS. Good balance of price and performance. | -| **GKE** | `premium-rwo` | Google SSD persistent disk. | -| **Kind / Minikube** | `standard` (default) | Development only. | +!!! tip + Provision at least **2x** your expected data size to allow for WAL files, temporary files, and growth. -### Example +!!! warning + PVC size is set at cluster creation time. Resizing an existing PVC by updating `pvcSize` is **not yet supported** — the change will be accepted but not applied. See [#298](https://github.com/documentdb/documentdb-kubernetes-operator/issues/298) for tracking. -```yaml -spec: - resource: - storage: - pvcSize: 100Gi - storageClass: managed-csi-premium # Replace with your provider's class -``` +## Reclaim Policy -## PVC Sizing +| Policy | Behavior | +|--------|----------| +| `Retain` (default) | PV is preserved after PVC deletion. 
**Recommended for production.** | +| `Delete` | PV and underlying storage are deleted with the PVC. Suitable for development. | -| Workload | Recommended Size | -|----------|-----------------| -| Development / Testing | 10–20 Gi | -| Small production | 50–100 Gi | -| Medium production | 100–500 Gi | -| Large production | 500 Gi–2 Ti | +With `Retain`, you can recover data from a retained PV after cluster deletion. See [PersistentVolume Retention and Recovery](../backup-and-restore.md#persistentvolume-retention-and-recovery) for restore steps. -!!! tip - Provision at least **2x** your expected data size to allow for WAL files, temporary files, and growth. +## Storage Classes -## Volume Expansion +A [StorageClass](https://kubernetes.io/docs/concepts/storage/storage-classes/) defines the type of underlying disk (e.g., SSD vs HDD) and provisioner used for persistent volumes. If you don't specify one, Kubernetes uses the default StorageClass in your cluster. -You can increase PVC size without downtime if the StorageClass supports volume expansion (`allowVolumeExpansion: true`). +To see available StorageClasses and which one is the default: ```bash -kubectl patch documentdb my-documentdb -n default --type='json' \ - -p='[{"op": "replace", "path": "/spec/resource/storage/pvcSize", "value": "200Gi"}]' +kubectl get storageclass ``` -!!! warning - Volume expansion is a one-way operation. You cannot shrink a PVC after expanding it. +The default is marked with `(default)` in the output. -## Reclaim Policy +### Recommended Storage Classes by Provider -| Policy | Behavior | -|--------|----------| -| `Retain` (default) | PV is preserved after PVC deletion. **Recommended for production.** | -| `Delete` | PV and underlying storage are deleted with the PVC. Suitable for development. | +For production database workloads, use an SSD-backed StorageClass: + +| Provider | StorageClass | Notes | +|----------|-------------|-------| +| **AKS** | `managed-csi-premium` | Azure Premium SSD v2. 
| +| **EKS** | `gp3` | AWS GP3 EBS. | +| **GKE** | `premium-rwo` | Google SSD persistent disk. | ## Disk Encryption diff --git a/docs/operator-public-documentation/preview/configuration/tls.md b/docs/operator-public-documentation/preview/configuration/tls.md index 5fab07e8..28915df7 100644 --- a/docs/operator-public-documentation/preview/configuration/tls.md +++ b/docs/operator-public-documentation/preview/configuration/tls.md @@ -9,11 +9,22 @@ tags: # TLS Configuration -This guide covers TLS configuration for the DocumentDB Kubernetes Operator, including all supported modes, certificate management, rotation, monitoring, and troubleshooting. - ## Overview -The DocumentDB operator supports TLS encryption for gateway connections via the `spec.tls` configuration. TLS protects data in transit between clients and the DocumentDB gateway. +The DocumentDB operator supports TLS encryption for gateway connections. TLS protects data in transit between clients and the DocumentDB gateway. + +```yaml +apiVersion: documentdb.io/preview +kind: DocumentDB +metadata: + name: my-documentdb +spec: + tls: + gateway: + mode: SelfSigned # Disabled (default) | SelfSigned | CertManager | Provided +``` + +For the full field reference, see [TLSConfiguration](../api-reference.md#tlsconfiguration) in the API Reference. 
## Configuration From 36150063486394934cc0a122b2c233b590d62485 Mon Sep 17 00:00:00 2001 From: Wenting Wu Date: Tue, 10 Mar 2026 22:27:32 -0400 Subject: [PATCH 6/9] docs: improve Signed-off-by: Wenting Wu --- .../preview/configuration/storage.md | 49 +++-- .../preview/configuration/tls.md | 173 +++++++++--------- 2 files changed, 120 insertions(+), 102 deletions(-) diff --git a/docs/operator-public-documentation/preview/configuration/storage.md b/docs/operator-public-documentation/preview/configuration/storage.md index 381d661b..0193a022 100644 --- a/docs/operator-public-documentation/preview/configuration/storage.md +++ b/docs/operator-public-documentation/preview/configuration/storage.md @@ -23,18 +23,14 @@ spec: storage: pvcSize: 100Gi # Required: storage size storageClass: managed-csi-premium # Optional: defaults to Kubernetes default StorageClass - persistentVolumeReclaimPolicy: Retain # Optional: Retain or Delete + persistentVolumeReclaimPolicy: Retain # Optional: Retain (default) or Delete ``` For the full field reference, see [StorageConfiguration](../api-reference.md#storageconfiguration) in the API Reference. ## PVC Sizing -!!! tip - Provision at least **2x** your expected data size to allow for WAL files, temporary files, and growth. - -!!! warning - PVC size is set at cluster creation time. Resizing an existing PVC by updating `pvcSize` is **not yet supported** — the change will be accepted but not applied. See [#298](https://github.com/documentdb/documentdb-kubernetes-operator/issues/298) for tracking. +PVC size is set at cluster creation time. Online PVC resizing is **coming soon** — see [#298](https://github.com/documentdb/documentdb-kubernetes-operator/issues/298) for tracking. ## Reclaim Policy @@ -57,23 +53,38 @@ kubectl get storageclass The default is marked with `(default)` in the output. 
-### Recommended Storage Classes by Provider - -For production database workloads, use an SSD-backed StorageClass: - -| Provider | StorageClass | Notes | -|----------|-------------|-------| -| **AKS** | `managed-csi-premium` | Azure Premium SSD v2. | -| **EKS** | `gp3` | AWS GP3 EBS. | -| **GKE** | `premium-rwo` | Google SSD persistent disk. | - ## Disk Encryption | Provider | Default Encryption | Customer-Managed Keys | |----------|-------------------|----------------------| -| **AKS** | ✅ Enabled (platform-managed keys) | Optional via [DiskEncryptionSet](https://learn.microsoft.com/azure/aks/azure-disk-customer-managed-keys) | -| **GKE** | ✅ Enabled (Google-managed keys) | Optional via [CMEK](https://cloud.google.com/kubernetes-engine/docs/how-to/using-cmek) | -| **EKS** | ❌ **Not enabled by default** | Set `encrypted: "true"` in [StorageClass](https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html) | +| **AKS** | ✅ Enabled (platform-managed keys) | [Azure Disk Encryption with CMK](https://learn.microsoft.com/azure/aks/azure-disk-customer-managed-keys) | +| **GKE** | ✅ Enabled (Google-managed keys) | [CMEK for GKE persistent disks](https://cloud.google.com/kubernetes-engine/docs/how-to/using-cmek) | +| **EKS** | ❌ **Not enabled by default** | [EBS CSI driver encryption](https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html) — set `encrypted: "true"` in StorageClass | !!! warning For production on EKS, always create a StorageClass with `encrypted: "true"` to ensure data at rest is protected. 
+ + ```yaml + apiVersion: storage.k8s.io/v1 + kind: StorageClass + metadata: + name: ebs-sc-encrypted + provisioner: ebs.csi.aws.com + parameters: + type: gp3 + encrypted: "true" + # kmsKeyId: arn:aws:kms:::key/ # Optional: customer-managed key + reclaimPolicy: Delete + volumeBindingMode: WaitForFirstConsumer + allowVolumeExpansion: true + ``` + +## PersistentVolume Security + +The operator automatically applies security-hardening mount options to all PersistentVolumes associated with DocumentDB clusters: + +| Mount Option | Description | +|--------------|-------------| +| `nodev` | Prevents device files from being interpreted on the filesystem | +| `nosuid` | Prevents setuid/setgid bits from taking effect | +| `noexec` | Prevents execution of binaries on the filesystem | diff --git a/docs/operator-public-documentation/preview/configuration/tls.md b/docs/operator-public-documentation/preview/configuration/tls.md index 28915df7..9803a72b 100644 --- a/docs/operator-public-documentation/preview/configuration/tls.md +++ b/docs/operator-public-documentation/preview/configuration/tls.md @@ -57,10 +57,10 @@ Select your TLS mode below. Each tab shows prerequisites, the complete YAML conf mode: Disabled ``` - Connect without TLS: + **Connect with mongosh:** ```bash - mongosh "mongodb://:@:10260/?directConnection=true" + mongosh "mongodb://:@:10260/?directConnection=true&authMechanism=SCRAM-SHA-256" ``` === "SelfSigned" @@ -70,7 +70,7 @@ Select your TLS mode below. Each tab shows prerequisites, the complete YAML conf !!! note "Prerequisites" [cert-manager](https://cert-manager.io/) must be installed in the cluster. See [Install cert-manager](../index.md#install-cert-manager) for setup instructions. - SelfSigned mode uses cert-manager to automatically generate and manage a self-signed CA and server certificate. No additional configuration is needed beyond setting the mode. 
+ SelfSigned mode uses cert-manager to automatically generate, manage, and rotate a self-signed server certificate (90-day validity, renewed 15 days before expiry). No additional configuration is needed beyond setting the mode. ```yaml title="documentdb-tls-selfsigned.yaml" apiVersion: documentdb.io/preview @@ -89,17 +89,15 @@ Select your TLS mode below. Each tab shows prerequisites, the complete YAML conf mode: SelfSigned ``` - The operator automatically creates a self-signed CA, generates a server certificate, and mounts it in the gateway pod. - - Connect with TLS using the CA certificate: + **Connect with mongosh:** ```bash - # Extract the CA certificate - kubectl get secret documentdb-gateway-cert-tls -n default \ + # Extract the CA certificate from the Secret + kubectl get secret my-documentdb-gateway-cert-tls -n default \ -o jsonpath='{.data.ca\.crt}' | base64 -d > ca.crt - # Connect with mongosh - mongosh "mongodb://:@:10260/?tls=true&directConnection=true" \ + # Connect with TLS + mongosh "mongodb://:@:10260/?directConnection=true&authMechanism=SCRAM-SHA-256" \ --tls --tlsCAFile ca.crt ``` @@ -110,6 +108,43 @@ Select your TLS mode below. Each tab shows prerequisites, the complete YAML conf !!! note "Prerequisites" [cert-manager](https://cert-manager.io/) must be installed (see [Install cert-manager](../index.md#install-cert-manager)), plus a configured [Issuer or ClusterIssuer](https://cert-manager.io/docs/concepts/issuer/). + ??? 
example "Setting up a CA Issuer with cert-manager" + + If you don't already have an Issuer, you can bootstrap a simple CA Issuer: + + ```yaml title="cert-manager-ca-issuer.yaml" + # Step 1: A self-signed issuer to bootstrap the CA certificate + apiVersion: cert-manager.io/v1 + kind: Issuer + metadata: + name: selfsigned-bootstrap + spec: + selfSigned: {} + --- + # Step 2: A CA certificate issued by the bootstrap issuer + apiVersion: cert-manager.io/v1 + kind: Certificate + metadata: + name: my-ca + spec: + isCA: true + commonName: my-documentdb-ca + secretName: my-ca-secret + duration: 8760h # 1 year + issuerRef: + name: selfsigned-bootstrap + kind: Issuer + --- + # Step 3: A CA issuer that signs certificates using the CA certificate + apiVersion: cert-manager.io/v1 + kind: Issuer + metadata: + name: my-ca-issuer + spec: + ca: + secretName: my-ca-secret + ``` + CertManager mode lets you use your own cert-manager [Issuer](https://cert-manager.io/docs/concepts/issuer/#namespaces) (namespace-scoped) or [ClusterIssuer](https://cert-manager.io/docs/concepts/issuer/) (cluster-scoped) to issue TLS certificates for the DocumentDB gateway. This is ideal for production environments that already have PKI infrastructure (for example, a corporate CA). Set `issuerRef.name` and `issuerRef.kind` to match your Issuer or ClusterIssuer. The operator will automatically request a certificate and mount it in the gateway. @@ -131,40 +166,51 @@ Select your TLS mode below. Each tab shows prerequisites, the complete YAML conf mode: CertManager certManager: issuerRef: - name: letsencrypt-prod # (1)! - kind: ClusterIssuer # (2)! + name: my-ca-issuer # (1)! + kind: Issuer # (2)! dnsNames: # (3)! - documentdb.example.com - - "*.documentdb.example.com" secretName: my-documentdb-tls # (4)! ``` - 1. Must match the `metadata.name` of your Issuer or ClusterIssuer. + 1. Must match the `metadata.name` of your Issuer or ClusterIssuer (e.g., `my-ca-issuer` from the prerequisite example above). 2. 
Use [`ClusterIssuer`](https://cert-manager.io/docs/concepts/issuer/#cluster-resource) for cluster-scoped issuers, or [`Issuer`](https://cert-manager.io/docs/concepts/issuer/#namespaces) for namespace-scoped. 3. [Subject Alternative Names](https://en.wikipedia.org/wiki/Subject_Alternative_Name) — add all DNS names clients will use to connect. 4. The Kubernetes Secret where cert-manager will store the issued certificate. - For a complete list of CertManager fields, see the [API Reference — TLS Types](../api-reference.md#tlsconfiguration). + For a complete list of CertManager fields, see [CertManagerTLS](../api-reference.md#certmanagertls) in the API Reference. + + **Connect with mongosh:** + + ```bash + # Extract the CA certificate from the Secret + kubectl get secret my-documentdb-tls -n default \ + -o jsonpath='{.data.ca\.crt}' | base64 -d > ca.crt + + # Connect with TLS + mongosh "mongodb://:@:10260/?directConnection=true&authMechanism=SCRAM-SHA-256" \ + --tls --tlsCAFile ca.crt + ``` === "Provided" **Best for:** Production with centralized certificate management !!! note "Prerequisites" - A Kubernetes [TLS Secret](https://kubernetes.io/docs/concepts/configuration/secret/#tls-secrets) containing `tls.crt` and `tls.key` (and optionally `ca.crt`). + A Kubernetes [TLS Secret](https://kubernetes.io/docs/concepts/configuration/secret/#tls-secrets) containing `tls.crt` and `tls.key`. - Provided mode lets you supply your own TLS certificates. This is ideal when certificates are managed externally (for example, from Azure Key Vault, HashiCorp Vault, or a corporate CA). + ??? example "Creating a TLS Secret" - First, create a Kubernetes TLS Secret with your certificates: + ```bash + kubectl create secret generic my-documentdb-tls -n default \ + --from-file=tls.crt=server.crt \ + --from-file=tls.key=server.key \ + --from-file=ca.crt=ca.crt # (1)! 
+ ``` - ```bash title="Create TLS secret" - kubectl create secret generic my-documentdb-tls -n default \ - --from-file=tls.crt=server.crt \ - --from-file=tls.key=server.key \ - --from-file=ca.crt=ca.crt # optional: include if clients need CA verification - ``` + 1. Optional. The gateway only uses `tls.crt` and `tls.key`. Including `ca.crt` stores the CA certificate in the same Secret for easy client-side retrieval. - Then reference the secret in your DocumentDB configuration: + Provided mode lets you supply your own TLS certificates. This is ideal when certificates are managed externally (for example, from Azure Key Vault, HashiCorp Vault, or a corporate CA). ```yaml title="documentdb-tls-provided.yaml" apiVersion: documentdb.io/preview @@ -182,19 +228,26 @@ Select your TLS mode below. Each tab shows prerequisites, the complete YAML conf gateway: mode: Provided provided: - secretName: my-documentdb-tls # (1)! + secretName: my-documentdb-tls ``` - 1. The Secret must contain `tls.crt` (server certificate) and `tls.key` (private key). Optionally include `ca.crt` (CA certificate) if clients need to verify the server. + **Connect with mongosh:** - For Azure Key Vault integration, see the [Manual Provided Mode Setup Guide](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/MANUAL-PROVIDED-MODE-SETUP.md). + ```bash + # Connect with TLS using your CA certificate + mongosh "mongodb://:@:10260/?directConnection=true&authMechanism=SCRAM-SHA-256" \ + --tls --tlsCAFile ca.crt + ``` ## Certificate Rotation -### Automatic Rotation +Certificate rotation is automatic and zero-downtime. When a certificate is renewed, the gateway picks up the new certificate within ~2 minutes without restarting pods. -- **SelfSigned and CertManager modes**: cert-manager automatically rotates certificates before expiration. The operator detects the updated Secret and reloads the gateway. 
-- **Provided mode**: Update the external Secret (or trigger a CSI driver sync). The operator picks up changes automatically. +| Mode | Rotation | Action required | +|------|----------|-----------------| +| **SelfSigned** | cert-manager auto-renews 15 days before the 90-day expiry | None | +| **CertManager** | cert-manager auto-renews based on the Certificate CR's `renewBefore` | None | +| **Provided** | You update the Secret contents (manually or via CSI driver sync) | Update the Secret | ### Monitoring Certificate Expiration @@ -216,60 +269,14 @@ Example TLS status output: ```json { "ready": true, - "secretName": "documentdb-gateway-cert-tls", - "message": "" + "secretName": "my-documentdb-gateway-cert-tls", + "message": "Gateway TLS certificate ready" } ``` -## Troubleshooting - -### Certificate Not Ready - -**Symptoms**: `tls.ready` is `false`, pods may not start. - -```bash -# Check cert-manager certificate status -kubectl describe certificate -n - -# Check cert-manager logs -kubectl logs -n cert-manager deployment/cert-manager - -# Check for pending CertificateRequests -kubectl get certificaterequest -n -``` - -**Common causes**: - -- cert-manager is not installed or not running -- The Issuer or ClusterIssuer does not exist or is not ready -- DNS validation is failing (for ACME/Let's Encrypt) - -### TLS Connection Failures - -**Symptoms**: Clients cannot connect with TLS enabled. - -```bash -# Test TLS handshake directly -EXTERNAL_IP=$(kubectl get svc -n \ - -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}') -openssl s_client -connect $EXTERNAL_IP:10260 - -# Check gateway logs -kubectl logs -n -c gateway -``` - -**Common causes**: - -- Client is not using the correct CA certificate -- Certificate SANs do not match the connection hostname -- The Secret is missing required keys (`tls.crt`, `tls.key`, `ca.crt`) - -### Azure Key Vault Access Denied (Provided Mode) - -**Symptoms**: Secret is not synced from Azure Key Vault. 
See the [Manual Provided Mode Setup Guide](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/MANUAL-PROVIDED-MODE-SETUP.md) for troubleshooting. - ## Additional Resources -- [API Reference — TLS Types](../api-reference.md#tlsconfiguration) — Full field reference for TLSConfiguration and GatewayTLS -- [TLS Setup Scripts](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/README.md) — Automated setup and E2E testing -- [cert-manager Documentation](https://cert-manager.io/docs/) +The [`documentdb-playground/tls/`](https://github.com/documentdb/documentdb-kubernetes-operator/tree/main/documentdb-playground/tls) directory provides automated scripts and end-to-end guides for TLS setup on AKS: + +- 📖 **[E2E Testing Guide](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/E2E-TESTING.md)** — Automated and manual E2E testing workflows for all TLS modes +- 📘 **[Manual Provided-Mode Setup](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/MANUAL-PROVIDED-MODE-SETUP.md)** — Step-by-step guide for Provided TLS mode with Azure Key Vault From f1af39aa844542dff16ee7e31c1b52a497a22f2a Mon Sep 17 00:00:00 2001 From: Wenting Wu Date: Tue, 10 Mar 2026 23:04:23 -0400 Subject: [PATCH 7/9] docs: improve Signed-off-by: Wenting Wu --- .../preview/configuration/networking.md | 138 +++++++++++++++--- 1 file changed, 120 insertions(+), 18 deletions(-) diff --git a/docs/operator-public-documentation/preview/configuration/networking.md b/docs/operator-public-documentation/preview/configuration/networking.md index 8d42a213..d2014f74 100644 --- a/docs/operator-public-documentation/preview/configuration/networking.md +++ b/docs/operator-public-documentation/preview/configuration/networking.md @@ -33,7 +33,17 @@ For the full field reference, see [ExposeViaService](../api-reference.md#exposev Exposes the service 
only within the Kubernetes cluster. ```yaml + apiVersion: documentdb.io/preview + kind: DocumentDB + metadata: + name: my-documentdb + namespace: default spec: + nodeCount: 1 + instancesPerNode: 1 + resource: + storage: + pvcSize: 10Gi exposeViaService: serviceType: ClusterIP ``` @@ -55,10 +65,20 @@ For the full field reference, see [ExposeViaService](../api-reference.md#exposev Provisions a cloud load balancer for external access. Set the `environment` field to get cloud-optimized annotations. - === "AKS (Azure)" + === "AKS" ```yaml + apiVersion: documentdb.io/preview + kind: DocumentDB + metadata: + name: my-documentdb + namespace: default spec: + nodeCount: 1 + instancesPerNode: 1 + resource: + storage: + pvcSize: 10Gi environment: aks exposeViaService: serviceType: LoadBalancer @@ -72,17 +92,20 @@ For the full field reference, see [ExposeViaService](../api-reference.md#exposev mongosh "mongodb://:@$EXTERNAL_IP:10260/?directConnection=true" ``` - For an internal load balancer (private VNet only): - - ```bash - kubectl annotate svc documentdb-service-my-documentdb -n default \ - service.beta.kubernetes.io/azure-load-balancer-internal="true" - ``` - - === "EKS (AWS)" + === "EKS" ```yaml + apiVersion: documentdb.io/preview + kind: DocumentDB + metadata: + name: my-documentdb + namespace: default spec: + nodeCount: 1 + instancesPerNode: 1 + resource: + storage: + pvcSize: 10Gi environment: eks exposeViaService: serviceType: LoadBalancer @@ -97,10 +120,20 @@ For the full field reference, see [ExposeViaService](../api-reference.md#exposev mongosh "mongodb://:@$HOSTNAME:10260/?directConnection=true" ``` - === "GKE (Google Cloud)" + === "GKE" ```yaml + apiVersion: documentdb.io/preview + kind: DocumentDB + metadata: + name: my-documentdb + namespace: default spec: + nodeCount: 1 + instancesPerNode: 1 + resource: + storage: + pvcSize: 10Gi environment: gke exposeViaService: serviceType: LoadBalancer @@ -120,16 +153,85 @@ If your cluster uses restrictive 
[NetworkPolicies](https://kubernetes.io/docs/co | Traffic | From | To | Port | |---------|------|----|------| -| Operator → Database | `documentdb-operator` namespace | DocumentDB pods | 8000, 5432 | | Application → Gateway | Application namespace | DocumentDB pods | 10260 | +| CNPG instance manager | CNPG operator / DocumentDB pods | DocumentDB pods | 8000 | | Database replication | DocumentDB pods | DocumentDB pods | 5432 | -See the [Kubernetes NetworkPolicy documentation](https://kubernetes.io/docs/concepts/services-networking/network-policies/) for examples. +!!! note + The replication rule (port 5432) is only needed when `instancesPerNode > 1`. + +### Example NetworkPolicy Configuration + +If your cluster enforces a default-deny ingress policy, apply the following to allow DocumentDB traffic. + +=== "Gateway Access (port 10260)" + + Allow application traffic to the DocumentDB gateway: + + ```yaml + apiVersion: networking.k8s.io/v1 + kind: NetworkPolicy + metadata: + name: allow-documentdb-gateway + namespace: + spec: + podSelector: + matchLabels: + app: # matches your DocumentDB CR name + policyTypes: + - Ingress + ingress: + - ports: + - protocol: TCP + port: 10260 + ``` -## Troubleshooting +=== "CNPG Instance Manager (port 8000)" + + Allow CNPG operator health checks. **Required** — without this, CNPG cannot manage pod lifecycle. + + ```yaml + apiVersion: networking.k8s.io/v1 + kind: NetworkPolicy + metadata: + name: allow-cnpg-status + namespace: + spec: + podSelector: + matchLabels: + app: + policyTypes: + - Ingress + ingress: + - ports: + - protocol: TCP + port: 8000 + ``` + +=== "Replication (port 5432)" + + Allow pod-to-pod replication traffic. Only needed when `instancesPerNode > 1`. 
+ + ```yaml + apiVersion: networking.k8s.io/v1 + kind: NetworkPolicy + metadata: + name: allow-documentdb-replication + namespace: + spec: + podSelector: + matchLabels: + app: + policyTypes: + - Ingress + ingress: + - from: + - podSelector: + matchLabels: + app: + ports: + - protocol: TCP + port: 5432 + ``` -| Problem | Common Causes | -|---------|---------------| -| **LoadBalancer stuck in Pending** | Cloud provider quota exceeded; missing cloud controller permissions; subnet/security group misconfiguration | -| **Connection timeout to external IP** | Firewall blocking port 10260; pod not ready; service selector mismatch | -| **In-cluster DNS not resolving** | CoreDNS not running; wrong namespace in DNS name; service does not exist | +See the [Kubernetes NetworkPolicy documentation](https://kubernetes.io/docs/concepts/services-networking/network-policies/) for more details. From 4be10fe4564b8de096cc4f8b450e8766477ad5cd Mon Sep 17 00:00:00 2001 From: Wenting Wu Date: Wed, 11 Mar 2026 12:49:00 -0400 Subject: [PATCH 8/9] docs: improve Signed-off-by: Wenting Wu --- .github/agents/documentation-agent.md | 1 + CONTRIBUTING.md | 5 +- .../preview/advanced-configuration/README.md | 10 --- .../preview/configuration/networking.md | 77 +++++++++---------- .../preview/configuration/storage.md | 36 +++++---- .../preview/configuration/tls.md | 26 +++++-- .../preview/index.md | 43 ++++++----- mkdocs.yml | 2 +- 8 files changed, 104 insertions(+), 96 deletions(-) diff --git a/.github/agents/documentation-agent.md b/.github/agents/documentation-agent.md index 2831f303..26226986 100644 --- a/.github/agents/documentation-agent.md +++ b/.github/agents/documentation-agent.md @@ -49,6 +49,7 @@ You are a documentation specialist for the DocumentDB Kubernetes Operator projec - always check for and avoid outdated information in the documentation - always check for and avoid typos and grammatical errors in the documentation - ensure that all documentation is accurate and up-to-date with the 
latest code changes +- **Never use "cluster" alone** — always qualify as "DocumentDB cluster" or "Kubernetes cluster" to avoid ambiguity ## MkDocs Site diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 2739912c..c282e9a1 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -11,7 +11,4 @@ instructions provided by the bot. You will only need to do this once across all This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) -or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments. - -For development environment setup—including how to test public documentation locally—see the -[Development Environment Guide](docs/developer-guides/development-environment.md). \ No newline at end of file +or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments. \ No newline at end of file diff --git a/docs/operator-public-documentation/preview/advanced-configuration/README.md b/docs/operator-public-documentation/preview/advanced-configuration/README.md index 533fde6f..d13568b7 100644 --- a/docs/operator-public-documentation/preview/advanced-configuration/README.md +++ b/docs/operator-public-documentation/preview/advanced-configuration/README.md @@ -2,12 +2,6 @@ This section covers advanced configuration options for the DocumentDB Kubernetes Operator. 
-For core configuration topics, see the [Configuration](../configuration/tls.md) guides: - -- [API Reference](../api-reference.md) — CRD reference for DocumentDB, Backup, and ScheduledBackup -- [TLS](../configuration/tls.md) — TLS modes, certificate rotation, and troubleshooting -- [Storage](../configuration/storage.md) — Storage classes, PVC sizing, encryption -- [Networking](../configuration/networking.md) — Service types, load balancers, Network Policies ## Table of Contents - [High Availability](#high-availability) @@ -88,8 +82,4 @@ For production, consider using: ## Additional Resources -- [Configuration Guides](../configuration/tls.md) — TLS, Storage, Networking, and Resource Management -- [API Reference](../api-reference.md) — CRD reference for DocumentDB, Backup, and ScheduledBackup -- [TLS Setup Guide](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/README.md) -- [E2E Testing Guide](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/E2E-TESTING.md) - [GitHub Repository](https://github.com/documentdb/documentdb-kubernetes-operator) diff --git a/docs/operator-public-documentation/preview/configuration/networking.md b/docs/operator-public-documentation/preview/configuration/networking.md index d2014f74..53355315 100644 --- a/docs/operator-public-documentation/preview/configuration/networking.md +++ b/docs/operator-public-documentation/preview/configuration/networking.md @@ -11,7 +11,9 @@ tags: ## Overview -DocumentDB exposes connectivity through a Kubernetes Service named `documentdb-service-`. The gateway listens on port **10260** (MongoDB-compatible wire protocol). +Networking controls how clients connect to your DocumentDB cluster. Configure it to choose between internal-only or external access, and to secure traffic with Network Policies. 
+ +The operator creates a Kubernetes [Service](https://kubernetes.io/docs/concepts/services-networking/service/) named `documentdb-service-` to provide a stable endpoint for your applications. Since pod IPs change whenever pods restart, the Service gives you a fixed address that automatically routes traffic to the active primary pod on port **10260**. You can control how this service is exposed by setting the `exposeViaService` field: ```yaml apiVersion: documentdb.io/preview @@ -30,7 +32,7 @@ For the full field reference, see [ExposeViaService](../api-reference.md#exposev === "ClusterIP (Internal)" - Exposes the service only within the Kubernetes cluster. + Use [ClusterIP](https://kubernetes.io/docs/concepts/services-networking/service/#type-clusterip) when your applications run **inside** the same Kubernetes cluster. This is the default and most secure option — the database is not exposed outside the Kubernetes cluster. For local development, use `kubectl port-forward`. ```yaml apiVersion: documentdb.io/preview @@ -48,22 +50,17 @@ For the full field reference, see [ExposeViaService](../api-reference.md#exposev serviceType: ClusterIP ``` - Connect from within the cluster: - - ```bash - mongosh "mongodb://:@documentdb-service-my-documentdb.default.svc.cluster.local:10260/?directConnection=true" - ``` +=== "LoadBalancer (External)" - For local development, use port-forwarding: + Use [LoadBalancer](https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer) when your applications run **outside** the Kubernetes cluster, or you need a public or cloud-accessible endpoint. The cloud provider provisions an external IP (AKS, GKE) or hostname (EKS). 
- ```bash - kubectl port-forward svc/documentdb-service-my-documentdb -n default 10260:10260 - mongosh "mongodb://:@localhost:10260/?directConnection=true" - ``` + Setting `environment` is optional but recommended — it adds cloud-specific annotations to the LoadBalancer service: -=== "LoadBalancer (External)" + - **`aks`**: Explicitly marks the load balancer as external (`azure-load-balancer-external: true`) + - **`eks`**: Uses AWS Network Load Balancer (NLB) with cross-zone balancing and IP-based targeting for lower latency + - **`gke`**: Sets the load balancer type to External - Provisions a cloud load balancer for external access. Set the `environment` field to get cloud-optimized annotations. + Without `environment`, a generic LoadBalancer is created that relies on the cloud provider's default behavior. === "AKS" @@ -84,14 +81,6 @@ For the full field reference, see [ExposeViaService](../api-reference.md#exposev serviceType: LoadBalancer ``` - Get the external IP and connect: - - ```bash - EXTERNAL_IP=$(kubectl get svc documentdb-service-my-documentdb -n default \ - -o jsonpath='{.status.loadBalancer.ingress[0].ip}') - mongosh "mongodb://:@$EXTERNAL_IP:10260/?directConnection=true" - ``` - === "EKS" ```yaml @@ -111,15 +100,6 @@ For the full field reference, see [ExposeViaService](../api-reference.md#exposev serviceType: LoadBalancer ``` - !!! note - On EKS, the external endpoint is a hostname rather than an IP. Use it directly in your connection string. 
- - ```bash - HOSTNAME=$(kubectl get svc documentdb-service-my-documentdb -n default \ - -o jsonpath='{.status.loadBalancer.ingress[0].hostname}') - mongosh "mongodb://:@$HOSTNAME:10260/?directConnection=true" - ``` - === "GKE" ```yaml @@ -139,17 +119,36 @@ For the full field reference, see [ExposeViaService](../api-reference.md#exposev serviceType: LoadBalancer ``` - Get the external IP and connect: +## Connect with mongosh - ```bash - EXTERNAL_IP=$(kubectl get svc documentdb-service-my-documentdb -n default \ - -o jsonpath='{.status.loadBalancer.ingress[0].ip}') - mongosh "mongodb://:@$EXTERNAL_IP:10260/?directConnection=true" - ``` +=== "Connection String" + + Retrieve the connection string from the DocumentDB resource status and connect. This works with both ClusterIP and LoadBalancer service types — the operator automatically populates it with the correct service address. + + The connection string contains embedded `kubectl` commands that resolve your credentials automatically: + + ```bash + CONNECTION_STRING=$(eval echo "$(kubectl get documentdb my-documentdb -n default -o jsonpath='{.status.connectionString}')") + mongosh "$CONNECTION_STRING" + ``` + +=== "Port Forwarding" + + Port forwarding works with any service type and is useful for local development. It connects directly to the pod, bypassing the Kubernetes Service. Run `kubectl port-forward` in one terminal and `mongosh` in a separate terminal, since port forwarding must stay running. 
+ + ```bash + # Terminal 1 — keep this running + kubectl port-forward svc/documentdb-service-my-documentdb -n default 10260:10260 + ``` + + ```bash + # Terminal 2 + mongosh "mongodb://:@localhost:10260/?directConnection=true" + ``` ## Network Policies -If your cluster uses restrictive [NetworkPolicies](https://kubernetes.io/docs/concepts/services-networking/network-policies/), ensure the following traffic is allowed: +If your Kubernetes cluster uses restrictive [NetworkPolicies](https://kubernetes.io/docs/concepts/services-networking/network-policies/), ensure the following traffic is allowed: | Traffic | From | To | Port | |---------|------|----|------| @@ -162,7 +161,7 @@ If your cluster uses restrictive [NetworkPolicies](https://kubernetes.io/docs/co ### Example NetworkPolicy Configuration -If your cluster enforces a default-deny ingress policy, apply the following to allow DocumentDB traffic. +If your Kubernetes cluster enforces a default-deny ingress policy, apply the following to allow DocumentDB traffic. === "Gateway Access (port 10260)" diff --git a/docs/operator-public-documentation/preview/configuration/storage.md b/docs/operator-public-documentation/preview/configuration/storage.md index 0193a022..7377130c 100644 --- a/docs/operator-public-documentation/preview/configuration/storage.md +++ b/docs/operator-public-documentation/preview/configuration/storage.md @@ -11,7 +11,9 @@ tags: ## Overview -DocumentDB uses Kubernetes [PersistentVolumeClaims (PVCs)](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims) to request storage, which are backed by [PersistentVolumes (PVs)](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) provisioned by your cloud provider. +Storage controls how DocumentDB persists data — including disk size, storage type, retention behavior, and encryption. 
+ +Each DocumentDB instance stores its data on a Kubernetes [PersistentVolume (PV)](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) provisioned through a [PersistentVolumeClaim (PVC)](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims). You need to specify at least the disk size; optionally, you can choose a storage class for your cloud provider and control what happens to the data when the DocumentDB cluster is deleted. Configure storage through the `spec.resource.storage` field: ```yaml apiVersion: documentdb.io/preview @@ -28,22 +30,24 @@ spec: For the full field reference, see [StorageConfiguration](../api-reference.md#storageconfiguration) in the API Reference. -## PVC Sizing +## Disk Size (`pvcSize`) + +The `pvcSize` field sets how much disk space each DocumentDB instance gets. This is set at DocumentDB cluster creation time. Online resizing is **coming soon** — see [#298](https://github.com/documentdb/documentdb-kubernetes-operator/issues/298) for tracking. -PVC size is set at cluster creation time. Online PVC resizing is **coming soon** — see [#298](https://github.com/documentdb/documentdb-kubernetes-operator/issues/298) for tracking. +## Reclaim Policy (`persistentVolumeReclaimPolicy`) -## Reclaim Policy +The `persistentVolumeReclaimPolicy` field controls what happens to your data when a DocumentDB cluster is deleted: | Policy | Behavior | |--------|----------| -| `Retain` (default) | PV is preserved after PVC deletion. **Recommended for production.** | -| `Delete` | PV and underlying storage are deleted with the PVC. Suitable for development. | +| `Retain` (default) | Data is preserved after DocumentDB deletion. **Recommended for production.** | +| `Delete` | Data is permanently deleted with the DocumentDB cluster. Suitable for development. | -With `Retain`, you can recover data from a retained PV after cluster deletion. 
See [PersistentVolume Retention and Recovery](../backup-and-restore.md#persistentvolume-retention-and-recovery) for restore steps. +With `Retain`, you can recover data even after the DocumentDB cluster is gone. See [PersistentVolume Retention and Recovery](../backup-and-restore.md#persistentvolume-retention-and-recovery) for restore steps. -## Storage Classes +## Storage Classes (`storageClass`) -A [StorageClass](https://kubernetes.io/docs/concepts/storage/storage-classes/) defines the type of underlying disk (e.g., SSD vs HDD) and provisioner used for persistent volumes. If you don't specify one, Kubernetes uses the default StorageClass in your cluster. +The `storageClass` field selects which type of underlying disk (e.g., SSD vs HDD) to provision. See [Kubernetes StorageClass](https://kubernetes.io/docs/concepts/storage/storage-classes/) for details. If you don't specify one, Kubernetes uses the default StorageClass in your Kubernetes cluster. To see available StorageClasses and which one is the default: @@ -55,6 +59,8 @@ The default is marked with `(default)` in the output. ## Disk Encryption +Disk encryption protects your data at rest — if someone gains physical access to the underlying storage, the data is unreadable without the encryption key. Most cloud providers enable this by default, but EKS requires explicit configuration. + | Provider | Default Encryption | Customer-Managed Keys | |----------|-------------------|----------------------| | **AKS** | ✅ Enabled (platform-managed keys) | [Azure Disk Encryption with CMK](https://learn.microsoft.com/azure/aks/azure-disk-customer-managed-keys) | @@ -81,10 +87,10 @@ The default is marked with `(default)` in the output. ## PersistentVolume Security -The operator automatically applies security-hardening mount options to all PersistentVolumes associated with DocumentDB clusters: +As a defense-in-depth measure, the operator automatically applies security-hardening mount options to all DocumentDB volumes. 
These prevent common attack vectors even if a container is compromised: -| Mount Option | Description | -|--------------|-------------| -| `nodev` | Prevents device files from being interpreted on the filesystem | -| `nosuid` | Prevents setuid/setgid bits from taking effect | -| `noexec` | Prevents execution of binaries on the filesystem | +| Mount Option | What it prevents | +|--------------|------------------| +| `nodev` | Blocks creation of device files that could access host hardware | +| `nosuid` | Blocks privilege escalation via setuid/setgid binaries | +| `noexec` | Blocks execution of malicious binaries written to the data volume | diff --git a/docs/operator-public-documentation/preview/configuration/tls.md b/docs/operator-public-documentation/preview/configuration/tls.md index 9803a72b..d660c33d 100644 --- a/docs/operator-public-documentation/preview/configuration/tls.md +++ b/docs/operator-public-documentation/preview/configuration/tls.md @@ -11,7 +11,9 @@ tags: ## Overview -The DocumentDB operator supports TLS encryption for gateway connections. TLS protects data in transit between clients and the DocumentDB gateway. +TLS encrypts connections between your applications and DocumentDB. Configure it to protect data in transit and meet your security requirements. + +The DocumentDB gateway always encrypts connections — TLS is active regardless of the mode you choose. The `spec.tls.gateway.mode` field controls how the operator manages TLS certificates: ```yaml apiVersion: documentdb.io/preview @@ -21,7 +23,7 @@ metadata: spec: tls: gateway: - mode: SelfSigned # Disabled (default) | SelfSigned | CertManager | Provided + mode: SelfSigned # Disabled | SelfSigned | CertManager | Provided ``` For the full field reference, see [TLSConfiguration](../api-reference.md#tlsconfiguration) in the API Reference. @@ -30,7 +32,7 @@ For the full field reference, see [TLSConfiguration](../api-reference.md#tlsconf Select your TLS mode below. 
Each tab shows prerequisites, the complete YAML configuration, and connection instructions. -=== "Disabled (default)" +=== "Disabled" **Best for:** Development and testing only @@ -38,7 +40,7 @@ Select your TLS mode below. Each tab shows prerequisites, the complete YAML conf **Prerequisites:** None - Disabled mode runs the gateway without TLS encryption. All traffic between clients and the gateway is unencrypted. + Disabled mode means the operator does not manage TLS certificates. However, the gateway still encrypts all connections using an internally generated self-signed certificate. Clients must connect with `tls=true&tlsAllowInvalidCertificates=true`. ```yaml title="documentdb-tls-disabled.yaml" apiVersion: documentdb.io/preview @@ -52,6 +54,8 @@ Select your TLS mode below. Each tab shows prerequisites, the complete YAML conf resource: storage: pvcSize: 10Gi + exposeViaService: + serviceType: ClusterIP tls: gateway: mode: Disabled @@ -60,7 +64,7 @@ Select your TLS mode below. Each tab shows prerequisites, the complete YAML conf **Connect with mongosh:** ```bash - mongosh "mongodb://:@:10260/?directConnection=true&authMechanism=SCRAM-SHA-256" + mongosh "mongodb://:@:10260/?directConnection=true&authMechanism=SCRAM-SHA-256&tls=true&tlsAllowInvalidCertificates=true" ``` === "SelfSigned" @@ -68,7 +72,7 @@ Select your TLS mode below. Each tab shows prerequisites, the complete YAML conf **Best for:** Development, testing, and environments without external PKI (Public Key Infrastructure) !!! note "Prerequisites" - [cert-manager](https://cert-manager.io/) must be installed in the cluster. See [Install cert-manager](../index.md#install-cert-manager) for setup instructions. + [cert-manager](https://cert-manager.io/) must be installed in the Kubernetes cluster. See [Install cert-manager](../index.md#install-cert-manager) for setup instructions. 
SelfSigned mode uses cert-manager to automatically generate, manage, and rotate a self-signed server certificate (90-day validity, renewed 15 days before expiry). No additional configuration is needed beyond setting the mode. @@ -84,6 +88,8 @@ Select your TLS mode below. Each tab shows prerequisites, the complete YAML conf resource: storage: pvcSize: 10Gi + exposeViaService: + serviceType: ClusterIP tls: gateway: mode: SelfSigned @@ -161,6 +167,8 @@ Select your TLS mode below. Each tab shows prerequisites, the complete YAML conf resource: storage: pvcSize: 100Gi + exposeViaService: + serviceType: ClusterIP tls: gateway: mode: CertManager @@ -176,7 +184,7 @@ Select your TLS mode below. Each tab shows prerequisites, the complete YAML conf 1. Must match the `metadata.name` of your Issuer or ClusterIssuer (e.g., `my-ca-issuer` from the prerequisite example above). 2. Use [`ClusterIssuer`](https://cert-manager.io/docs/concepts/issuer/#cluster-resource) for cluster-scoped issuers, or [`Issuer`](https://cert-manager.io/docs/concepts/issuer/#namespaces) for namespace-scoped. 3. [Subject Alternative Names](https://en.wikipedia.org/wiki/Subject_Alternative_Name) — add all DNS names clients will use to connect. - 4. The Kubernetes Secret where cert-manager will store the issued certificate. + 4. Optional. The Kubernetes Secret where cert-manager stores the issued certificate — you do not need to create this Secret yourself, cert-manager generates it automatically. Defaults to `-gateway-cert-tls` if not specified. For a complete list of CertManager fields, see [CertManagerTLS](../api-reference.md#certmanagertls) in the API Reference. @@ -224,6 +232,8 @@ Select your TLS mode below. Each tab shows prerequisites, the complete YAML conf resource: storage: pvcSize: 100Gi + exposeViaService: + serviceType: ClusterIP tls: gateway: mode: Provided @@ -241,7 +251,7 @@ Select your TLS mode below. 
Each tab shows prerequisites, the complete YAML conf ## Certificate Rotation -Certificate rotation is automatic and zero-downtime. When a certificate is renewed, the gateway picks up the new certificate within ~2 minutes without restarting pods. +Certificate rotation is automatic and zero-downtime. When a certificate is renewed, the gateway picks up the new certificate without restarting pods. | Mode | Rotation | Action required | |------|----------|-----------------| diff --git a/docs/operator-public-documentation/preview/index.md b/docs/operator-public-documentation/preview/index.md index 6461cc66..30b7238c 100644 --- a/docs/operator-public-documentation/preview/index.md +++ b/docs/operator-public-documentation/preview/index.md @@ -184,7 +184,7 @@ documentdb-preview Cluster in healthy state mongodb://... ### Connect to the DocumentDB cluster -Choose a connection method based on your service type. +Choose a connection method based on your service type. For more details on service types, load balancers, and Network Policies, see [Networking](configuration/networking.md). For TLS certificate configuration, see [TLS](configuration/tls.md). #### Option 1: ClusterIP service (default — for local development) @@ -374,33 +374,38 @@ For details, see [Sidecar Injector Plugin Configuration](https://github.com/docu ### Local high-availability (HA) -Deploy multiple DocumentDB instances with automatic failover by setting `instancesPerNode` to a value greater than 1 (up to 3). This creates one primary instance and two replicas for read scalability and automatic failover. +Deploy multiple DocumentDB instances with automatic failover by setting `instancesPerNode` to a value greater than 1. 
-```yaml +#### Enable local HA + +```bash +cat < spec: - instancesPerNode: 3 # 1 primary + 2 replicas + nodeCount: 1 + instancesPerNode: 3 + documentDbCredentialSecret: documentdb-credentials + resource: + storage: + pvcSize: 10Gi + exposeViaService: + serviceType: LoadBalancer +EOF ``` -For the full field reference, see the [API Reference](api-reference.md#documentdbspec). +This configuration creates: + +- **1 primary instance** — handles all write operations +- **2 replica instances** — provide read scalability and automatic failover ### Multi-cloud deployment The operator supports deployment across multiple cloud environments and Kubernetes distributions. For guidance, see the [Multi-Cloud Deployment Guide](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/README.md). -### TLS - -The operator supports four TLS modes: Disabled, SelfSigned, CertManager, and Provided. See [TLS Configuration](configuration/tls.md) for setup instructions, certificate rotation, and troubleshooting. - -For automated TLS testing scripts, see the [TLS Playground](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/tls/README.md). 
- - -### Further reading - -- [API Reference](api-reference.md) — Auto-generated CRD type reference -- [Backup and Restore](backup-and-restore.md) — On-demand and scheduled backups -- [kubectl Plugin](kubectl-plugin.md) — CLI tooling for day-two operations - - ## Clean up ### Delete the DocumentDB cluster diff --git a/mkdocs.yml b/mkdocs.yml index e9d13815..1bc4e919 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -29,9 +29,9 @@ nav: - Preview: - Get Started: preview/index.md - Configuration: + - Networking: preview/configuration/networking.md - TLS: preview/configuration/tls.md - Storage: preview/configuration/storage.md - - Networking: preview/configuration/networking.md - Advanced Configuration: preview/advanced-configuration/README.md - Backup and Restore: preview/backup-and-restore.md - API Reference: preview/api-reference.md From 3875aaff90c33049ca73680d8500db651f7ed0c8 Mon Sep 17 00:00:00 2001 From: Wenting Wu Date: Wed, 11 Mar 2026 12:53:51 -0400 Subject: [PATCH 9/9] docs: add private CA tlsCAFile guidance to TLS doc Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Wenting Wu --- .../preview/configuration/tls.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/docs/operator-public-documentation/preview/configuration/tls.md b/docs/operator-public-documentation/preview/configuration/tls.md index d660c33d..0c634094 100644 --- a/docs/operator-public-documentation/preview/configuration/tls.md +++ b/docs/operator-public-documentation/preview/configuration/tls.md @@ -190,6 +190,8 @@ Select your TLS mode below. Each tab shows prerequisites, the complete YAML conf **Connect with mongosh:** + If your CA is private (which most cert-manager setups are), you need `--tlsCAFile` so mongosh can verify the server certificate: + ```bash # Extract the CA certificate from the Secret kubectl get secret my-documentdb-tls -n default \ @@ -243,6 +245,8 @@ Select your TLS mode below. 
Each tab shows prerequisites, the complete YAML conf **Connect with mongosh:** + If your CA is private, you need `--tlsCAFile` so mongosh can verify the server certificate: + ```bash # Connect with TLS using your CA certificate mongosh "mongodb://:@:10260/?directConnection=true&authMechanism=SCRAM-SHA-256" \