docs: Add high availability documentation for local HA configuration #294
base: main
@@ -0,0 +1,289 @@

---
title: Local High Availability
description: Configure local high availability for DocumentDB with multiple instances, pod anti-affinity, and automatic failover.
tags:
  - high-availability
  - configuration
  - failover
---

# Local High Availability

Local high availability (HA) deploys multiple DocumentDB instances within a single Kubernetes cluster, providing automatic failover and zero data loss during instance failures.

## Overview

Local HA uses synchronous replication between a primary instance and one or two replicas. When the primary fails, a replica is automatically promoted to primary.
```mermaid
flowchart LR
    subgraph zone1[Zone A]
        P[Primary]
    end
    subgraph zone2[Zone B]
        R1[Replica 1]
    end
    subgraph zone3[Zone C]
        R2[Replica 2]
    end

    App([Application]) --> P
    P -->|Sync Replication| R1
    P -->|Sync Replication| R2
```

## Instance Configuration

Configure the number of instances using the `instancesPerNode` field:

```yaml title="documentdb-ha.yaml"
apiVersion: documentdb.io/preview
kind: DocumentDB
metadata:
  name: my-documentdb
  namespace: documentdb
spec:
  instancesPerNode: 3 # (1)!
  storage:
    size: 10Gi
    storageClassName: managed-csi
```
Comment on lines +47 to +49

Suggested change:

    -   storage:
    -     size: 10Gi
    -     storageClassName: managed-csi
    +   resource:
    +     storage:
    +       pvcSize: 10Gi
    +       storageClass: managed-csi
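For reference, a minimal sketch of the manifest from the diff with the suggested storage fields applied. The field names are taken only from the diff and the suggestion above; the CRD may require additional fields that are not shown in this thread, so treat this as an illustration rather than a known-valid spec. The file name matches the `title` in the diff, and the commented-out dry-run command assumes `kubectl`, cluster access, and an installed DocumentDB CRD.

```shell
# Sketch: write the example manifest with the suggested storage fields applied.
# Assumption: only fields visible in the diff and the suggestion are included;
# any other fields the CRD requires would still need to be added.
cat > documentdb-ha.yaml <<'EOF'
apiVersion: documentdb.io/preview
kind: DocumentDB
metadata:
  name: my-documentdb
  namespace: documentdb
spec:
  instancesPerNode: 3
  resource:
    storage:
      pvcSize: 10Gi
      storageClass: managed-csi
EOF

# Validate against the installed CRD schema without creating anything
# (requires kubectl, cluster access, and the DocumentDB CRD):
# kubectl apply --dry-run=server -f documentdb-ha.yaml
```

A server-side dry run is one way to catch the validation errors mentioned in the comments below before the example lands in the docs.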
Copilot
AI
Mar 6, 2026
This zone anti-affinity example omits required spec fields (notably spec.resource.storage.pvcSize, and any other required fields). If it’s intended as a patch snippet, it should say so explicitly; otherwise include a complete minimal DocumentDB spec so users can apply it without validation errors.
Copilot
AI
Mar 6, 2026
This node anti-affinity example also omits required spec fields (e.g., spec.resource.storage.pvcSize). Like the zone example, either present it as a partial snippet to merge into an existing manifest, or include the required fields to make it directly runnable.
Copilot
AI
Mar 6, 2026
The "Zero Data Loss" note assumes synchronous replication, but the operator doesn’t configure synchronous replication for single-cluster deployments in the generated CNPG Cluster spec. This should be updated to avoid overstating durability guarantees (or updated to describe the exact configuration that enforces synchronous/quorum writes, if applicable).
How do we monitor? How do we know there was a failover?
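One possible answer, as a hedged sketch: CNPG records the current primary in the `Cluster` resource's `.status.currentPrimary`, so a failover shows up as a change in that value. The snippet assumes the CNPG cluster is named `my-documentdb` in the `documentdb` namespace (as in the example manifest); `primary_changed` and `current_primary` are hypothetical helper names introduced here, not part of the operator.

```shell
# primary_changed PREVIOUS CURRENT
# Prints a message and returns 0 when the primary differs from the one seen before.
primary_changed() {
  local previous="$1" current="$2"
  if [ -n "$previous" ] && [ -n "$current" ] && [ "$previous" != "$current" ]; then
    echo "failover: ${previous} -> ${current}"
    return 0
  fi
  return 1
}

# Reads the current primary from the CNPG Cluster status
# (requires kubectl and cluster access; names are assumptions).
current_primary() {
  kubectl get clusters.postgresql.cnpg.io my-documentdb -n documentdb \
    -o jsonpath='{.status.currentPrimary}'
}

# Example polling loop, commented out so the block can be sourced without a cluster:
# last=""
# while true; do
#   now="$(current_primary)"
#   primary_changed "$last" "$now"
#   [ -n "$now" ] && last="$now"
#   sleep 10
# done
```

Polling is the simplest option for a docs example; in practice an event- or metrics-based alert would likely be preferable, but that depends on what the operator exposes.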
Copilot
AI
Mar 6, 2026
`documentdb.io/cluster` is used by the PV controller for PV/PVC labeling, but CNPG pods are labeled with `cnpg.io/cluster`. Using `-l documentdb.io/cluster=my-documentdb` is unlikely to match the instance pods; consider switching this selector to `-l cnpg.io/cluster=my-documentdb` (the CNPG cluster name defaults to the DocumentDB name for single-cluster deployments).

Suggested change:

    - kubectl get pods -n documentdb -l documentdb.io/cluster=my-documentdb \
    + kubectl get pods -n documentdb -l cnpg.io/cluster=my-documentdb \
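A quick way to check a selector before documenting it is to count what it matches. The sketch below is illustrative only: `count_pods` and `check_selector` are hypothetical helpers, and the namespace and label values are the ones used in this thread.

```shell
# count_pods NAMESPACE SELECTOR
# Prints how many pods the label selector matches (0 when nothing matches).
count_pods() {
  local namespace="$1" selector="$2"
  kubectl get pods -n "$namespace" -l "$selector" --no-headers 2>/dev/null \
    | wc -l | tr -d ' '
}

# check_selector NAMESPACE SELECTOR
# Reports the match count and returns non-zero if the selector matches no pods.
check_selector() {
  local n
  n="$(count_pods "$1" "$2")"
  if [ "$n" -eq 0 ]; then
    echo "selector '$2' matches no pods in namespace '$1'"
    return 1
  fi
  echo "selector '$2' matches $n pod(s)"
}

# Example usage (requires kubectl and cluster access):
# check_selector documentdb cnpg.io/cluster=my-documentdb
# check_selector documentdb documentdb.io/cluster=my-documentdb
```

The same check would apply to the operator log selector discussed in the next comment.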
Copilot
AI
Mar 6, 2026
The operator Helm chart’s pod template labels the operator pods with `app: <release-name>` (it does not include `app.kubernetes.io/name` on the pod template). This `kubectl logs` selector likely won’t match any pods. Prefer `kubectl logs deployment/documentdb-operator -n <ns>` or a selector that matches the actual pod labels (e.g., `-l app=<helm-release-name>`).

Suggested change:

    - kubectl logs -n documentdb-operator -l app.kubernetes.io/name=documentdb-operator --tail=100
    + kubectl logs deployment/documentdb-operator -n documentdb-operator --tail=100
Consider linking to Abhishek's blog post.
Also reference the CNPG high availability information.