diff --git a/documentdb-playground/k3s-azure-fleet/.gitignore b/documentdb-playground/k3s-azure-fleet/.gitignore new file mode 100644 index 00000000..d84a23e9 --- /dev/null +++ b/documentdb-playground/k3s-azure-fleet/.gitignore @@ -0,0 +1,10 @@ +# Generated files +.deployment-info +.istio-certs/ +istio-*/ +*.tgz +*.log + +# SSH key (required by Azure but not used - we use Run Command) +.ssh-key +.ssh-key.pub diff --git a/documentdb-playground/k3s-azure-fleet/README.md b/documentdb-playground/k3s-azure-fleet/README.md new file mode 100644 index 00000000..4625bc1f --- /dev/null +++ b/documentdb-playground/k3s-azure-fleet/README.md @@ -0,0 +1,524 @@ +# k3s on Azure VMs with KubeFleet and Istio Multi-Cluster Management + +This playground demonstrates deploying DocumentDB on **k3s clusters running on Azure VMs**, integrated with **KubeFleet** for cluster membership and **Istio** for cross-cluster networking. This hybrid architecture showcases: + +- **Lightweight Kubernetes**: k3s on Azure VMs for edge/resource-constrained scenarios +- **Cluster Membership**: KubeFleet hub for fleet-wide resource propagation (e.g., DocumentDB CRDs) +- **Istio Service Mesh**: Cross-cluster networking without complex VNet peering +- **Multi-Region**: AKS + k3s clusters across multiple Azure regions +- **DocumentDB**: Multi-region database deployment with Istio-based replication + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ Istio Service Mesh (mesh1) │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌────────────────────┐ ┌────────────────────┐ │ +│ │ AKS Hub Cluster │ │ k3s Cluster │ │ +│ │ (westus3) │ │ (eastus2) │ │ +│ │ │ │ │ │ +│ │ ┌──────────────┐ │ │ ┌──────────────┐ │ │ +│ │ │ KubeFleet │ │ │ │ Fleet Member │ │ │ +│ │ │ Hub Agent │ │ │ │ Agent │ │ │ +│ │ └──────────────┘ │ │ └──────────────┘ │ │ +│ │ ┌──────────────┐ │ │ ┌──────────────┐ │ │ +│ │ │ Istio ─┼──┼─────────┼──┼─ Istio 
│ │ │ +│ │ │ East-West GW │ │ │ │ East-West GW │ │ │ +│ │ └──────────────┘ │ │ └──────────────┘ │ │ +│ │ ┌──────────────┐ │ │ ┌──────────────┐ │ │ +│ │ │ DocumentDB │ │◄───────►│ │ DocumentDB │ │ │ +│ │ │ (Primary) │ │ Istio │ │ (Replica) │ │ │ +│ │ └──────────────┘ │ │ └──────────────┘ │ │ +│ └────────────────────┘ └────────────────────┘ │ +│ │ │ │ +│ │ Remote Secrets │ │ +│ └──────────────┬───────────────┘ │ +│ │ │ +│ ┌────────────────────────┴─────────────────────────┐ │ +│ │ k3s Cluster (uksouth) │ │ +│ │ ┌──────────────┐ ┌──────────────┐ │ │ +│ │ │ Fleet Member │ │ Istio │ │ │ +│ │ │ Agent │ │ East-West GW │ │ │ +│ │ └──────────────┘ └──────────────┘ │ │ +│ │ ┌──────────────────────────────────┐ │ │ +│ │ │ DocumentDB (Replica) │ │ │ +│ │ └──────────────────────────────────┘ │ │ +│ └──────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Networking Design + +- **Istio Service Mesh** handles all cross-cluster communication +- **East-West Gateways** expose services between clusters via LoadBalancer +- **Remote Secrets** enable service discovery across cluster boundaries +- **No VNet Peering Required** - Istio routes traffic over public LoadBalancers with mTLS +- **Shared Root CA** ensures all clusters trust each other + +## Network Requirements + +> **Important**: The deployment creates NSGs (Network Security Groups) for both AKS and k3s subnets to prevent Azure NRMS from auto-creating restrictive rules. 
The k3s VMs require the following network access: +> +> | Port | Protocol | Direction | Purpose | +> |------|----------|-----------|---------| +> | 6443 | TCP | Inbound | Kubernetes API server (kubectl access) | +> | 15443 | TCP | Inbound | Istio east-west gateway (cross-cluster mTLS) | +> | 15012 | TCP | Inbound | Istio xDS secure gRPC (cross-cluster discovery) | +> | 15017 | TCP | Inbound | Istio webhook (sidecar injection) | +> | 15021 | TCP | Inbound | Istio health/status | +> | 15010 | TCP | Inbound | Istio xDS plaintext gRPC | +> | 80, 443 | TCP | Inbound | HTTP/HTTPS traffic | +> +> **Corporate Environment Considerations**: +> - This playground uses **Azure VM Run Command** for VM operations (no SSH/port 22 needed) +> - However, **kubectl access to k3s clusters** requires port 6443 to be reachable from your client +> - Corporate firewalls may block port 6443 even when NSG rules allow it +> - **If you cannot reach k3s API**: Use Azure VPN Gateway or deploy from within the Azure network +> - The AKS hub cluster uses Azure AD authentication and works through corporate firewalls + +## Prerequisites + +- Azure CLI installed and logged in (`az login`) +- Sufficient quota in target regions for VMs and AKS clusters +- Contributor access to the subscription +- kubelogin for Azure AD authentication: `az aks install-cli` +- Helm 3.x installed +- jq for JSON processing +- istioctl (auto-downloaded if not present) +- **Network access to port 6443 on k3s VM public IPs** (see Network Requirements) + +## Quick Start + +```bash +# Set your resource group (optional, defaults to documentdb-k3s-fleet-rg) +export RESOURCE_GROUP=my-documentdb-fleet + +# 1. Deploy all infrastructure (AKS hub, k3s VMs) +./deploy-infrastructure.sh + +# 2. Install Istio service mesh across all clusters +./install-istio.sh + +# 3. Setup KubeFleet hub and join all members +./setup-fleet.sh + +# 4. Install cert-manager across all clusters +./install-cert-manager.sh + +# 5. 
Install DocumentDB operator on all clusters +./install-documentdb-operator.sh + +# 6. Deploy multi-region DocumentDB +./deploy-documentdb.sh + +# Test connection +./test-connection.sh +``` + +## Deployment Scripts + +### 1. `deploy-infrastructure.sh` + +Deploys Azure infrastructure: +- AKS hub cluster in westus3 (also serves as a member) +- Azure VMs with k3s in eastus2 and uksouth +- Each cluster in its own VNet (no peering required - Istio handles connectivity) +- NSGs on all subnets (prevents Azure NRMS auto-creation of restrictive rules) +- **Istio CA certificates**: Pre-generated locally via openssl (zero cluster dependency) and injected into k3s VMs via cloud-init `write_files` +- **Istio remote secrets**: Auto-generated on k3s VMs via cloud-init `runcmd` (creates service account, extracts token, builds remote-secret YAML) + +```bash +# With defaults +./deploy-infrastructure.sh + +# With custom resource group +RESOURCE_GROUP=my-rg ./deploy-infrastructure.sh + +# With custom regions +export K3S_REGIONS_CSV="eastus2,uksouth,northeurope" +./deploy-infrastructure.sh +``` + +### 2. `install-istio.sh` + +Installs Istio service mesh on all clusters: +- **Shared root CA**: Pre-generated during `deploy-infrastructure.sh` and injected into k3s VMs via cloud-init (zero cluster dependency) +- **AKS hub**: installs via `istioctl` (standard approach) +- **k3s VMs**: installs entirely via **Helm** (`istio-base` + `istiod` + `istio/gateway`) with `--skip-schema-validation` to avoid ownership conflicts with `istioctl` +- **Remote secrets**: Pre-generated on k3s VMs via cloud-init, then distributed to other clusters +- Patches k3s east-west gateways with VM public IPs (k3s `servicelb` only assigns internal IPs) + +```bash +./install-istio.sh +``` + +### 3. 
`setup-fleet.sh` + +Sets up KubeFleet for multi-cluster management: +- Installs KubeFleet hub-agent on the hub cluster +- Joins all clusters (AKS and k3s) as fleet members +- **Known issue**: `joinMC.sh` has a context-switching bug; if a member fails to join, see Troubleshooting +- Fleet is used for cluster membership; Istio handles data traffic + +```bash +./setup-fleet.sh +``` + +### 4. `install-cert-manager.sh` + +Installs cert-manager on all clusters: +- Applies CRDs explicitly before Helm install (avoids silent failures) +- Installs via Helm with `startupapicheck.enabled=false` (avoids timeouts on k3s) +- Applies ClusterResourcePlacement for future cluster propagation + +```bash +./install-cert-manager.sh +``` + +### 5. `install-documentdb-operator.sh` + +Deploys DocumentDB operator on all clusters: +- Installs the operator from the published Helm chart on the AKS hub +- Installs CNPG from upstream release + DocumentDB manifests on k3s via Run Command +- Verifies deployment across all clusters + +```bash +# Default: install from published chart +./install-documentdb-operator.sh + +# Build from local source (for development) +BUILD_CHART=true ./install-documentdb-operator.sh + +# With custom values file +VALUES_FILE=custom-values.yaml ./install-documentdb-operator.sh +``` + +### 6. `deploy-documentdb.sh` + +Deploys multi-region DocumentDB with Istio networking: +- Creates namespace with istio-injection label +- Deploys DocumentDB with crossCloudNetworkingStrategy: Istio +- Configures primary and replicas across all regions + +```bash +# With auto-generated password +./deploy-documentdb.sh + +# With custom password +./deploy-documentdb.sh "MySecurePassword123!" 
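+ +# The password can also come from the environment; deploy-documentdb.sh +# reads $1 first, then falls back to $DOCUMENTDB_PASSWORD: +DOCUMENTDB_PASSWORD="MySecurePassword123!" ./deploy-documentdb.sh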
+``` + +## Configuration + +### Default Settings + +| Setting | Default | Description | +|---------|---------|-------------| +| `RESOURCE_GROUP` | `documentdb-k3s-fleet-rg` | Azure resource group | +| `HUB_REGION` | `westus3` | KubeFleet hub region (AKS) | +| `K3S_REGIONS` | `eastus2,uksouth` | k3s VM regions | +| `VM_SIZE` | `Standard_D2s_v3` | Azure VM size for k3s | +| `AKS_VM_SIZE` | `Standard_DS2_v2` | AKS node VM size | +| `K3S_VERSION` | `v1.30.4+k3s1` | k3s version | +| `ISTIO_VERSION` | `1.24.0` | Istio version | + +### Network Configuration (Istio) + +Each cluster has its own isolated VNet - Istio east-west gateways handle all cross-cluster traffic: + +| Cluster | Region | Network ID | VNet CIDR | +|---------|--------|------------|-----------| +| hub-westus3 (AKS) | westus3 | network1 | 10.1.0.0/16 | +| k3s-eastus2 | eastus2 | network2 | 10.2.0.0/16 | +| k3s-uksouth | uksouth | network3 | 10.3.0.0/16 | + +## kubectl Aliases + +After deployment, these aliases are configured in `~/.bashrc`: + +```bash +source ~/.bashrc + +# AKS hub cluster +k-westus3 get nodes +k-hub get nodes + +# k3s clusters +k-eastus2 get nodes +k-uksouth get nodes +``` + +## Istio Management + +```bash +# Check Istio installation on each cluster +for ctx in hub-westus3 k3s-eastus2 k3s-uksouth; do + echo "=== $ctx ===" + kubectl --context $ctx get pods -n istio-system +done + +# Check east-west gateway services +k-hub get svc -n istio-system istio-eastwestgateway + +# Verify remote secrets (for service discovery) +k-hub get secrets -n istio-system -l istio/multiCluster=true +``` + +## Fleet Management + +```bash +# List all member clusters +k-hub get membercluster + +# Check ClusterResourcePlacement status +k-hub get clusterresourceplacement + +# View fleet hub agent logs +k-hub logs -n fleet-system-hub -l app=hub-agent + +# Check member agent on k3s cluster +k-uksouth logs -n fleet-system -l app=member-agent +``` + +## DocumentDB Management + +### Check Status + +```bash +# Check 
operator on all clusters +for ctx in hub-westus3 k3s-eastus2 k3s-uksouth; do + echo "=== $ctx ===" + kubectl --context $ctx get pods -n documentdb-operator +done + +# Check DocumentDB instances +for ctx in hub-westus3 k3s-eastus2 k3s-uksouth; do + echo "=== $ctx ===" + kubectl --context $ctx get documentdb -n documentdb-preview-ns +done +``` + +### Connect to Database + +```bash +# Port forward to primary +k-westus3 port-forward -n documentdb-preview-ns svc/documentdb-preview 10260:10260 + +# Connection string (substitute your password) +mongodb://default_user:<password>@localhost:10260/?directConnection=true&authMechanism=SCRAM-SHA-256&tls=true&tlsAllowInvalidCertificates=true +``` + +### Failover + +```bash +# Failover to k3s cluster in UK South +k-hub patch documentdb documentdb-preview -n documentdb-preview-ns \ + --type='merge' -p '{"spec":{"clusterReplication":{"primary":"k3s-uksouth"}}}' +``` + +## Use Cases + +### Edge Computing +k3s on Azure VMs simulates edge locations where full AKS might be too heavy. DocumentDB replication ensures data availability at the edge while maintaining consistency with central clusters.
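+ +In an edge topology like this, moving the primary between the central AKS cluster and an edge replica is a routine operation. The failover patch shown above can be wrapped in a small poll loop. This is a rough sketch: the "Running" phase value is an assumption, so check what your operator version actually reports in `.status.phase`. + +```bash +# Fail over to the edge cluster, then wait for the DocumentDB resource +# there to report a healthy phase. "Running" is an assumed phase name, +# not taken from the operator docs; adjust if your status values differ. +TARGET=k3s-uksouth +k-hub patch documentdb documentdb-preview -n documentdb-preview-ns \ + --type='merge' -p "{\"spec\":{\"clusterReplication\":{\"primary\":\"$TARGET\"}}}" +for i in $(seq 1 30); do + PHASE=$(kubectl --context "$TARGET" get documentdb documentdb-preview \ + -n documentdb-preview-ns -o jsonpath='{.status.phase}' 2>/dev/null || echo Unknown) + echo "attempt $i: phase=$PHASE" + [ "$PHASE" = "Running" ] && break + sleep 10 +done +```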
+ +### Hybrid Cloud +Mix AKS managed clusters with self-managed k3s for: +- Cost optimization (k3s on cheaper VMs) +- Specific compliance requirements +- Testing/development environments + +### Disaster Recovery +Multi-region deployment with automatic failover capabilities: +- Primary in AKS (production-grade) +- Replicas in k3s (cost-effective DR) + +## Troubleshooting + +### k3s VM Issues + +```bash +# Check k3s status via Run Command (no SSH needed) +az vm run-command invoke \ + --resource-group $RESOURCE_GROUP \ + --name k3s-uksouth \ + --command-id RunShellScript \ + --scripts "sudo systemctl status k3s; sudo k3s kubectl get nodes" + +# View k3s logs via Run Command +az vm run-command invoke \ + --resource-group $RESOURCE_GROUP \ + --name k3s-uksouth \ + --command-id RunShellScript \ + --scripts "sudo journalctl -u k3s --no-pager -n 50" +``` + +### Istio Issues + +```bash +# Check Istio pods +k-uksouth get pods -n istio-system + +# Check east-west gateway external IP +k-uksouth get svc -n istio-system istio-eastwestgateway + +# Verify remote secrets exist +k-hub get secrets -n istio-system -l istio/multiCluster=true + +# Check Istio proxy status in DocumentDB namespace +k-uksouth get pods -n documentdb-preview-ns -o jsonpath='{.items[*].spec.containers[*].name}' | tr ' ' '\n' | grep istio +``` + +### Fleet Member Not Joining + +```bash +# Check member agent logs on k3s +k-uksouth logs -n fleet-system deployment/member-agent + +# Verify hub API server is reachable (via Istio) +k-uksouth run test --rm -it --image=curlimages/curl -- curl -k https://hub-westus3-api:443/healthz +``` + +### DocumentDB Not Propagating + +```bash +# Check ClusterResourcePlacement +k-hub describe clusterresourceplacement documentdb-namespace-crp + +# Verify namespace exists on member +k-uksouth get namespace documentdb-preview-ns +``` + +### Cross-Cluster Connectivity (Istio) + +```bash +# Test Istio mesh connectivity +kubectl --context k3s-uksouth run test --rm -it 
--image=nicolaka/netshoot -- \ + curl -k https://documentdb-preview.documentdb-preview-ns.svc:10260/health + +# Check Istio eastwest gateway is exposed +k-uksouth get svc -n istio-system istio-eastwestgateway -o wide +``` + +## Cleanup + +```bash +# Delete everything +./delete-resources.sh + +# Force delete without confirmation +./delete-resources.sh --force + +# Delete specific resources only +./delete-resources.sh --vms-only # Only k3s VMs +./delete-resources.sh --aks-only # Only AKS clusters +``` + +## Cost Estimates + +| Resource | Configuration | Estimated Monthly Cost | +|----------|---------------|----------------------| +| AKS Hub (westus3) | 2x Standard_DS2_v2 | ~$140 | +| k3s VM (eastus2) | 1x Standard_D2s_v3 | ~$70 | +| k3s VM (uksouth) | 1x Standard_D2s_v3 | ~$70 | +| Storage (3x 10GB) | Premium SSD | ~$6 | +| Load Balancers | 3x Standard (Istio) | ~$54 | +| **Total** | | **~$340/month** | + +> **Tip**: Use `./delete-resources.sh` when not in use to avoid charges. + +## Files Reference + +| File | Description | +|------|-------------| +| `main.bicep` | Bicep template for Azure infrastructure | +| `parameters.bicepparam` | Bicep parameters file | +| `deploy-infrastructure.sh` | Deploy VMs, VNets, AKS cluster | +| `install-istio.sh` | Install Istio service mesh | +| `setup-fleet.sh` | Configure KubeFleet hub and members | +| `install-cert-manager.sh` | Install cert-manager | +| `install-documentdb-operator.sh` | Deploy DocumentDB operator | +| `deploy-documentdb.sh` | Deploy multi-region DocumentDB | +| `delete-resources.sh` | Cleanup all resources | +| `test-connection.sh` | Test DocumentDB connectivity | +| `documentdb-operator-crp.yaml` | Operator CRP (reference only — not applied) | +| `cert-manager-crp.yaml` | cert-manager CRP (for future cluster propagation) | +| `documentdb-resource-crp.yaml` | DocumentDB ClusterResourcePlacement | + +## Known Issues & Lessons Learned + +### Azure VM Run Command +This playground uses Azure VM Run Command instead of 
SSH for all VM operations: +- **Benefits**: Works through corporate firewalls, no SSH keys to manage, no port 22 required +- **Limitations**: ~30-60 seconds per invocation, output format requires parsing +- **Output parsing**: Results come as `[stdout]\n...\n[stderr]\n...` — extract with: + ```bash + az vm run-command invoke ... --query 'value[0].message' -o tsv | \ + awk '/^\[stdout\]/{flag=1; next} /^\[stderr\]/{flag=0} flag' + ``` + +### k3s TLS SANs and API Server (Critical) +- k3s generates certificates with `127.0.0.1` only — external access requires adding the public IP as a TLS SAN +- The cloud-init uses Azure Instance Metadata Service (IMDS) to get the public IP before k3s install: + ```bash + curl -s -H Metadata:true "http://169.254.169.254/metadata/instance/network/interface/0/ipv4/ipAddress/0/publicIpAddress?api-version=2021-02-01&format=text" + ``` +- **`advertise-address`**: Must be set to the private IP, otherwise the `kubernetes` endpoint uses the public IP, which breaks internal pod→API server connectivity via ClusterIP (10.43.0.1) +- **`node-external-ip`**: Set to the public IP so LoadBalancer services get the public IP + +### k3s kubeconfig Management +- k3s generates kubeconfig with `127.0.0.1` — scripts automatically update it to the public IP +- When redeploying, old kubeconfigs have stale IPs/certs — scripts delete old contexts first +- Use `kubectl config delete-context <context-name>` to clean up manually if needed + +### Istio on k3s +- **Use Helm**, not `istioctl install`, for k3s clusters — `istioctl` creates resources without Helm annotations, causing ownership conflicts if you later use Helm +- **`--skip-schema-validation`** is required for all Helm installs (`istio-base`, `istiod`, `istio/gateway`) — the gateway chart's JSON schema rejects documented values like `labels`, `env`, `service`, `networkGateway` +- **Istio CA certs are pre-generated** during infrastructure deploy (pure `openssl` operations, no cluster needed) and injected via cloud-init `write_files`
with `encoding: b64` — the k3s cloud-init `runcmd` creates the `cacerts` Kubernetes secret from these files +- **Remote secrets are pre-generated** on each k3s VM via cloud-init — creates an `istio-remote-reader` service account (NOT `istio-reader-service-account`, which conflicts with Helm's `istio-base` chart), extracts the token and CA, and builds the complete remote-secret YAML at `/etc/istio-remote/remote-secret.yaml` +- k3s uses `servicelb` (klipper) for LoadBalancer services, which assigns node IPs, not public IPs +- Patch east-west gateway services with `externalIPs` pointing to the node's public IP: + ```bash + kubectl patch svc istio-eastwestgateway -n istio-system \ + --type='json' -p='[{"op": "add", "path": "/spec/externalIPs", "value": ["<PUBLIC_IP>"]}]' + ``` +- Set `pilot.autoscaleEnabled=false` and `pilot.replicaCount=1` for single-node k3s clusters + +### DocumentDB on k3s +- The `environment` field only supports `aks`, `eks`, `gke` — **use `aks` for k3s clusters** +- DocumentDB operator is installed on k3s via Run Command (base64-encoded manifests + CNPG upstream release) +- CNPG must be installed separately on k3s since the Helm chart can't be transferred easily + +### cert-manager on k3s +- Set `startupapicheck.enabled=false` to avoid timeouts on resource-constrained k3s +- Apply CRDs explicitly with `kubectl apply -f` before Helm install (the `crds.enabled=true` flag can silently fail) + +### Corporate Network (NRMS) +- Port 22 may be denied by the corporate firewall; to enable SSH, add an explicit allow rule +- Port 6443 might be blocked by corporate VPN/firewall +- **NSGs are deployed in Bicep** and associated at the subnet level to prevent Azure NRMS from auto-creating restrictive NSGs — without pre-created NSGs, NRMS may block ports needed for Istio and k3s +- Both AKS and k3s NSGs include all required ports (SSH, K8s API, all Istio control/data plane ports, HTTP/HTTPS) + +### Bicep Deployment Tips +- Use `resourceId()` function for subnet references to avoid race
conditions +- Add explicit `dependsOn` for AKS clusters referencing VNets +- Check AKS supported Kubernetes versions: `az aks get-versions --location <region>` +- Azure VMs require an SSH key even when not using SSH; changing the key on an existing VM causes a "PropertyChangeNotAllowed" error +- **`format()` escaping for cloud-init**: In Bicep `format()` templates, `{{` produces a literal `{` and `}}` produces a literal `}` — critical when embedding bash `${VAR}` or jsonpath `{.data.token}` in cloud-init scripts +- **`@secure()` does not work with Bicep `array` type** (BCP124) — Istio cert data is passed as a plain array parameter + +## Related Playgrounds + +- [aks-fleet-deployment](../aks-fleet-deployment/) - Pure AKS multi-region with KubeFleet +- [aks-setup](../aks-setup/) - Single AKS cluster setup +- [multi-cloud-deployment](../multi-cloud-deployment/) - Cross-cloud (AKS + GKE + EKS) with Istio + +## Additional Resources + +- [k3s Documentation](https://docs.k3s.io/) +- [KubeFleet Documentation](https://kubefleet.dev/docs/) +- [Istio Multi-Cluster](https://istio.io/latest/docs/setup/install/multicluster/) +- [Azure VMs Documentation](https://docs.microsoft.com/en-us/azure/virtual-machines/) +- [DocumentDB Kubernetes Operator](../../README.md) diff --git a/documentdb-playground/k3s-azure-fleet/cert-manager-crp.yaml b/documentdb-playground/k3s-azure-fleet/cert-manager-crp.yaml new file mode 100644 index 00000000..75a90575 --- /dev/null +++ b/documentdb-playground/k3s-azure-fleet/cert-manager-crp.yaml @@ -0,0 +1,44 @@ +apiVersion: placement.kubernetes-fleet.io/v1beta1 +kind: ClusterResourcePlacement +metadata: + name: cert-manager-crp +spec: + resourceSelectors: + - group: "" + version: v1 + kind: Namespace + name: cert-manager + - group: "apiextensions.k8s.io" + version: v1 + kind: CustomResourceDefinition + labelSelector: + matchLabels: + app.kubernetes.io/instance: cert-manager + - group: "rbac.authorization.k8s.io" + version: v1 + kind: ClusterRole + labelSelector: + matchLabels: +
app.kubernetes.io/instance: cert-manager + - group: "rbac.authorization.k8s.io" + version: v1 + kind: ClusterRoleBinding + labelSelector: + matchLabels: + app.kubernetes.io/instance: cert-manager + - group: "admissionregistration.k8s.io" + version: v1 + kind: MutatingWebhookConfiguration + labelSelector: + matchLabels: + app.kubernetes.io/instance: cert-manager + - group: "admissionregistration.k8s.io" + version: v1 + kind: ValidatingWebhookConfiguration + labelSelector: + matchLabels: + app.kubernetes.io/instance: cert-manager + policy: + placementType: PickAll + strategy: + type: RollingUpdate diff --git a/documentdb-playground/k3s-azure-fleet/delete-resources.sh b/documentdb-playground/k3s-azure-fleet/delete-resources.sh new file mode 100755 index 00000000..4b5036ff --- /dev/null +++ b/documentdb-playground/k3s-azure-fleet/delete-resources.sh @@ -0,0 +1,166 @@ +#!/usr/bin/env bash +set -euo pipefail + +# Delete all resources created by this playground + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +# Load deployment info if available +if [ -f "$SCRIPT_DIR/.deployment-info" ]; then + source "$SCRIPT_DIR/.deployment-info" +fi + +RESOURCE_GROUP="${RESOURCE_GROUP:-documentdb-k3s-fleet-rg}" +FORCE="${FORCE:-false}" + +# Parse arguments +while [[ $# -gt 0 ]]; do + case $1 in + --force|-f) + FORCE="true" + shift + ;; + --resource-group|-g) + RESOURCE_GROUP="$2" + shift 2 + ;; + --vms-only) + VMS_ONLY="true" + shift + ;; + --aks-only) + AKS_ONLY="true" + shift + ;; + -h|--help) + echo "Usage: $0 [OPTIONS]" + echo "" + echo "Options:" + echo " --force, -f Skip confirmation prompts" + echo " --resource-group, -g Resource group name (default: documentdb-k3s-fleet-rg)" + echo " --vms-only Delete only k3s VMs" + echo " --aks-only Delete only AKS clusters" + echo " -h, --help Show this help" + exit 0 + ;; + *) + echo "Unknown option: $1" + exit 1 + ;; + esac +done + +echo "=======================================" +echo "Resource Cleanup" +echo 
"=======================================" +echo "Resource Group: $RESOURCE_GROUP" +echo "=======================================" + +# Check if resource group exists +if ! az group show --name "$RESOURCE_GROUP" &>/dev/null; then + echo "Resource group '$RESOURCE_GROUP' does not exist. Nothing to delete." + exit 0 +fi + +# Confirmation +if [ "$FORCE" != "true" ]; then + echo "" + echo "⚠️ WARNING: This will delete all resources in '$RESOURCE_GROUP'" + echo "" + read -p "Are you sure? (yes/no): " CONFIRM + if [ "$CONFIRM" != "yes" ]; then + echo "Cancelled." + exit 0 + fi +fi + +# Delete specific resources if requested +if [ "${VMS_ONLY:-false}" = "true" ]; then + echo "" + echo "Deleting k3s VMs only..." + + VMS=$(az vm list -g "$RESOURCE_GROUP" --query "[?contains(name,'k3s')].name" -o tsv) + for vm in $VMS; do + echo " Deleting VM: $vm" + az vm delete -g "$RESOURCE_GROUP" -n "$vm" --yes --no-wait + done + + echo "✓ VM deletion initiated" + exit 0 +fi + +if [ "${AKS_ONLY:-false}" = "true" ]; then + echo "" + echo "Deleting AKS clusters only..." + + CLUSTERS=$(az aks list -g "$RESOURCE_GROUP" --query "[].name" -o tsv) + for cluster in $CLUSTERS; do + echo " Deleting AKS cluster: $cluster" + az aks delete -g "$RESOURCE_GROUP" -n "$cluster" --yes --no-wait + done + + echo "✓ AKS deletion initiated" + exit 0 +fi + +# Delete DocumentDB resources first (if clusters still exist) +echo "" +echo "Cleaning up Kubernetes resources..." + +# Try to delete DocumentDB resources from hub +if [ -n "${HUB_CLUSTER_NAME:-}" ]; then + if kubectl config get-contexts "$HUB_CLUSTER_NAME" &>/dev/null 2>&1; then + echo " Deleting DocumentDB ClusterResourcePlacement..." 
+ kubectl --context "$HUB_CLUSTER_NAME" delete clusterresourceplacement documentdb-namespace-crp --ignore-not-found=true 2>/dev/null || true + kubectl --context "$HUB_CLUSTER_NAME" delete clusterresourceplacement documentdb-operator-crp --ignore-not-found=true 2>/dev/null || true + kubectl --context "$HUB_CLUSTER_NAME" delete clusterresourceplacement cert-manager-crp --ignore-not-found=true 2>/dev/null || true + + echo " Deleting DocumentDB namespace..." + kubectl --context "$HUB_CLUSTER_NAME" delete namespace documentdb-preview-ns --ignore-not-found=true 2>/dev/null || true + fi +fi + +# Delete entire resource group +echo "" +echo "Deleting resource group '$RESOURCE_GROUP'..." +echo "This will delete all VMs, AKS clusters, VNets, and associated resources." +az group delete --name "$RESOURCE_GROUP" --yes --no-wait + +echo "" +echo "✓ Resource group deletion initiated" + +# Clean up local files +echo "" +echo "Cleaning up local files..." +rm -f "$SCRIPT_DIR/.deployment-info" +rm -f "$SCRIPT_DIR/documentdb-operator-*.tgz" +rm -rf "$SCRIPT_DIR/.istio-certs" + +# Clean up kubeconfig contexts +echo "Cleaning up kubectl contexts..." +for ctx in $(kubectl config get-contexts -o name 2>/dev/null | grep -E "(hub-|member-|k3s-)" || true); do + kubectl config delete-context "$ctx" 2>/dev/null || true +done + +# Remove aliases from shell config files +for SHELL_RC in "$HOME/.bashrc" "$HOME/.zshrc"; do + if [ -f "$SHELL_RC" ]; then + if grep -q "# BEGIN k3s-fleet aliases" "$SHELL_RC" 2>/dev/null; then + echo "Removing kubectl aliases from $SHELL_RC..." + awk '/# BEGIN k3s-fleet aliases/,/# END k3s-fleet aliases/ {next} {print}' "$SHELL_RC" > "$SHELL_RC.tmp" + mv "$SHELL_RC.tmp" "$SHELL_RC" + fi + fi +done + +echo "" +echo "=======================================" +echo "✅ Cleanup Complete!" +echo "=======================================" +echo "" +echo "Resource group deletion is running in the background." +echo "Run 'az group show -n $RESOURCE_GROUP' to check status." 
+echo "" +echo "To verify deletion is complete:" +echo " az group list --query \"[?name=='$RESOURCE_GROUP']\" -o table" +echo "=======================================" diff --git a/documentdb-playground/k3s-azure-fleet/deploy-documentdb.sh b/documentdb-playground/k3s-azure-fleet/deploy-documentdb.sh new file mode 100755 index 00000000..2486603f --- /dev/null +++ b/documentdb-playground/k3s-azure-fleet/deploy-documentdb.sh @@ -0,0 +1,266 @@ +#!/usr/bin/env bash +set -euo pipefail + +# Deploy multi-region DocumentDB with cross-cluster replication using Istio + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +# Load deployment info +if [ -f "$SCRIPT_DIR/.deployment-info" ]; then + source "$SCRIPT_DIR/.deployment-info" +else + echo "Error: Deployment info not found. Run deploy-infrastructure.sh first." + exit 1 +fi + +# Password from argument or environment +DOCUMENTDB_PASSWORD="${1:-${DOCUMENTDB_PASSWORD:-}}" + +# Generate password if not provided +if [ -z "$DOCUMENTDB_PASSWORD" ]; then + echo "No password provided. Generating a secure password..." 
+ DOCUMENTDB_PASSWORD=$(openssl rand -base64 32 | tr -d "=+/" | cut -c1-25) + echo "Generated password: $DOCUMENTDB_PASSWORD" + echo "(Save this password - you'll need it to connect to the database)" + echo "" +fi + +HUB_CLUSTER_NAME="hub-${HUB_REGION}" + +echo "=======================================" +echo "DocumentDB Multi-Region Deployment" +echo "=======================================" +echo "Hub Cluster: $HUB_CLUSTER_NAME" +echo "Cross-cluster networking: Istio" +echo "=======================================" + +# Build list of all clusters +ALL_CLUSTERS="$HUB_CLUSTER_NAME" + +# Add k3s clusters +IFS=' ' read -ra K3S_REGION_ARRAY <<< "$K3S_REGIONS" +for region in "${K3S_REGION_ARRAY[@]}"; do + if kubectl config get-contexts "k3s-$region" &>/dev/null; then + ALL_CLUSTERS="$ALL_CLUSTERS k3s-$region" + fi +done + +CLUSTER_ARRAY=($ALL_CLUSTERS) +echo "Discovered ${#CLUSTER_ARRAY[@]} clusters:" +for cluster in "${CLUSTER_ARRAY[@]}"; do + echo " - $cluster" +done + +# Select primary cluster (prefer hub cluster) +PRIMARY_CLUSTER="$HUB_CLUSTER_NAME" +echo "" +echo "Selected primary cluster: $PRIMARY_CLUSTER" + +# Build cluster list YAML +CLUSTER_LIST="" +for cluster in "${CLUSTER_ARRAY[@]}"; do + # Note: DocumentDB only supports 'aks', 'eks', 'gke' environments. + # k3s clusters use 'aks' environment since they behave similarly. + ENV="aks" + + if [ -z "$CLUSTER_LIST" ]; then + CLUSTER_LIST=" - name: ${cluster}" + CLUSTER_LIST="${CLUSTER_LIST}"$'\n'" environment: ${ENV}" + else + CLUSTER_LIST="${CLUSTER_LIST}"$'\n'" - name: ${cluster}" + CLUSTER_LIST="${CLUSTER_LIST}"$'\n'" environment: ${ENV}" + fi +done + +# Create cluster identification ConfigMaps +echo "" +echo "=======================================" +echo "Creating cluster identification ConfigMaps..." +echo "=======================================" + +for cluster in "${CLUSTER_ARRAY[@]}"; do + echo "Processing $cluster..." + + if ! 
kubectl config get-contexts "$cluster" &>/dev/null; then + echo " ✗ Context not found, skipping" + continue + fi + + kubectl --context "$cluster" create configmap cluster-name \ + -n kube-system \ + --from-literal=name="$cluster" \ + --dry-run=client -o yaml | kubectl --context "$cluster" apply -f - + + echo " ✓ ConfigMap created" +done + +# Deploy DocumentDB resources +echo "" +echo "=======================================" +echo "Deploying DocumentDB resources..." +echo "=======================================" + +kubectl config use-context "$HUB_CLUSTER_NAME" + +# Check for existing resources +EXISTING="" +if kubectl get namespace documentdb-preview-ns &>/dev/null 2>&1; then + EXISTING="${EXISTING}namespace " +fi +if kubectl get secret documentdb-credentials -n documentdb-preview-ns &>/dev/null 2>&1; then + EXISTING="${EXISTING}secret " +fi +if kubectl get documentdb documentdb-preview -n documentdb-preview-ns &>/dev/null 2>&1; then + EXISTING="${EXISTING}documentdb " +fi + +if [ -n "$EXISTING" ]; then + echo "" + echo "⚠️ Warning: Existing resources found: $EXISTING" + echo "" + echo "Options:" + echo "1. Delete existing resources and redeploy" + echo "2. Update existing deployment" + echo "3. Cancel" + read -p "Choose (1/2/3): " CHOICE + + case $CHOICE in + 1) + echo "Deleting existing resources..." + kubectl delete clusterresourceplacement documentdb-namespace-crp --ignore-not-found=true + kubectl delete namespace documentdb-preview-ns --ignore-not-found=true + sleep 10 + ;; + 2) + echo "Updating existing deployment..." + ;; + 3|*) + echo "Cancelled." 
+ exit 0 + ;; + esac +fi + +# Generate manifest with substitutions +TEMP_YAML=$(mktemp) + +# Escape password for safe use in sed (handle /, &, \ characters) +ESCAPED_PASSWORD="${DOCUMENTDB_PASSWORD//\\/\\\\}" +ESCAPED_PASSWORD="${ESCAPED_PASSWORD//&/\\&}" +ESCAPED_PASSWORD="${ESCAPED_PASSWORD//\//\\/}" + +sed -e "s/{{DOCUMENTDB_PASSWORD}}/$ESCAPED_PASSWORD/g" \ + -e "s/{{PRIMARY_CLUSTER}}/$PRIMARY_CLUSTER/g" \ + "$SCRIPT_DIR/documentdb-resource-crp.yaml" | \ +while IFS= read -r line; do + if [[ "$line" == '{{CLUSTER_LIST}}' ]]; then + echo "$CLUSTER_LIST" + else + echo "$line" + fi +done > "$TEMP_YAML" + +echo "" +echo "Generated configuration:" +echo "------------------------" +echo "Primary: $PRIMARY_CLUSTER" +echo "Clusters:" +echo "$CLUSTER_LIST" +echo "------------------------" + +# Apply configuration +echo "" +echo "Applying DocumentDB configuration..." +kubectl apply -f "$TEMP_YAML" +rm -f "$TEMP_YAML" + +# Check ClusterResourcePlacement +echo "" +echo "Checking ClusterResourcePlacement status..." +kubectl get clusterresourceplacement documentdb-namespace-crp -o wide + +# Wait for propagation +echo "" +echo "Waiting for resources to propagate..." +sleep 15 + +# Verify deployment +echo "" +echo "=======================================" +echo "Deployment Verification" +echo "=======================================" + +for cluster in "${CLUSTER_ARRAY[@]}"; do + echo "" + echo "=== $cluster ===" + + if ! 
kubectl config get-contexts "$cluster" &>/dev/null; then + echo " ✗ Context not found" + continue + fi + + # Check namespace + if kubectl --context "$cluster" get namespace documentdb-preview-ns &>/dev/null; then + echo " ✓ Namespace exists" + + # Check DocumentDB + if kubectl --context "$cluster" get documentdb documentdb-preview -n documentdb-preview-ns &>/dev/null; then + STATUS=$(kubectl --context "$cluster" get documentdb documentdb-preview -n documentdb-preview-ns -o jsonpath='{.status.phase}' 2>/dev/null || echo "Unknown") + ROLE="REPLICA" + [ "$cluster" = "$PRIMARY_CLUSTER" ] && ROLE="PRIMARY" + echo " ✓ DocumentDB: $STATUS (Role: $ROLE)" + else + echo " ✗ DocumentDB not found" + fi + + # Check pods + PODS=$(kubectl --context "$cluster" get pods -n documentdb-preview-ns --no-headers 2>/dev/null | wc -l || echo "0") + echo " Pods: $PODS" + + if [ "$PODS" -gt 0 ]; then + kubectl --context "$cluster" get pods -n documentdb-preview-ns 2>/dev/null | head -5 + fi + else + echo " ✗ Namespace not found (propagating...)" + fi +done + +# Connection information +echo "" +echo "=======================================" +echo "Connection Information" +echo "=======================================" +echo "" +echo "Username: default_user" +echo "Password: $DOCUMENTDB_PASSWORD" +echo "" +echo "To connect via port-forward:" +echo " kubectl --context $PRIMARY_CLUSTER port-forward -n documentdb-preview-ns svc/documentdb-preview 10260:10260" +echo "" +echo "Connection string:" +echo " mongodb://default_user:$DOCUMENTDB_PASSWORD@localhost:10260/?directConnection=true&authMechanism=SCRAM-SHA-256&tls=true&tlsAllowInvalidCertificates=true" +echo "" + +# Failover commands +echo "Failover commands:" +for cluster in "${CLUSTER_ARRAY[@]}"; do + if [ "$cluster" != "$PRIMARY_CLUSTER" ]; then + echo "" + echo "# Failover to $cluster:" + echo "kubectl --context $HUB_CLUSTER_NAME patch documentdb documentdb-preview -n documentdb-preview-ns \\" + echo " --type='merge' -p 
'{\"spec\":{\"clusterReplication\":{\"primary\":\"$cluster\"}}}'" + fi +done + +echo "" +echo "=======================================" +echo "✅ DocumentDB Deployment Complete!" +echo "=======================================" +echo "" +echo "Monitor deployment:" +echo " watch 'kubectl --context $HUB_CLUSTER_NAME get clusterresourceplacement documentdb-namespace-crp -o wide'" +echo "" +echo "Check all clusters:" +CLUSTER_STRING=$(IFS=' '; echo "${CLUSTER_ARRAY[*]}") +echo " for c in $CLUSTER_STRING; do echo \"=== \$c ===\"; kubectl --context \$c get documentdb,pods -n documentdb-preview-ns; done" +echo "=======================================" diff --git a/documentdb-playground/k3s-azure-fleet/deploy-infrastructure.sh b/documentdb-playground/k3s-azure-fleet/deploy-infrastructure.sh new file mode 100755 index 00000000..daea7c53 --- /dev/null +++ b/documentdb-playground/k3s-azure-fleet/deploy-infrastructure.sh @@ -0,0 +1,363 @@ +#!/bin/bash +set -e + +# ================================ +# k3s + AKS Infrastructure Deployment with Istio +# ================================ +# Deploys: +# - 1 AKS cluster (hub) in westus3 +# - 2 k3s VMs in eastus2 and uksouth +# - No VNet peering (Istio handles cross-cluster traffic) +# - Uses Azure VM Run Command (no SSH required) + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +# Configuration +RESOURCE_GROUP="${RESOURCE_GROUP:-documentdb-k3s-fleet-rg}" +HUB_REGION="${HUB_REGION:-westus3}" +K3S_REGIONS="${K3S_REGIONS_CSV:-eastus2,uksouth}" + +# Convert comma-separated to array +IFS=',' read -ra K3S_REGION_ARRAY <<< "$K3S_REGIONS" + +echo "=======================================" +echo "k3s + AKS Infrastructure Deployment" +echo "=======================================" +echo "Resource Group: $RESOURCE_GROUP" +echo "Hub Region: $HUB_REGION" +echo "k3s Regions: ${K3S_REGION_ARRAY[*]}" +echo "" +echo "Networking: Istio service mesh (no VNet peering)" +echo "VM Access: Azure VM Run Command (no SSH required)" +echo 
"=======================================" +echo "" + +# Use a stable SSH key path (Azure requires SSH key for VMs, but we use Run Command instead) +SSH_KEY_PATH="${SCRIPT_DIR}/.ssh-key" +if [ ! -f "$SSH_KEY_PATH" ]; then + echo "Generating SSH key (required by Azure, but we use Run Command instead)..." + ssh-keygen -t rsa -b 2048 -f "$SSH_KEY_PATH" -N "" -C "k3s-azure-fleet" -q +fi +SSH_PUBLIC_KEY=$(cat "${SSH_KEY_PATH}.pub") + +# ─── Generate Istio CA certificates (pre-deploy) ─── +ISTIO_VERSION="${ISTIO_VERSION:-1.24.0}" +CERT_DIR="${SCRIPT_DIR}/.istio-certs" +mkdir -p "$CERT_DIR" + +echo "Generating Istio CA certificates..." + +# Download Istio cert tools if not present +if [ ! -d "$CERT_DIR/istio-${ISTIO_VERSION}" ]; then + echo " Downloading Istio ${ISTIO_VERSION} cert tools..." + curl -sL "https://github.com/istio/istio/archive/refs/tags/${ISTIO_VERSION}.tar.gz" | tar xz -C "$CERT_DIR" +fi + +# Generate root CA (shared across all clusters) +if [ ! -f "$CERT_DIR/root-cert.pem" ]; then + echo " Generating shared root CA..." + pushd "$CERT_DIR" > /dev/null + make -f "istio-${ISTIO_VERSION}/tools/certs/Makefile.selfsigned.mk" root-ca + popd > /dev/null +fi + +# Generate per-cluster intermediate certs and build JSON array for Bicep +ISTIO_CERTS_JSON="[" +for i in "${!K3S_REGION_ARRAY[@]}"; do + region="${K3S_REGION_ARRAY[$i]}" + cluster_name="k3s-${region}" + + if [ ! -f "$CERT_DIR/${cluster_name}/ca-cert.pem" ]; then + echo " Generating certificates for ${cluster_name}..." 
+ pushd "$CERT_DIR" > /dev/null + make -f "istio-${ISTIO_VERSION}/tools/certs/Makefile.selfsigned.mk" "${cluster_name}-cacerts" + popd > /dev/null + fi + + # Base64-encode each PEM file for cloud-init write_files (encoding: b64) + ROOT_CERT_B64=$(base64 < "$CERT_DIR/${cluster_name}/root-cert.pem" | tr -d '\n') + CA_CERT_B64=$(base64 < "$CERT_DIR/${cluster_name}/ca-cert.pem" | tr -d '\n') + CA_KEY_B64=$(base64 < "$CERT_DIR/${cluster_name}/ca-key.pem" | tr -d '\n') + CERT_CHAIN_B64=$(base64 < "$CERT_DIR/${cluster_name}/cert-chain.pem" | tr -d '\n') + + [ "$i" -gt 0 ] && ISTIO_CERTS_JSON+="," + ISTIO_CERTS_JSON+="{\"rootCert\":\"${ROOT_CERT_B64}\",\"caCert\":\"${CA_CERT_B64}\",\"caKey\":\"${CA_KEY_B64}\",\"certChain\":\"${CERT_CHAIN_B64}\"}" +done +ISTIO_CERTS_JSON+="]" + +echo "✓ Istio certificates generated for ${#K3S_REGION_ARRAY[@]} k3s cluster(s)" + +# Also generate certs for AKS hub (applied later by install-istio.sh via kubectl) +if [ ! -f "$CERT_DIR/hub-${HUB_REGION}/ca-cert.pem" ]; then + echo " Generating certificates for hub-${HUB_REGION}..." + pushd "$CERT_DIR" > /dev/null + make -f "istio-${ISTIO_VERSION}/tools/certs/Makefile.selfsigned.mk" "hub-${HUB_REGION}-cacerts" + popd > /dev/null +fi + +# Create resource group +echo "Creating/verifying resource group..." +if az group show --name "$RESOURCE_GROUP" &>/dev/null; then + RG_STATE=$(az group show --name "$RESOURCE_GROUP" --query "properties.provisioningState" -o tsv 2>/dev/null || echo "Unknown") + if [ "$RG_STATE" = "Deleting" ]; then + echo "Resource group is being deleted. Waiting..." 
+ while az group show --name "$RESOURCE_GROUP" &>/dev/null; do + sleep 10 + done + echo "Creating resource group '$RESOURCE_GROUP' in '$HUB_REGION'" + az group create --name "$RESOURCE_GROUP" --location "$HUB_REGION" --output none + else + echo "Using existing resource group '$RESOURCE_GROUP'" + fi +else + echo "Creating resource group '$RESOURCE_GROUP' in '$HUB_REGION'" + az group create --name "$RESOURCE_GROUP" --location "$HUB_REGION" --output none +fi + +# Check if VMs already exist (to skip Bicep if just re-running for kubeconfig) +EXISTING_VMS=$(az vm list -g "$RESOURCE_GROUP" --query "[?contains(name,'k3s')].name" -o tsv 2>/dev/null | wc -l | tr -d ' ') +SKIP_BICEP=false + +if [ "$EXISTING_VMS" -gt 0 ]; then + echo "" + echo "Found $EXISTING_VMS existing k3s VM(s). Skipping Bicep deployment." + echo "(Delete VMs or resource group to force re-deployment)" + SKIP_BICEP=true +fi + +if [ "$SKIP_BICEP" = "false" ]; then + # Deploy Bicep template + echo "" + echo "Deploying Azure infrastructure with Bicep..." + echo "(This includes AKS hub and k3s VMs - typically takes 5-10 minutes)" + + # Build k3s regions array for Bicep + K3S_REGIONS_JSON=$(printf '%s\n' "${K3S_REGION_ARRAY[@]}" | jq -R . | jq -s .) + + az deployment group create \ + --resource-group "$RESOURCE_GROUP" \ + --template-file "${SCRIPT_DIR}/main.bicep" \ + --parameters hubLocation="$HUB_REGION" \ + --parameters k3sRegions="$K3S_REGIONS_JSON" \ + --parameters sshPublicKey="$SSH_PUBLIC_KEY" \ + --parameters istioCerts="$ISTIO_CERTS_JSON" \ + --output none + + echo "✓ Infrastructure deployed" +fi + +# Get deployment outputs +echo "" +echo "Retrieving deployment outputs..." 
+ +DEPLOYMENT_OUTPUT=$(az deployment group show \ + --resource-group "$RESOURCE_GROUP" \ + --name main \ + --query "properties.outputs" \ + -o json 2>/dev/null || echo "{}") + +AKS_CLUSTER_NAME=$(echo "$DEPLOYMENT_OUTPUT" | jq -r '.aksClusterName.value // empty') +K3S_VM_NAMES=$(echo "$DEPLOYMENT_OUTPUT" | jq -r '.k3sVmNames.value // [] | @csv' | tr -d '"') +K3S_PUBLIC_IPS=$(echo "$DEPLOYMENT_OUTPUT" | jq -r '.k3sVmPublicIps.value // [] | @csv' | tr -d '"') + +# Fallback if outputs not available yet +if [ -z "$AKS_CLUSTER_NAME" ]; then + AKS_CLUSTER_NAME="hub-${HUB_REGION}" +fi + +echo "AKS Cluster: $AKS_CLUSTER_NAME" +echo "k3s VMs: $K3S_VM_NAMES" +echo "k3s IPs: $K3S_PUBLIC_IPS" + +# Configure kubectl for AKS +echo "" +echo "Configuring kubectl for AKS hub cluster..." +az aks get-credentials \ + --resource-group "$RESOURCE_GROUP" \ + --name "$AKS_CLUSTER_NAME" \ + --overwrite-existing \ + --admin \ + --context "hub-${HUB_REGION}" \ + 2>/dev/null || \ +az aks get-credentials \ + --resource-group "$RESOURCE_GROUP" \ + --name "$AKS_CLUSTER_NAME" \ + --overwrite-existing \ + --context "hub-${HUB_REGION}" + +echo "✓ AKS kubectl context: hub-${HUB_REGION}" + +# Wait for k3s VMs to be ready and get kubeconfig via Run Command +echo "" +echo "Waiting for k3s clusters to be ready (using Azure VM Run Command)..." +echo "This avoids SSH and works through corporate firewalls." + +IFS=',' read -ra K3S_IP_ARRAY <<< "$K3S_PUBLIC_IPS" + +for i in "${!K3S_REGION_ARRAY[@]}"; do + region="${K3S_REGION_ARRAY[$i]}" + vm_name="k3s-${region}" + + echo "" + echo "Configuring k3s-${region}..." + + # Get public IP for kubeconfig + public_ip=$(az vm show -g "$RESOURCE_GROUP" -n "$vm_name" -d --query publicIps -o tsv 2>/dev/null || echo "") + K3S_IP_ARRAY[$i]="$public_ip" + + if [ -z "$public_ip" ]; then + echo "⚠ Could not get IP for $vm_name, skipping..." 
+        continue
+    fi
+
+    echo "  VM Public IP: $public_ip"
+
+    # Wait for k3s to be ready using Run Command
+    echo "  Waiting for k3s to be ready..."
+    k3s_ready=false
+    for attempt in {1..30}; do
+        result=$(az vm run-command invoke \
+            --resource-group "$RESOURCE_GROUP" \
+            --name "$vm_name" \
+            --command-id RunShellScript \
+            --scripts "sudo k3s kubectl get nodes 2>/dev/null && echo K3S_READY" \
+            --query 'value[0].message' -o tsv 2>/dev/null || echo "")
+
+        if echo "$result" | grep -q "K3S_READY"; then
+            echo "  ✓ k3s ready"
+            k3s_ready=true
+            break
+        fi
+        echo "  Waiting for k3s... (attempt $attempt/30)"
+        sleep 10
+    done
+
+    if [ "$k3s_ready" = "false" ]; then
+        echo "  ✗ ERROR: k3s failed to become ready on $vm_name after 30 attempts"
+        echo "  Check VM status: az vm run-command invoke -g $RESOURCE_GROUP -n $vm_name --command-id RunShellScript --scripts 'systemctl status k3s'"
+        continue
+    fi
+
+    # Get kubeconfig via Run Command
+    echo "  Retrieving kubeconfig via Run Command..."
+    RAW_OUTPUT=$(az vm run-command invoke \
+        --resource-group "$RESOURCE_GROUP" \
+        --name "$vm_name" \
+        --command-id RunShellScript \
+        --scripts "sudo cat /etc/rancher/k3s/k3s.yaml" \
+        --query 'value[0].message' -o tsv 2>/dev/null || echo "")
+
+    # Extract the YAML from the Run Command output
+    # The output format is: [stdout]\n\n[stderr]\n
+    # We need to extract just the content between [stdout] and [stderr]
+    KUBECONFIG_CONTENT=$(echo "$RAW_OUTPUT" | awk '/^\[stdout\]/{flag=1; next} /^\[stderr\]/{flag=0} flag')
+
+    # Fallback: try to find apiVersion line and extract from there
+    if [ -z "$KUBECONFIG_CONTENT" ] || ! 
echo "$KUBECONFIG_CONTENT" | grep -q "apiVersion"; then + KUBECONFIG_CONTENT=$(echo "$RAW_OUTPUT" | sed -n '/^apiVersion:/,/^current-context:/p') + # Add the current-context line if we have it + CURRENT_CTX=$(echo "$RAW_OUTPUT" | grep "^current-context:" | head -1) + if [ -n "$CURRENT_CTX" ]; then + KUBECONFIG_CONTENT="$KUBECONFIG_CONTENT"$'\n'"$CURRENT_CTX" + fi + fi + + if [ -n "$KUBECONFIG_CONTENT" ] && echo "$KUBECONFIG_CONTENT" | grep -q "apiVersion"; then + # Replace localhost/127.0.0.1 with public IP and set context name + KUBECONFIG_FILE="$HOME/.kube/k3s-${region}.yaml" + echo "$KUBECONFIG_CONTENT" | \ + sed "s|127.0.0.1|${public_ip}|g" | \ + sed "s|server: https://[^:]*:|server: https://${public_ip}:|g" | \ + sed "s|name: default|name: k3s-${region}|g" | \ + sed "s|cluster: default|cluster: k3s-${region}|g" | \ + sed "s|user: default|user: k3s-${region}|g" | \ + sed "s|current-context: default|current-context: k3s-${region}|g" \ + > "$KUBECONFIG_FILE" + + chmod 600 "$KUBECONFIG_FILE" + + # Delete existing context if present (avoids merge conflicts) + kubectl config delete-context "k3s-${region}" 2>/dev/null || true + kubectl config delete-cluster "k3s-${region}" 2>/dev/null || true + kubectl config delete-user "k3s-${region}" 2>/dev/null || true + + # Merge into main kubeconfig + export KUBECONFIG="$HOME/.kube/config:$KUBECONFIG_FILE" + kubectl config view --flatten > "$HOME/.kube/config.new" + mv "$HOME/.kube/config.new" "$HOME/.kube/config" + chmod 600 "$HOME/.kube/config" + unset KUBECONFIG + + echo " ✓ Context added: k3s-${region}" + else + echo " ⚠ Could not retrieve kubeconfig for k3s-${region}" + echo " Debug: Run Command output was:" + echo "$KUBECONFIG_CONTENT" | head -5 + fi +done + +# Create kubectl aliases +echo "" +echo "Setting up kubectl aliases..." 
+ +ALIAS_FILE="$HOME/.bashrc" +if [[ "$OSTYPE" == "darwin"* ]]; then + ALIAS_FILE="$HOME/.zshrc" +fi + +# Remove old aliases (use markers for clean removal) +if [ -f "$ALIAS_FILE" ]; then + awk '/# BEGIN k3s-fleet aliases/,/# END k3s-fleet aliases/ {next} {print}' "$ALIAS_FILE" > "$ALIAS_FILE.tmp" 2>/dev/null || true + mv "$ALIAS_FILE.tmp" "$ALIAS_FILE" 2>/dev/null || true +fi + +# Add new aliases with markers +{ + echo "" + echo "# BEGIN k3s-fleet aliases" + echo "alias k-hub='kubectl --context hub-${HUB_REGION}'" + echo "alias k-${HUB_REGION}='kubectl --context hub-${HUB_REGION}'" + for region in "${K3S_REGION_ARRAY[@]}"; do + echo "alias k-${region}='kubectl --context k3s-${region}'" + done + echo "# END k3s-fleet aliases" +} >> "$ALIAS_FILE" + +echo "✓ Aliases added to $ALIAS_FILE" + +# Save deployment info (quote values with spaces) +DEPLOYMENT_INFO_FILE="${SCRIPT_DIR}/.deployment-info" +{ + echo "RESOURCE_GROUP=\"$RESOURCE_GROUP\"" + echo "HUB_REGION=\"$HUB_REGION\"" + echo "HUB_CLUSTER_NAME=\"hub-${HUB_REGION}\"" + echo "AKS_CLUSTER_NAME=\"$AKS_CLUSTER_NAME\"" + echo "K3S_REGIONS=\"${K3S_REGION_ARRAY[*]}\"" + echo "K3S_PUBLIC_IPS=\"${K3S_IP_ARRAY[*]}\"" +} > "$DEPLOYMENT_INFO_FILE" + +echo "" +echo "=======================================" +echo "Infrastructure Deployment Complete!" +echo "=======================================" +echo "" +echo "Clusters:" +echo " - hub-${HUB_REGION} (AKS)" +for i in "${!K3S_REGION_ARRAY[@]}"; do + echo " - k3s-${K3S_REGION_ARRAY[$i]} (VM: ${K3S_IP_ARRAY[$i]})" +done +echo "" +echo "Next steps:" +echo " 1. Source your shell config: source $ALIAS_FILE" +echo " 2. Install Istio: ./install-istio.sh" +echo " 3. Setup Fleet: ./setup-fleet.sh" +echo " 4. Install cert-manager: ./install-cert-manager.sh" +echo " 5. Install DocumentDB operator: ./install-documentdb-operator.sh" +echo " 6. 
Deploy DocumentDB: ./deploy-documentdb.sh" +echo "" +echo "Quick test:" +echo " kubectl --context hub-${HUB_REGION} get nodes" +for region in "${K3S_REGION_ARRAY[@]}"; do + echo " kubectl --context k3s-${region} get nodes" +done +echo "" diff --git a/documentdb-playground/k3s-azure-fleet/documentdb-operator-crp.yaml b/documentdb-playground/k3s-azure-fleet/documentdb-operator-crp.yaml new file mode 100644 index 00000000..b0fa5b1e --- /dev/null +++ b/documentdb-playground/k3s-azure-fleet/documentdb-operator-crp.yaml @@ -0,0 +1,113 @@ +# ClusterResourcePlacement for DocumentDB operator propagation via Fleet. +# Note: The operator is installed directly on each cluster (Helm on AKS, Run Command on k3s) +# because CRP-based propagation of complex charts with CRDs causes Helm ownership conflicts. +# This CRP is kept for reference but is NOT applied by install-documentdb-operator.sh. +apiVersion: placement.kubernetes-fleet.io/v1beta1 +kind: ClusterResourcePlacement +metadata: + name: documentdb-operator-crp +spec: + resourceSelectors: + - group: "" + version: v1 + kind: Namespace + name: documentdb-operator + - group: "" + version: v1 + kind: Namespace + name: cnpg-system + - group: "apiextensions.k8s.io" + version: v1 + kind: CustomResourceDefinition + labelSelector: + matchLabels: + app: documentdb-operator + - group: "apiextensions.k8s.io" + version: v1 + kind: CustomResourceDefinition + name: publications.postgresql.cnpg.io + - group: "apiextensions.k8s.io" + version: v1 + kind: CustomResourceDefinition + name: failoverquorums.postgresql.cnpg.io + - group: "apiextensions.k8s.io" + version: v1 + kind: CustomResourceDefinition + name: poolers.postgresql.cnpg.io + - group: "apiextensions.k8s.io" + version: v1 + kind: CustomResourceDefinition + name: clusterimagecatalogs.postgresql.cnpg.io + - group: "apiextensions.k8s.io" + version: v1 + kind: CustomResourceDefinition + name: imagecatalogs.postgresql.cnpg.io + - group: "apiextensions.k8s.io" + version: v1 + kind: 
CustomResourceDefinition + name: backups.postgresql.cnpg.io + - group: "apiextensions.k8s.io" + version: v1 + kind: CustomResourceDefinition + name: scheduledbackups.postgresql.cnpg.io + - group: "apiextensions.k8s.io" + version: v1 + kind: CustomResourceDefinition + name: subscriptions.postgresql.cnpg.io + - group: "apiextensions.k8s.io" + version: v1 + kind: CustomResourceDefinition + name: databases.postgresql.cnpg.io + - group: "apiextensions.k8s.io" + version: v1 + kind: CustomResourceDefinition + name: clusters.postgresql.cnpg.io + # RBAC roles and bindings + - group: "rbac.authorization.k8s.io" + version: v1 + kind: ClusterRole + labelSelector: + matchLabels: + app.kubernetes.io/name: documentdb-operator + - group: "rbac.authorization.k8s.io" + version: v1 + kind: ClusterRole + name: documentdb-operator-cloudnative-pg + - group: "rbac.authorization.k8s.io" + version: v1 + kind: ClusterRole + name: documentdb-operator-cloudnative-pg-edit + - group: "rbac.authorization.k8s.io" + version: v1 + kind: ClusterRole + name: documentdb-operator-cloudnative-pg-view + - group: "rbac.authorization.k8s.io" + version: v1 + kind: ClusterRoleBinding + labelSelector: + matchLabels: + app.kubernetes.io/name: documentdb-operator + - group: "rbac.authorization.k8s.io" + version: v1 + kind: ClusterRoleBinding + name: documentdb-operator-cloudnative-pg + - group: "admissionregistration.k8s.io" + version: v1 + kind: MutatingWebhookConfiguration + name: cnpg-mutating-webhook-configuration + - group: "admissionregistration.k8s.io" + version: v1 + kind: ValidatingWebhookConfiguration + name: cnpg-validating-webhook-configuration + - group: "rbac.authorization.k8s.io" + version: v1 + kind: ClusterRole + name: wal-replica-manager + - group: "rbac.authorization.k8s.io" + version: v1 + kind: ClusterRoleBinding + name: wal-replica-manager-binding + policy: + placementType: PickAll + strategy: + type: RollingUpdate diff --git 
a/documentdb-playground/k3s-azure-fleet/documentdb-resource-crp.yaml b/documentdb-playground/k3s-azure-fleet/documentdb-resource-crp.yaml new file mode 100644 index 00000000..44e25b51 --- /dev/null +++ b/documentdb-playground/k3s-azure-fleet/documentdb-resource-crp.yaml @@ -0,0 +1,88 @@ +# Namespace definition with Istio injection enabled +apiVersion: v1 +kind: Namespace +metadata: + name: documentdb-preview-ns + labels: + istio-injection: enabled + +--- + +apiVersion: v1 +kind: Secret +metadata: + name: documentdb-credentials + namespace: documentdb-preview-ns +type: Opaque +stringData: + username: default_user + password: {{DOCUMENTDB_PASSWORD}} + +--- + +apiVersion: documentdb.io/preview +kind: DocumentDB +metadata: + name: documentdb-preview + namespace: documentdb-preview-ns +spec: + nodeCount: 1 + instancesPerNode: 1 + documentDBImage: ghcr.io/microsoft/documentdb/documentdb-local:16 + gatewayImage: ghcr.io/microsoft/documentdb/documentdb-local:16 + resource: + storage: + pvcSize: 10Gi + # Note: k3s clusters use 'aks' environment (only aks/eks/gke are supported) + environment: aks + clusterReplication: + highAvailability: true + # Use Istio for cross-cluster communication + crossCloudNetworkingStrategy: Istio + primary: {{PRIMARY_CLUSTER}} + clusterList: +{{CLUSTER_LIST}} + exposeViaService: + serviceType: LoadBalancer + logLevel: info + +--- + +apiVersion: placement.kubernetes-fleet.io/v1beta1 +kind: ClusterResourcePlacement +metadata: + name: documentdb-namespace-crp +spec: + resourceSelectors: + - group: "" + version: v1 + kind: Namespace + name: documentdb-preview-ns + selectionScope: NamespaceOnly + policy: + placementType: PickAll + strategy: + type: RollingUpdate + +--- + +# ResourcePlacement for DocumentDB resources within the namespace +apiVersion: placement.kubernetes-fleet.io/v1beta1 +kind: ResourcePlacement +metadata: + name: documentdb-resource-rp + namespace: documentdb-preview-ns +spec: + resourceSelectors: + - group: documentdb.io + kind: 
DocumentDB + version: preview + name: documentdb-preview + - group: "" + version: v1 + kind: Secret + name: documentdb-credentials + policy: + placementType: PickAll + strategy: + type: RollingUpdate diff --git a/documentdb-playground/k3s-azure-fleet/install-cert-manager.sh b/documentdb-playground/k3s-azure-fleet/install-cert-manager.sh new file mode 100755 index 00000000..e4f4ec0e --- /dev/null +++ b/documentdb-playground/k3s-azure-fleet/install-cert-manager.sh @@ -0,0 +1,106 @@ +#!/usr/bin/env bash +set -euo pipefail + +# Install cert-manager on all clusters (AKS hub via kubectl, k3s via kubectl context) + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +# Load deployment info +if [ -f "$SCRIPT_DIR/.deployment-info" ]; then + source "$SCRIPT_DIR/.deployment-info" +else + echo "Error: Deployment info not found. Run deploy-infrastructure.sh first." + exit 1 +fi + +CERT_MANAGER_VERSION="${CERT_MANAGER_VERSION:-v1.14.4}" +HUB_CLUSTER_NAME="${HUB_CLUSTER_NAME:-hub-${HUB_REGION}}" + +echo "=======================================" +echo "cert-manager Installation" +echo "=======================================" +echo "Version: $CERT_MANAGER_VERSION" +echo "Hub Cluster: $HUB_CLUSTER_NAME" +echo "=======================================" + +# Get all member clusters +ALL_MEMBERS="$HUB_CLUSTER_NAME" + +# Add k3s clusters from deployment info +IFS=' ' read -ra K3S_REGION_ARRAY <<< "${K3S_REGIONS:-}" +for region in "${K3S_REGION_ARRAY[@]}"; do + if kubectl config get-contexts "k3s-$region" &>/dev/null; then + ALL_MEMBERS="$ALL_MEMBERS k3s-$region" + fi +done + +echo "Installing on: $ALL_MEMBERS" + +# Add Jetstack Helm repo +echo "" +echo "Adding Jetstack Helm repository..." 
+helm repo add jetstack https://charts.jetstack.io --force-update
+helm repo update
+
+# Install cert-manager on each member cluster
+for cluster in $ALL_MEMBERS; do
+    echo ""
+    echo "======================================="
+    echo "Installing cert-manager on $cluster"
+    echo "======================================="
+
+    kubectl config use-context "$cluster"
+
+    # Check if already installed
+    if helm list -n cert-manager 2>/dev/null | grep -q cert-manager; then
+        echo "cert-manager already installed on $cluster, upgrading..."
+        HELM_CMD="upgrade"
+    else
+        HELM_CMD="install"
+    fi
+
+    # Apply CRDs explicitly from the static manifest; the chart's own CRD
+    # handling is version-dependent (the value name below, installCRDs, applies
+    # to chart versions up to v1.14 — v1.15+ renamed it to crds.enabled).
+    echo "Applying cert-manager CRDs..."
+    kubectl apply -f "https://github.com/cert-manager/cert-manager/releases/download/${CERT_MANAGER_VERSION}/cert-manager.crds.yaml"
+
+    # Install/upgrade cert-manager
+    helm $HELM_CMD cert-manager jetstack/cert-manager \
+        --namespace cert-manager \
+        --create-namespace \
+        --version "$CERT_MANAGER_VERSION" \
+        --set installCRDs=true \
+        --set prometheus.enabled=false \
+        --set webhook.timeoutSeconds=30 \
+        --set startupapicheck.enabled=false \
+        --wait --timeout 5m || echo "Warning: cert-manager may not be fully ready on $cluster"
+
+    echo "✓ cert-manager installed on $cluster"
+done
+
+# Apply ClusterResourcePlacement on hub for future clusters
+echo ""
+echo "Applying cert-manager ClusterResourcePlacement on hub..."
+kubectl config use-context "$HUB_CLUSTER_NAME"
+kubectl apply -f "$SCRIPT_DIR/cert-manager-crp.yaml"
+
+# Verify installation
+echo ""
+echo "======================================="
+echo "Verification"
+echo "======================================="
+
+for cluster in $ALL_MEMBERS; do
+    echo ""
+    echo "=== $cluster ==="
+    kubectl --context "$cluster" get pods -n cert-manager 2>/dev/null || echo "  Pods not ready"
+done
+
+echo ""
+echo "======================================="
+echo "✅ cert-manager Installation Complete!"
+echo "=======================================" +echo "" +echo "Next steps:" +echo " 1. ./install-documentdb-operator.sh" +echo " 2. ./deploy-documentdb.sh" +echo "=======================================" diff --git a/documentdb-playground/k3s-azure-fleet/install-documentdb-operator.sh b/documentdb-playground/k3s-azure-fleet/install-documentdb-operator.sh new file mode 100755 index 00000000..f1b4c105 --- /dev/null +++ b/documentdb-playground/k3s-azure-fleet/install-documentdb-operator.sh @@ -0,0 +1,215 @@ +#!/usr/bin/env bash +set -euo pipefail + +# Install DocumentDB operator on all clusters +# - AKS hub: installed via Helm (from published chart or local source) +# - k3s VMs: installed via Azure VM Run Command (CNPG from upstream, operator manifests via base64) +# +# Environment variables: +# BUILD_CHART - "true" builds from local source; "false" (default) uses published Helm chart +# CHART_VERSION - Chart version when using published chart (default: latest) +# VERSION - Local chart version number when BUILD_CHART=true (default: 200) +# VALUES_FILE - Optional Helm values file path + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +# Load deployment info +if [ -f "$SCRIPT_DIR/.deployment-info" ]; then + source "$SCRIPT_DIR/.deployment-info" +else + echo "Error: Deployment info not found. Run deploy-infrastructure.sh first." + exit 1 +fi + +CHART_DIR="$(cd "$SCRIPT_DIR/../.." 
&& pwd)/operator/documentdb-helm-chart" +VERSION="${VERSION:-200}" +VALUES_FILE="${VALUES_FILE:-}" +BUILD_CHART="${BUILD_CHART:-false}" +HELM_REPO_URL="https://documentdb.github.io/documentdb-kubernetes-operator" +CHART_VERSION="${CHART_VERSION:-}" +HUB_CLUSTER_NAME="${HUB_CLUSTER_NAME:-hub-${HUB_REGION}}" + +echo "=======================================" +echo "DocumentDB Operator Installation" +echo "=======================================" +echo "Hub Cluster: $HUB_CLUSTER_NAME" +if [ "$BUILD_CHART" = "true" ]; then + echo "Chart Source: local ($CHART_DIR)" +else + echo "Chart Source: published (documentdb/documentdb-operator${CHART_VERSION:+ v$CHART_VERSION})" +fi +echo "=======================================" + +# Check prerequisites +for cmd in kubectl helm az base64 awk curl; do + if ! command -v "$cmd" &>/dev/null; then + echo "Error: Required command '$cmd' not found." + exit 1 + fi +done + +# ─── Step 1: Install on AKS hub via Helm ─── +echo "" +echo "=======================================" +echo "Step 1: Installing operator on AKS hub ($HUB_CLUSTER_NAME)" +echo "=======================================" + +kubectl config use-context "$HUB_CLUSTER_NAME" + +CHART_PKG="$SCRIPT_DIR/documentdb-operator-0.0.${VERSION}.tgz" + +if [ "$BUILD_CHART" = "true" ]; then + rm -f "$CHART_PKG" + echo "Packaging Helm chart from local source..." + helm dependency update "$CHART_DIR" + helm package "$CHART_DIR" --version "0.0.${VERSION}" --destination "$SCRIPT_DIR" + CHART_REF="$CHART_PKG" +else + echo "Using published Helm chart..." 
+ helm repo add documentdb "$HELM_REPO_URL" --force-update 2>/dev/null + helm repo update documentdb + CHART_REF="documentdb/documentdb-operator" + if [ -n "$CHART_VERSION" ]; then + CHART_REF="$CHART_REF --version $CHART_VERSION" + fi + # Pull chart locally (needed for k3s manifest generation in Step 2) + rm -f "$SCRIPT_DIR"/documentdb-operator-*.tgz + helm pull documentdb/documentdb-operator ${CHART_VERSION:+--version "$CHART_VERSION"} --destination "$SCRIPT_DIR" + CHART_PKG=$(ls "$SCRIPT_DIR"/documentdb-operator-*.tgz 2>/dev/null | head -1) +fi + +echo "" +echo "Installing operator..." +HELM_ARGS=( + --namespace documentdb-operator + --create-namespace + --wait --timeout 10m +) +if [ -n "$VALUES_FILE" ] && [ -f "$VALUES_FILE" ]; then + HELM_ARGS+=(--values "$VALUES_FILE") +fi +# shellcheck disable=SC2086 +helm upgrade --install documentdb-operator $CHART_REF "${HELM_ARGS[@]}" +echo "✓ Operator installed on $HUB_CLUSTER_NAME" + +# ─── Step 2: Install on k3s clusters via Run Command ─── +echo "" +echo "=======================================" +echo "Step 2: Installing operator on k3s clusters via Run Command" +echo "=======================================" + +# Generate DocumentDB-specific manifests (excluding CNPG subchart) +echo "Generating DocumentDB operator manifests..." + +# k3s VMs need a local chart package for helm template +if [ ! 
-f "$CHART_PKG" ]; then + echo "Error: Chart package not found at $CHART_PKG" + exit 1 +fi + +DOCDB_MANIFESTS=$(mktemp) + +# Add documentdb-operator namespace +cat > "$DOCDB_MANIFESTS" << 'NSEOF' +--- +apiVersion: v1 +kind: Namespace +metadata: + name: documentdb-operator +NSEOF + +# Extract DocumentDB-specific templates (non-CNPG) +helm template documentdb-operator "$CHART_PKG" \ + --namespace documentdb-operator \ + --include-crds 2>/dev/null | \ + awk ' + /^# Source: documentdb-operator\/crds\/documentdb\.io/{p=1} + /^# Source: documentdb-operator\/templates\//{p=1} + /^# Source: documentdb-operator\/charts\//{p=0} + p + ' >> "$DOCDB_MANIFESTS" + +MANIFEST_B64=$(base64 < "$DOCDB_MANIFESTS") +MANIFEST_SIZE=$(wc -c < "$DOCDB_MANIFESTS" | tr -d ' ') +rm -f "$DOCDB_MANIFESTS" + +if [ "$MANIFEST_SIZE" -lt 100 ]; then + echo "Error: Generated manifest is too small (${MANIFEST_SIZE} bytes) — Helm template may have failed." + exit 1 +fi + +echo "Manifest size: $(echo "$MANIFEST_B64" | wc -c | tr -d ' ') bytes (base64), ${MANIFEST_SIZE} bytes (raw)" + +IFS=' ' read -ra K3S_REGION_ARRAY <<< "${K3S_REGIONS:-}" +for region in "${K3S_REGION_ARRAY[@]}"; do + VM_NAME="k3s-$region" + echo "" + echo "--- Installing on $VM_NAME ---" + + # Step 2a: Ensure Helm is installed + echo " Ensuring Helm is available..." + az vm run-command invoke -g "$RESOURCE_GROUP" -n "$VM_NAME" --command-id RunShellScript \ + --scripts 'which helm || (curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash)' \ + --query 'value[0].message' -o tsv 2>/dev/null | awk '/^\[stdout\]/{flag=1; next} /^\[stderr\]/{flag=0} flag' + + # Step 2b: Install CNPG from upstream release manifest + echo " Installing CloudNative-PG..." 
+ az vm run-command invoke -g "$RESOURCE_GROUP" -n "$VM_NAME" --command-id RunShellScript \ + --scripts ' +export KUBECONFIG=/etc/rancher/k3s/k3s.yaml +kubectl apply --server-side -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/main/releases/cnpg-1.27.1.yaml 2>&1 | tail -3 +echo "Waiting for CNPG..." +kubectl -n cnpg-system rollout status deployment/cnpg-controller-manager --timeout=120s 2>&1 || true +echo "CNPG ready" +' \ + --query 'value[0].message' -o tsv 2>/dev/null | awk '/^\[stdout\]/{flag=1; next} /^\[stderr\]/{flag=0} flag' + + # Step 2c: Apply DocumentDB operator manifests + echo " Applying DocumentDB operator manifests..." + az vm run-command invoke -g "$RESOURCE_GROUP" -n "$VM_NAME" --command-id RunShellScript \ + --scripts " +export KUBECONFIG=/etc/rancher/k3s/k3s.yaml +echo '${MANIFEST_B64}' | base64 -d > /tmp/docdb-manifests.yaml +kubectl apply --server-side -f /tmp/docdb-manifests.yaml 2>&1 | tail -5 +rm -f /tmp/docdb-manifests.yaml +echo 'Waiting for operator...' 
+kubectl -n documentdb-operator rollout status deployment/documentdb-operator --timeout=120s 2>&1 || true +echo 'Done' +" \ + --query 'value[0].message' -o tsv 2>/dev/null | awk '/^\[stdout\]/{flag=1; next} /^\[stderr\]/{flag=0} flag' + + echo " ✓ Operator installed on $VM_NAME" +done + +# ─── Step 3: Verify ─── +echo "" +echo "=======================================" +echo "Verification" +echo "=======================================" + +echo "" +echo "=== $HUB_CLUSTER_NAME ===" +kubectl --context "$HUB_CLUSTER_NAME" get pods -n documentdb-operator -o wide 2>/dev/null || echo " No pods" +kubectl --context "$HUB_CLUSTER_NAME" get pods -n cnpg-system -o wide 2>/dev/null || echo " No pods" + +for region in "${K3S_REGION_ARRAY[@]}"; do + VM_NAME="k3s-$region" + echo "" + echo "=== $VM_NAME ===" + az vm run-command invoke -g "$RESOURCE_GROUP" -n "$VM_NAME" --command-id RunShellScript \ + --scripts ' +export KUBECONFIG=/etc/rancher/k3s/k3s.yaml +kubectl get pods -n documentdb-operator +kubectl get pods -n cnpg-system +' \ + --query 'value[0].message' -o tsv 2>/dev/null | awk '/^\[stdout\]/{flag=1; next} /^\[stderr\]/{flag=0} flag' +done + +echo "" +echo "=======================================" +echo "✅ DocumentDB Operator Installation Complete!" 
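Throughout these scripts, `az vm run-command invoke` returns a single message blob containing `[stdout]` and `[stderr]` sections, which is why every invocation pipes `value[0].message` through the same small awk state machine. A self-contained sketch of that filter — the sample message text is illustrative, not literal Azure output:

```bash
#!/usr/bin/env bash
# Sketch: extracting only the [stdout] section from a Run Command message.
# flag turns on after the [stdout] marker and off at the [stderr] marker.
message='Enable succeeded:
[stdout]
CNPG ready
Done
[stderr]
some warning'
stdout_only=$(printf '%s\n' "$message" | \
  awk '/^\[stdout\]/{flag=1; next} /^\[stderr\]/{flag=0} flag')
printf '%s\n' "$stdout_only"
```

The `next` on the `[stdout]` line skips the marker itself, so only the command's real output reaches the terminal.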
+echo "=======================================" +echo "" +echo "Next step:" +echo " ./deploy-documentdb.sh" +echo "=======================================" diff --git a/documentdb-playground/k3s-azure-fleet/install-istio.sh b/documentdb-playground/k3s-azure-fleet/install-istio.sh new file mode 100755 index 00000000..21d6f693 --- /dev/null +++ b/documentdb-playground/k3s-azure-fleet/install-istio.sh @@ -0,0 +1,492 @@ +#!/bin/bash +set -e + +# ================================ +# Install Istio Service Mesh across all clusters +# ================================ +# - AKS hub: installed via istioctl (standard approach) +# - k3s VMs: installed via Helm + istioctl (for east-west gateway) +# +# Uses multi-primary, multi-network mesh configuration +# with shared root CA for cross-cluster mTLS trust. + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +ISTIO_VERSION="${ISTIO_VERSION:-1.24.0}" + +# Load deployment info +if [ -f "${SCRIPT_DIR}/.deployment-info" ]; then + source "${SCRIPT_DIR}/.deployment-info" +else + echo "Error: .deployment-info not found. Run deploy-infrastructure.sh first." + exit 1 +fi + +# Build cluster list +ALL_CLUSTERS=("hub-${HUB_REGION}") +IFS=' ' read -ra K3S_REGION_ARRAY <<< "$K3S_REGIONS" +IFS=' ' read -ra K3S_IP_ARRAY <<< "$K3S_PUBLIC_IPS" +for region in "${K3S_REGION_ARRAY[@]}"; do + ALL_CLUSTERS+=("k3s-${region}") +done + +echo "=======================================" +echo "Istio Service Mesh Installation" +echo "=======================================" +echo "Version: $ISTIO_VERSION" +echo "Clusters: ${ALL_CLUSTERS[*]}" +echo "=======================================" +echo "" + +# Check prerequisites +for cmd in kubectl helm make openssl curl; do + if ! command -v "$cmd" &>/dev/null; then + echo "Error: Required command '$cmd' not found." + exit 1 + fi +done + +# Download istioctl if not present +if ! command -v istioctl &> /dev/null; then + echo "Installing istioctl..." 
+ curl -L https://istio.io/downloadIstio | ISTIO_VERSION=${ISTIO_VERSION} sh - + export PATH="$PWD/istio-${ISTIO_VERSION}/bin:$PATH" + echo "✓ istioctl installed" +fi + +ISTIO_INSTALLED_VERSION=$(istioctl version --remote=false 2>/dev/null | head -1 || echo "unknown") +echo "Using istioctl: $ISTIO_INSTALLED_VERSION" + +# ─── Generate shared root CA ─── +CERT_DIR="${SCRIPT_DIR}/.istio-certs" +mkdir -p "$CERT_DIR" + +if [ ! -f "$CERT_DIR/root-cert.pem" ]; then + echo "" + echo "Generating shared root CA..." + pushd "$CERT_DIR" > /dev/null + if [ ! -d "istio-${ISTIO_VERSION}" ]; then + curl -sL "https://github.com/istio/istio/archive/refs/tags/${ISTIO_VERSION}.tar.gz" | tar xz + fi + make -f "istio-${ISTIO_VERSION}/tools/certs/Makefile.selfsigned.mk" root-ca + echo "✓ Root CA generated" + popd > /dev/null +fi + +# ─── Install Istio on each cluster ─── +for i in "${!ALL_CLUSTERS[@]}"; do + cluster="${ALL_CLUSTERS[$i]}" + network_id="network$((i + 1))" + + echo "" + echo "=======================================" + echo "Installing Istio on $cluster (${network_id})" + echo "=======================================" + + # Verify cluster access + if [[ "$cluster" != k3s-* ]]; then + if ! kubectl --context "$cluster" get nodes --request-timeout=10s &>/dev/null; then + echo "⚠ Cannot access $cluster via kubectl" + continue + fi + fi + + if [[ "$cluster" == k3s-* ]]; then + # ─── k3s clusters: namespace label + certs via Run Command ─── + region="${cluster#k3s-}" + vm_name="k3s-${region}" + + echo "Labeling istio-system namespace on $vm_name..." 
+ az vm run-command invoke \ + --resource-group "$RESOURCE_GROUP" \ + --name "$vm_name" \ + --command-id RunShellScript \ + --scripts "k3s kubectl create namespace istio-system --dry-run=client -o yaml | k3s kubectl apply -f - && k3s kubectl label namespace istio-system topology.istio.io/network=${network_id} --overwrite && echo NS_LABELED" \ + --query 'value[0].message' -o tsv 2>/dev/null | tail -1 + else + # AKS: direct kubectl + kubectl --context "$cluster" create namespace istio-system --dry-run=client -o yaml | \ + kubectl --context "$cluster" apply -f - 2>/dev/null || true + kubectl --context "$cluster" label namespace istio-system topology.istio.io/network="${network_id}" --overwrite 2>/dev/null || true + fi + + # Generate and apply cluster-specific certificates + echo "Generating certificates for $cluster..." + pushd "$CERT_DIR" > /dev/null + make -f "istio-${ISTIO_VERSION}/tools/certs/Makefile.selfsigned.mk" "${cluster}-cacerts" + popd > /dev/null + + if [[ "$cluster" == k3s-* ]]; then + # k3s certs are pre-injected via cloud-init during VM deployment. + # Verify the cacerts secret exists on the VM. + region="${cluster#k3s-}" + vm_name="k3s-${region}" + echo "Verifying pre-injected cacerts secret on $vm_name..." + result=$(az vm run-command invoke \ + --resource-group "$RESOURCE_GROUP" \ + --name "$vm_name" \ + --command-id RunShellScript \ + --scripts 'k3s kubectl get secret cacerts -n istio-system -o name 2>/dev/null && echo CERTS_OK || echo CERTS_MISSING' \ + --query 'value[0].message' -o tsv 2>/dev/null || echo "") + + if echo "$result" | grep -q "CERTS_OK"; then + echo "✓ Cacerts secret verified (pre-injected via cloud-init)" + else + echo "⚠ Cacerts secret not found — applying via Run Command..." 
+ # Fallback: create from locally-generated certs via Run Command + ROOT_CERT_CONTENT=$(cat "${CERT_DIR}/${cluster}/root-cert.pem") + CA_CERT_CONTENT=$(cat "${CERT_DIR}/${cluster}/ca-cert.pem") + CA_KEY_CONTENT=$(cat "${CERT_DIR}/${cluster}/ca-key.pem") + CERT_CHAIN_CONTENT=$(cat "${CERT_DIR}/${cluster}/cert-chain.pem") + az vm run-command invoke \ + --resource-group "$RESOURCE_GROUP" \ + --name "$vm_name" \ + --command-id RunShellScript \ + --scripts " +k3s kubectl create namespace istio-system --dry-run=client -o yaml | k3s kubectl apply -f - +cat > /tmp/root-cert.pem <<'CERTEOF' +${ROOT_CERT_CONTENT} +CERTEOF +cat > /tmp/ca-cert.pem <<'CERTEOF' +${CA_CERT_CONTENT} +CERTEOF +cat > /tmp/ca-key.pem <<'CERTEOF' +${CA_KEY_CONTENT} +CERTEOF +cat > /tmp/cert-chain.pem <<'CERTEOF' +${CERT_CHAIN_CONTENT} +CERTEOF +k3s kubectl create secret generic cacerts -n istio-system \ + --from-file=ca-cert.pem=/tmp/ca-cert.pem \ + --from-file=ca-key.pem=/tmp/ca-key.pem \ + --from-file=root-cert.pem=/tmp/root-cert.pem \ + --from-file=cert-chain.pem=/tmp/cert-chain.pem \ + --dry-run=client -o yaml | k3s kubectl apply -f - +echo CERTS_APPLIED" \ + --query 'value[0].message' -o tsv 2>/dev/null || echo " ⚠ Failed to apply certs via Run Command" + fi + else + # AKS: apply certs via kubectl (direct access works) + kubectl --context "$cluster" create secret generic cacerts -n istio-system \ + --from-file="${CERT_DIR}/${cluster}/ca-cert.pem" \ + --from-file="${CERT_DIR}/${cluster}/ca-key.pem" \ + --from-file="${CERT_DIR}/${cluster}/root-cert.pem" \ + --from-file="${CERT_DIR}/${cluster}/cert-chain.pem" \ + --dry-run=client -o yaml | kubectl --context "$cluster" apply -f - 2>/dev/null || true + fi + echo "✓ Certificates configured" + + if [[ "$cluster" == k3s-* ]]; then + # ─── k3s clusters: install via Helm through Run Command ─── + region="${cluster#k3s-}" + vm_name="k3s-${region}" + + # Look up public IP for this region + public_ip="" + for idx in "${!K3S_REGION_ARRAY[@]}"; do + if [ 
"${K3S_REGION_ARRAY[$idx]}" = "$region" ]; then + public_ip="${K3S_IP_ARRAY[$idx]}" + break + fi + done + + echo "Installing Istio via Helm on $vm_name (Run Command)..." + + # Step 1: Install istio-base via Helm + echo " Installing istio-base..." + az vm run-command invoke \ + --resource-group "$RESOURCE_GROUP" \ + --name "$vm_name" \ + --command-id RunShellScript \ + --scripts " +export KUBECONFIG=/etc/rancher/k3s/k3s.yaml +helm repo add istio https://istio-release.storage.googleapis.com/charts +helm repo update istio +helm upgrade --install istio-base istio/base \ + --namespace istio-system \ + --version ${ISTIO_VERSION} \ + --skip-schema-validation \ + --wait --timeout 2m && echo ISTIO_BASE_OK || echo ISTIO_BASE_FAIL" \ + --query 'value[0].message' -o tsv 2>/dev/null | tail -3 + + # Step 2: Install istiod via Helm + echo " Installing istiod..." + az vm run-command invoke \ + --resource-group "$RESOURCE_GROUP" \ + --name "$vm_name" \ + --command-id RunShellScript \ + --scripts " +export KUBECONFIG=/etc/rancher/k3s/k3s.yaml +helm repo add istio https://istio-release.storage.googleapis.com/charts +helm upgrade --install istiod istio/istiod \ + --namespace istio-system \ + --version ${ISTIO_VERSION} \ + --set global.meshID=mesh1 \ + --set global.multiCluster.clusterName=${cluster} \ + --set global.network=${network_id} \ + --set pilot.autoscaleEnabled=false \ + --set pilot.replicaCount=1 \ + --set meshConfig.defaultConfig.proxyMetadata.ISTIO_META_DNS_CAPTURE=true \ + --set meshConfig.defaultConfig.proxyMetadata.ISTIO_META_DNS_AUTO_ALLOCATE=true \ + --wait --timeout 5m && echo ISTIOD_OK || echo ISTIOD_FAIL" \ + --query 'value[0].message' -o tsv 2>/dev/null | tail -3 + + echo "✓ Istio control plane installed via Helm" + + # Step 3: Install east-west gateway via Helm (use values file for dot-containing labels) + echo " Installing east-west gateway..." 
+ az vm run-command invoke \ + --resource-group "$RESOURCE_GROUP" \ + --name "$vm_name" \ + --command-id RunShellScript \ + --scripts " +export KUBECONFIG=/etc/rancher/k3s/k3s.yaml +cat > /tmp/eastwest-values.yaml <<'VALEOF' +labels: + istio: eastwestgateway + app: istio-eastwestgateway + topology.istio.io/network: ${network_id} +env: + ISTIO_META_ROUTER_MODE: sni-dnat + ISTIO_META_REQUESTED_NETWORK_VIEW: ${network_id} +service: + ports: + - name: status-port + port: 15021 + targetPort: 15021 + - name: tls + port: 15443 + targetPort: 15443 + - name: tls-istiod + port: 15012 + targetPort: 15012 + - name: tls-webhook + port: 15017 + targetPort: 15017 +VALEOF +helm repo add istio https://istio-release.storage.googleapis.com/charts +helm upgrade --install istio-eastwestgateway istio/gateway \ + -n istio-system \ + --version ${ISTIO_VERSION} \ + -f /tmp/eastwest-values.yaml \ + --skip-schema-validation \ + --wait --timeout 5m && echo EW_GW_OK || echo EW_GW_FAIL" \ + --query 'value[0].message' -o tsv 2>/dev/null | tail -3 + + # Step 4: Patch east-west gateway with public IP + apply Gateway resource + if [ -n "$public_ip" ]; then + echo " Patching east-west gateway with public IP: $public_ip" + az vm run-command invoke \ + --resource-group "$RESOURCE_GROUP" \ + --name "$vm_name" \ + --command-id RunShellScript \ + --scripts " +k3s kubectl patch svc istio-eastwestgateway -n istio-system \ + --type=json -p='[{\"op\": \"add\", \"path\": \"/spec/externalIPs\", \"value\": [\"${public_ip}\"]}]' +cat <<'GWEOF' | k3s kubectl apply -n istio-system -f - +apiVersion: networking.istio.io/v1beta1 +kind: Gateway +metadata: + name: cross-network-gateway +spec: + selector: + istio: eastwestgateway + servers: + - port: + number: 15443 + name: tls + protocol: TLS + tls: + mode: AUTO_PASSTHROUGH + hosts: + - '*.local' +GWEOF +echo GW_PATCHED" \ + --query 'value[0].message' -o tsv 2>/dev/null | tail -3 + fi + + echo "✓ East-west gateway installed on $vm_name" + else + # ─── AKS hub: use 
istioctl (standard approach) ───
+    echo "Installing Istio via istioctl..."
+    cat <<EOF | istioctl install --context="$cluster" -y -f -
+apiVersion: install.istio.io/v1alpha1
+kind: IstioOperator
+spec:
+  values:
+    global:
+      meshID: mesh1
+      multiCluster:
+        clusterName: ${cluster}
+      network: ${network_id}
+EOF
+    echo "✓ Istio control plane installed via istioctl"
+
+    # Install the east-west gateway using the generator from the Istio release
+    echo "Installing east-west gateway..."
+    "${CERT_DIR}/istio-${ISTIO_VERSION}/samples/multicluster/gen-eastwest-gateway.sh" \
+      --network "$network_id" | istioctl --context "$cluster" install -y -f -
+
+    # Expose *.local services through the east-west gateway (cross-network Gateway)
+    kubectl --context "$cluster" apply -n istio-system \
+      -f "${CERT_DIR}/istio-${ISTIO_VERSION}/samples/multicluster/expose-services.yaml"
+
+    # Wait for the east-west gateway to receive an external IP
+    echo "Waiting for east-west gateway external IP..."
+    GATEWAY_IP=""
+    for attempt in $(seq 1 30); do
+      GATEWAY_IP=$(kubectl --context "$cluster" get svc istio-eastwestgateway -n istio-system \
+        -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null || echo "")
+      if [ -n "$GATEWAY_IP" ]; then
+        echo "✓ Gateway IP: $GATEWAY_IP"
+        break
+      fi
+      sleep 10
+    done
+    [ -z "$GATEWAY_IP" ] && echo "⚠ Gateway IP not yet assigned"
+  fi
+done
+
+# ─── Create remote secrets ───
+# Remote secrets allow each cluster's Istio to discover services on other clusters.
+# For k3s clusters, we use Run Command since direct kubectl may not work.
+echo ""
+echo "======================================="
+echo "Creating remote secrets for cross-cluster discovery"
+echo "======================================="
+
+# Helper: apply a secret YAML to a target cluster (handles k3s via Run Command)
+apply_secret_to_target() {
+  local target="$1"
+  local secret_yaml="$2"
+
+  if [[ "$target" == k3s-* ]]; then
+    local region="${target#k3s-}"
+    local vm_name="k3s-${region}"
+    # Escape the YAML for embedding in Run Command script
+    az vm run-command invoke \
+      --resource-group "$RESOURCE_GROUP" \
+      --name "$vm_name" \
+      --command-id RunShellScript \
+      --scripts "cat <<'SECRETEOF' | k3s kubectl apply -f -
+${secret_yaml}
+SECRETEOF
+echo SECRET_APPLIED" \
+      --query 'value[0].message' -o tsv 2>/dev/null | tail -1
+  else
+    echo "$secret_yaml" | kubectl --context "$target" apply -f - 2>/dev/null
+  fi
+}
+
+for source_cluster in "${ALL_CLUSTERS[@]}"; do
+  if [[ "$source_cluster" == k3s-* ]]; then
+    # k3s source: read the pre-built remote secret from the VM
+    # (auto-generated during cloud-init — see main.bicep runcmd)
+    source_region="${source_cluster#k3s-}"
+    source_vm="k3s-${source_region}"
+
+    echo "Reading pre-built remote secret from $source_vm..."
+ RAW_OUTPUT=$(az vm run-command invoke \ + --resource-group "$RESOURCE_GROUP" \ + --name "$source_vm" \ + --command-id RunShellScript \ + --scripts "cat /etc/istio-remote/remote-secret.yaml 2>/dev/null || echo REMOTE_SECRET_NOT_FOUND" \ + --query 'value[0].message' -o tsv 2>/dev/null || echo "") + SECRET_YAML=$(echo "$RAW_OUTPUT" | awk '/^\[stdout\]/{flag=1; next} /^\[stderr\]/{flag=0} flag') + + if [ -z "$SECRET_YAML" ] || ! echo "$SECRET_YAML" | grep -q "apiVersion"; then + echo " ⚠ Remote secret not ready on $source_vm, skipping" + echo " (Cloud-init may still be running. Re-run this script to retry.)" + continue + fi + + for target_cluster in "${ALL_CLUSTERS[@]}"; do + if [ "$source_cluster" != "$target_cluster" ]; then + echo " Applying: $source_cluster -> $target_cluster" + apply_secret_to_target "$target_cluster" "$SECRET_YAML" + fi + done + else + # AKS source: use istioctl (direct access works) + for target_cluster in "${ALL_CLUSTERS[@]}"; do + if [ "$source_cluster" != "$target_cluster" ]; then + echo "Creating secret: $source_cluster -> $target_cluster" + SECRET_YAML=$(istioctl create-remote-secret --context="$source_cluster" --name="$source_cluster" 2>/dev/null || echo "") + if [ -n "$SECRET_YAML" ]; then + apply_secret_to_target "$target_cluster" "$SECRET_YAML" + else + echo " ⚠ Could not create remote secret for $source_cluster" + fi + fi + done + fi +done + +echo "✓ Remote secrets configured" + +# ─── Verify ─── +echo "" +echo "=======================================" +echo "Verifying Istio Installation" +echo "=======================================" + +for cluster in "${ALL_CLUSTERS[@]}"; do + echo "" + echo "=== $cluster ===" + if [[ "$cluster" == k3s-* ]]; then + region="${cluster#k3s-}" + vm_name="k3s-${region}" + az vm run-command invoke \ + --resource-group "$RESOURCE_GROUP" \ + --name "$vm_name" \ + --command-id RunShellScript \ + --scripts "k3s kubectl get pods -n istio-system -o wide 2>/dev/null | head -10; echo '---'; k3s kubectl get svc 
-n istio-system istio-eastwestgateway 2>/dev/null || echo 'Gateway not found'" \ + --query 'value[0].message' -o tsv 2>/dev/null | awk '/^\[stdout\]/{flag=1; next} /^\[stderr\]/{flag=0} flag' + else + kubectl --context "$cluster" get pods -n istio-system -o wide 2>/dev/null | head -10 || echo " Could not get pods" + kubectl --context "$cluster" get svc -n istio-system istio-eastwestgateway 2>/dev/null || echo " Gateway not found" + fi +done + +echo "" +echo "=======================================" +echo "✅ Istio Installation Complete!" +echo "=======================================" +echo "" +echo "Mesh: mesh1" +echo "Networks:" +for i in "${!ALL_CLUSTERS[@]}"; do + echo " - ${ALL_CLUSTERS[$i]}: network$((i + 1))" +done +echo "" +echo "Next steps:" +echo " 1. Setup Fleet: ./setup-fleet.sh" +echo " 2. Install cert-manager: ./install-cert-manager.sh" +echo " 3. Install DocumentDB operator: ./install-documentdb-operator.sh" +echo "" diff --git a/documentdb-playground/k3s-azure-fleet/main.bicep b/documentdb-playground/k3s-azure-fleet/main.bicep new file mode 100644 index 00000000..05f12a07 --- /dev/null +++ b/documentdb-playground/k3s-azure-fleet/main.bicep @@ -0,0 +1,606 @@ +// k3s on Azure VMs with AKS Hub - Istio for cross-cluster networking +// No VNet peering required - Istio handles all cross-cluster traffic +// Uses Azure VM Run Command for all VM operations (no SSH required) + +@description('Location for AKS hub cluster') +param hubLocation string = 'westus3' + +@description('Regions for k3s VMs') +param k3sRegions array = ['eastus2', 'uksouth'] + +@description('Resource group name') +param resourceGroupName string = resourceGroup().name + +@description('VM size for k3s nodes') +param vmSize string = 'Standard_D2s_v3' + +@description('AKS node VM size') +param aksVmSize string = 'Standard_DS2_v2' + +@description('SSH public key for VM access (required by Azure but not used - we use Run Command)') +param sshPublicKey string + +@description('Admin username for 
VMs') +param adminUsername string = 'azureuser' + +@description('Kubernetes version for AKS (empty string uses region default)') +param kubernetesVersion string = '' + +@description('k3s version') +param k3sVersion string = 'v1.30.4+k3s1' + +@description('Allowed source IP for Kube API (port 6443) access. WARNING: Default \'*\' opens the Kubernetes API to the public internet. For production, restrict to your IP/CIDR (e.g., \'203.0.113.0/24\').') +param allowedSourceIP string = '*' + +@description('Per-cluster Istio CA certificates (base64-encoded PEM). Array of objects with rootCert, caCert, caKey, certChain.') +param istioCerts array = [] + +// Optionally include kubernetesVersion in cluster properties +var maybeK8sVersion = empty(kubernetesVersion) ? {} : { kubernetesVersion: kubernetesVersion } + +// Variables +var aksClusterName = 'hub-${hubLocation}' +var aksVnetName = 'aks-${hubLocation}-vnet' +var aksSubnetName = 'aks-subnet' + +// ================================ +// AKS Hub Cluster VNet + NSG +// ================================ +resource aksNsg 'Microsoft.Network/networkSecurityGroups@2023-05-01' = { + name: 'aks-${hubLocation}-nsg' + location: hubLocation + properties: { + securityRules: [ + { + name: 'AllowKubeAPI' + properties: { + priority: 100 + direction: 'Inbound' + access: 'Allow' + protocol: 'Tcp' + sourceAddressPrefix: allowedSourceIP + sourcePortRange: '*' + destinationAddressPrefix: '*' + destinationPortRange: '443' + description: 'Kubernetes API server access' + } + } + { + name: 'AllowHTTP' + properties: { + priority: 105 + direction: 'Inbound' + access: 'Allow' + protocol: 'Tcp' + sourceAddressPrefix: '*' + sourcePortRange: '*' + destinationAddressPrefix: '*' + destinationPortRange: '80' + description: 'HTTP ingress traffic' + } + } + { + name: 'AllowIstioEastWest' + properties: { + priority: 110 + direction: 'Inbound' + access: 'Allow' + protocol: 'Tcp' + sourceAddressPrefix: '*' + sourcePortRange: '*' + destinationAddressPrefix: '*' + 
destinationPortRange: '15443' + description: 'Istio east-west gateway for cross-cluster mTLS traffic' + } + } + { + name: 'AllowIstioStatus' + properties: { + priority: 120 + direction: 'Inbound' + access: 'Allow' + protocol: 'Tcp' + sourceAddressPrefix: '*' + sourcePortRange: '*' + destinationAddressPrefix: '*' + destinationPortRange: '15021' + description: 'Istio health check / status port' + } + } + { + name: 'AllowIstioControlPlane' + properties: { + priority: 130 + direction: 'Inbound' + access: 'Allow' + protocol: 'Tcp' + sourceAddressPrefix: '*' + sourcePortRange: '*' + destinationAddressPrefix: '*' + destinationPortRange: '15012' + description: 'Istio xDS (secure gRPC) for cross-cluster discovery' + } + } + { + name: 'AllowIstioWebhook' + properties: { + priority: 131 + direction: 'Inbound' + access: 'Allow' + protocol: 'Tcp' + sourceAddressPrefix: '*' + sourcePortRange: '*' + destinationAddressPrefix: '*' + destinationPortRange: '15017' + description: 'Istio webhook for sidecar injection' + } + } + { + name: 'AllowIstioGRPC' + properties: { + priority: 132 + direction: 'Inbound' + access: 'Allow' + protocol: 'Tcp' + sourceAddressPrefix: '*' + sourcePortRange: '*' + destinationAddressPrefix: '*' + destinationPortRange: '15010' + description: 'Istio xDS (plaintext gRPC) for proxy config distribution' + } + } + ] + } +} + +resource aksVnet 'Microsoft.Network/virtualNetworks@2023-05-01' = { + name: aksVnetName + location: hubLocation + properties: { + addressSpace: { + addressPrefixes: ['10.1.0.0/16'] + } + subnets: [ + { + name: aksSubnetName + properties: { + addressPrefix: '10.1.0.0/20' + networkSecurityGroup: { + id: aksNsg.id + } + } + } + ] + } +} + +// ================================ +// AKS Hub Cluster +// ================================ +resource aksCluster 'Microsoft.ContainerService/managedClusters@2024-01-01' = { + name: aksClusterName + location: hubLocation + identity: { + type: 'SystemAssigned' + } + properties: union({ + dnsPrefix: 
aksClusterName + enableRBAC: true + networkProfile: { + networkPlugin: 'azure' + networkPolicy: 'azure' + serviceCidr: '10.100.0.0/16' + dnsServiceIP: '10.100.0.10' + } + agentPoolProfiles: [ + { + name: 'nodepool1' + count: 2 + vmSize: aksVmSize + mode: 'System' + osType: 'Linux' + vnetSubnetID: resourceId('Microsoft.Network/virtualNetworks/subnets', aksVnetName, aksSubnetName) + enableAutoScaling: false + } + ] + aadProfile: { + managed: true + enableAzureRBAC: true + } + }, maybeK8sVersion) + dependsOn: [ + aksVnet + ] +} + +// ================================ +// k3s VMs - one per region +// ================================ + +// k3s VNets — subnet references the NSG so Azure won't auto-create NRMS NSGs +resource k3sVnets 'Microsoft.Network/virtualNetworks@2023-05-01' = [for (region, i) in k3sRegions: { + name: 'k3s-${region}-vnet' + location: region + properties: { + addressSpace: { + addressPrefixes: ['10.${i + 2}.0.0/16'] + } + subnets: [ + { + name: 'k3s-subnet' + properties: { + addressPrefix: '10.${i + 2}.0.0/24' + networkSecurityGroup: { + id: k3sNsgs[i].id + } + } + } + ] + } + dependsOn: [ + k3sNsgs[i] + ] +}] + +// Network Security Groups for k3s VMs +// Attached to both NIC and subnet to prevent Azure from auto-creating NRMS NSGs +resource k3sNsgs 'Microsoft.Network/networkSecurityGroups@2023-05-01' = [for (region, i) in k3sRegions: { + name: 'k3s-${region}-nsg' + location: region + properties: { + securityRules: [ + { + name: 'AllowSSH' + properties: { + priority: 100 + direction: 'Inbound' + access: 'Allow' + protocol: 'Tcp' + sourceAddressPrefix: allowedSourceIP + sourcePortRange: '*' + destinationAddressPrefix: '*' + destinationPortRange: '22' + } + } + { + name: 'AllowKubeAPI' + properties: { + priority: 110 + direction: 'Inbound' + access: 'Allow' + protocol: 'Tcp' + sourceAddressPrefix: allowedSourceIP + sourcePortRange: '*' + destinationAddressPrefix: '*' + destinationPortRange: '6443' + } + } + { + name: 'AllowIstioEastWest' + properties: { 
+ priority: 120 + direction: 'Inbound' + access: 'Allow' + protocol: 'Tcp' + sourceAddressPrefix: '*' + sourcePortRange: '*' + destinationAddressPrefix: '*' + destinationPortRange: '15443' + } + } + { + name: 'AllowIstioControlPlane' + properties: { + priority: 130 + direction: 'Inbound' + access: 'Allow' + protocol: 'Tcp' + sourceAddressPrefix: '*' + sourcePortRange: '*' + destinationAddressPrefix: '*' + destinationPortRange: '15012' + description: 'Istio control plane (istiod) for cross-cluster discovery' + } + } + { + name: 'AllowIstioWebhook' + properties: { + priority: 131 + direction: 'Inbound' + access: 'Allow' + protocol: 'Tcp' + sourceAddressPrefix: '*' + sourcePortRange: '*' + destinationAddressPrefix: '*' + destinationPortRange: '15017' + description: 'Istio webhook port for sidecar injection' + } + } + { + name: 'AllowIstioGRPC' + properties: { + priority: 132 + direction: 'Inbound' + access: 'Allow' + protocol: 'Tcp' + sourceAddressPrefix: '*' + sourcePortRange: '*' + destinationAddressPrefix: '*' + destinationPortRange: '15010' + description: 'Istio xDS (plaintext gRPC) for proxy config distribution' + } + } + { + name: 'AllowIstioStatus' + properties: { + priority: 140 + direction: 'Inbound' + access: 'Allow' + protocol: 'Tcp' + sourceAddressPrefix: '*' + sourcePortRange: '*' + destinationAddressPrefix: '*' + destinationPortRange: '15021' + } + } + { + name: 'AllowHTTP' + properties: { + priority: 150 + direction: 'Inbound' + access: 'Allow' + protocol: 'Tcp' + sourceAddressPrefix: '*' + sourcePortRange: '*' + destinationAddressPrefix: '*' + destinationPortRange: '80' + } + } + { + name: 'AllowHTTPS' + properties: { + priority: 160 + direction: 'Inbound' + access: 'Allow' + protocol: 'Tcp' + sourceAddressPrefix: '*' + sourcePortRange: '*' + destinationAddressPrefix: '*' + destinationPortRange: '443' + } + } + ] + } +}] + +// Public IPs for k3s VMs +resource k3sPublicIps 'Microsoft.Network/publicIPAddresses@2023-05-01' = [for (region, i) in 
k3sRegions: { + name: 'k3s-${region}-ip' + location: region + sku: { + name: 'Standard' + } + properties: { + publicIPAllocationMethod: 'Static' + dnsSettings: { + domainNameLabel: 'k3s-${region}-${uniqueString(resourceGroup().id)}' + } + } +}] + +// NICs for k3s VMs +resource k3sNics 'Microsoft.Network/networkInterfaces@2023-05-01' = [for (region, i) in k3sRegions: { + name: 'k3s-${region}-nic' + location: region + properties: { + ipConfigurations: [ + { + name: 'ipconfig1' + properties: { + subnet: { + id: k3sVnets[i].properties.subnets[0].id + } + privateIPAllocationMethod: 'Dynamic' + publicIPAddress: { + id: k3sPublicIps[i].id + } + } + } + ] + networkSecurityGroup: { + id: k3sNsgs[i].id + } + } +}] + +// k3s VMs with cloud-init +resource k3sVms 'Microsoft.Compute/virtualMachines@2023-07-01' = [for (region, i) in k3sRegions: { + name: 'k3s-${region}' + location: region + properties: { + hardwareProfile: { + vmSize: vmSize + } + osProfile: { + computerName: 'k3s-${region}' + adminUsername: adminUsername + linuxConfiguration: { + disablePasswordAuthentication: true + ssh: { + publicKeys: [ + { + path: '/home/${adminUsername}/.ssh/authorized_keys' + keyData: sshPublicKey + } + ] + } + } + customData: base64(format('''#cloud-config +package_update: true +package_upgrade: true + +packages: + - curl + - jq +{2} +runcmd: + # All setup in one block so shell variables persist across commands. + # IMDS does not expose the VM public IP; use ifconfig.me instead. 
+ - | + PUBLIC_IP=$(curl -s --retry 5 --retry-delay 3 ifconfig.me) + PRIVATE_IP=$(hostname -I | awk '{{print $1}}') + echo "PUBLIC_IP=$PUBLIC_IP PRIVATE_IP=$PRIVATE_IP" + mkdir -p /etc/rancher/k3s + cat > /etc/rancher/k3s/config.yaml </dev/null; do sleep 5; done + k3s kubectl create namespace istio-system --dry-run=client -o yaml | k3s kubectl apply -f - + k3s kubectl create secret generic cacerts -n istio-system \ + --from-file=ca-cert.pem=/etc/istio-certs/ca-cert.pem \ + --from-file=ca-key.pem=/etc/istio-certs/ca-key.pem \ + --from-file=root-cert.pem=/etc/istio-certs/root-cert.pem \ + --from-file=cert-chain.pem=/etc/istio-certs/cert-chain.pem \ + --dry-run=client -o yaml | k3s kubectl apply -f - + echo "Istio cacerts secret created successfully" + else + echo "No Istio certs found at /etc/istio-certs/, skipping cacerts secret" + fi + # Build Istio remote-secret YAML for cross-cluster discovery (auto-generated at boot). + # install-istio.sh reads this file instead of doing multi-step token extraction via Run Command. + - | + CLUSTER_NAME=$(hostname) + PUBLIC_IP=$(curl -s --retry 5 --retry-delay 3 ifconfig.me) + k3s kubectl create namespace istio-system --dry-run=client -o yaml | k3s kubectl apply -f - 2>/dev/null + echo "Setting up Istio remote access service account on $CLUSTER_NAME..." + k3s kubectl apply -f - </dev/null || true + k3s kubectl apply -f - </dev/null) + CA=$(k3s kubectl get secret istio-remote-reader-token -n istio-system -o jsonpath='{{.data.ca\.crt}}' 2>/dev/null) + TOKEN_DECODED=$(echo "$TOKEN" | base64 -d) + if [ -n "$TOKEN" ] && [ -n "$CA" ] && [ -n "$PUBLIC_IP" ]; then + mkdir -p /etc/istio-remote + cat > /etc/istio-remote/remote-secret.yaml < i ? 
format('''
+write_files:
+  - path: /etc/istio-certs/root-cert.pem
+    permissions: '0644'
+    encoding: b64
+    content: {0}
+  - path: /etc/istio-certs/ca-cert.pem
+    permissions: '0644'
+    encoding: b64
+    content: {1}
+  - path: /etc/istio-certs/ca-key.pem
+    permissions: '0600'
+    encoding: b64
+    content: {2}
+  - path: /etc/istio-certs/cert-chain.pem
+    permissions: '0644'
+    encoding: b64
+    content: {3}
+''', istioCerts[i].rootCert, istioCerts[i].caCert, istioCerts[i].caKey, istioCerts[i].certChain) : ''))
+    }
+    storageProfile: {
+      imageReference: {
+        publisher: 'Canonical'
+        offer: '0001-com-ubuntu-server-jammy'
+        sku: '22_04-lts-gen2'
+        version: 'latest'
+      }
+      osDisk: {
+        createOption: 'FromImage'
+        managedDisk: {
+          storageAccountType: 'Premium_LRS'
+        }
+        diskSizeGB: 64
+      }
+    }
+    networkProfile: {
+      networkInterfaces: [
+        {
+          id: k3sNics[i].id
+        }
+      ]
+    }
+  }
+}]
+
+// ================================
+// Outputs
+// ================================
+output aksClusterName string = aksCluster.name
+output aksClusterResourceGroup string = resourceGroupName
+output k3sVmNames array = [for (region, i) in k3sRegions: k3sVms[i].name]
+output k3sVmPublicIps array = [for (region, i) in k3sRegions: k3sPublicIps[i].properties.ipAddress]
+output k3sRegions array = k3sRegions
+output hubRegion string = hubLocation
diff --git a/documentdb-playground/k3s-azure-fleet/parameters.bicepparam b/documentdb-playground/k3s-azure-fleet/parameters.bicepparam
new file mode 100644
index 00000000..6b65a2ba
--- /dev/null
+++ b/documentdb-playground/k3s-azure-fleet/parameters.bicepparam
@@ -0,0 +1,19 @@
+using './main.bicep'
+
+param hubLocation = 'westus3'
+
+param k3sRegions = [
+  'eastus2'
+  'uksouth'
+]
+
+param aksVmSize = 'Standard_DS2_v2'
+
+param vmSize = 'Standard_D2s_v3'
+
+// SSH key will be provided at deployment time
+param sshPublicKey = ''
+
+param adminUsername = 'azureuser'
+
+param k3sVersion = 'v1.30.4+k3s1'
diff --git a/documentdb-playground/k3s-azure-fleet/setup-fleet.sh b/documentdb-playground/k3s-azure-fleet/setup-fleet.sh
new file mode 100755
index 00000000..7614ded8
--- /dev/null
+++ b/documentdb-playground/k3s-azure-fleet/setup-fleet.sh
@@ -0,0 +1,135 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+# Setup KubeFleet hub and join all member clusters (AKS and k3s)
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+
+# Load deployment info
+if [ -f "$SCRIPT_DIR/.deployment-info" ]; then
+  source "$SCRIPT_DIR/.deployment-info"
+else
+  echo "Error: Deployment info not found. Run deploy-infrastructure.sh first."
+  exit 1
+fi
+
+RESOURCE_GROUP="${RESOURCE_GROUP:-documentdb-k3s-fleet-rg}"
+HUB_REGION="${HUB_REGION:-westus3}"
+HUB_CLUSTER_NAME="hub-${HUB_REGION}"
+
+echo "======================================="
+echo "KubeFleet Setup"
+echo "======================================="
+echo "Resource Group: $RESOURCE_GROUP"
+echo "Hub Cluster: $HUB_CLUSTER_NAME"
+echo "======================================="
+
+# Check prerequisites
+for cmd in kubectl helm git jq curl; do
+  if ! command -v "$cmd" &>/dev/null; then
+    echo "Error: Required command '$cmd' not found."
+    exit 1
+  fi
+done
+
+# Get all member clusters (hub is also a member + k3s clusters)
+ALL_MEMBERS=("$HUB_CLUSTER_NAME")
+
+# Add k3s clusters from deployment info
+IFS=' ' read -ra K3S_REGION_ARRAY <<< "$K3S_REGIONS"
+for region in "${K3S_REGION_ARRAY[@]}"; do
+  if kubectl config get-contexts "k3s-$region" &>/dev/null; then
+    ALL_MEMBERS+=("k3s-$region")
+  fi
+done
+
+echo "Members to join: ${ALL_MEMBERS[*]}"
+
+# Clone KubeFleet repository
+KUBEFLEET_DIR=$(mktemp -d)
+trap 'rm -rf "$KUBEFLEET_DIR"' EXIT
+
+echo ""
+echo "Cloning KubeFleet repository..."
+if ! git clone --quiet https://github.com/kubefleet-dev/kubefleet.git "$KUBEFLEET_DIR"; then
+  echo "ERROR: Failed to clone KubeFleet repository"
+  exit 1
+fi
+
+pushd "$KUBEFLEET_DIR" > /dev/null
+
+# Get latest tag
+FLEET_TAG=$(curl -s "https://api.github.com/repos/kubefleet-dev/kubefleet/tags" | jq -r '.[0].name')
+echo "Using KubeFleet version: $FLEET_TAG"
+
+# Switch to hub context
+kubectl config use-context "$HUB_CLUSTER_NAME"
+
+# Install hub-agent on the hub cluster
+echo ""
+echo "Installing KubeFleet hub-agent on $HUB_CLUSTER_NAME..."
+export REGISTRY="ghcr.io/kubefleet-dev/kubefleet"
+export TAG="$FLEET_TAG"
+
+helm upgrade --install hub-agent ./charts/hub-agent/ \
+  --set image.pullPolicy=Always \
+  --set image.repository=$REGISTRY/hub-agent \
+  --set image.tag=$TAG \
+  --set logVerbosity=5 \
+  --set enableGuardRail=false \
+  --set forceDeleteWaitTime="3m0s" \
+  --set clusterUnhealthyThreshold="5m0s" \
+  --set logFileMaxSize=100000 \
+  --set MaxConcurrentClusterPlacement=200 \
+  --set namespace=fleet-system-hub \
+  --set enableWorkload=true \
+  --wait
+
+echo "✓ Hub-agent installed"
+
+# Join member clusters using KubeFleet's script
+# Known issue: joinMC.sh passes extra args to `kubectl config use-context`.
+# If a member fails to join, see README troubleshooting for manual join steps.
+echo ""
+echo "Joining member clusters to fleet..."
+chmod +x ./hack/membership/joinMC.sh
+./hack/membership/joinMC.sh "$TAG" "$HUB_CLUSTER_NAME" "${ALL_MEMBERS[@]}"
+
+popd > /dev/null
+
+# Note: fleet-networking is NOT installed because Istio handles all cross-cluster
+# networking (mTLS, service discovery, east-west traffic). Installing both would
+# create conflicting network configurations.
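setup-fleet.sh builds its member list by splitting the space-separated `K3S_REGIONS` value with `IFS=' ' read -ra`. A minimal standalone sketch of that pattern (the region list here is a made-up example, and the per-region kubectl context check is omitted):

```shell
#!/usr/bin/env bash
# Sketch of the K3S_REGIONS splitting used by setup-fleet.sh.
# The region list is a made-up example; real values come from .deployment-info.
set -euo pipefail

K3S_REGIONS="eastus2 uksouth"
IFS=' ' read -ra K3S_REGION_ARRAY <<< "$K3S_REGIONS"

echo "count=${#K3S_REGION_ARRAY[@]}"      # count=2
for region in "${K3S_REGION_ARRAY[@]}"; do
  echo "member=k3s-$region"               # member=k3s-eastus2, then member=k3s-uksouth
done
```

`read -ra` populates the array in one step without touching the global `IFS`, which is why it is preferable to the bare `ARR=($VAR)` expansion (the latter is also subject to glob expansion).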
+
+# Verify fleet status
+echo ""
+echo "======================================="
+echo "Fleet Status"
+echo "======================================="
+kubectl config use-context "$HUB_CLUSTER_NAME"
+
+echo ""
+echo "Member clusters:"
+kubectl get membercluster 2>/dev/null || echo "No member clusters found yet (may take a moment)"
+
+echo ""
+echo "Fleet system pods on hub:"
+kubectl get pods -n fleet-system-hub 2>/dev/null || echo "Fleet system not ready"
+
+echo ""
+echo "======================================="
+echo "✅ KubeFleet Setup Complete!"
+echo "======================================="
+echo ""
+echo "Hub: $HUB_CLUSTER_NAME"
+echo "Members: ${ALL_MEMBERS[*]}"
+echo ""
+echo "Commands:"
+echo "  kubectl --context $HUB_CLUSTER_NAME get membercluster"
+echo "  kubectl --context $HUB_CLUSTER_NAME get clusterresourceplacement"
+echo ""
+echo "Next steps:"
+echo "  1. ./install-cert-manager.sh"
+echo "  2. ./install-documentdb-operator.sh"
+echo "  3. ./deploy-documentdb.sh"
+echo "======================================="
diff --git a/documentdb-playground/k3s-azure-fleet/test-connection.sh b/documentdb-playground/k3s-azure-fleet/test-connection.sh
new file mode 100755
index 00000000..268cb392
--- /dev/null
+++ b/documentdb-playground/k3s-azure-fleet/test-connection.sh
@@ -0,0 +1,143 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+# Test DocumentDB connectivity across all clusters
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+
+# Load deployment info
+if [ -f "$SCRIPT_DIR/.deployment-info" ]; then
+  source "$SCRIPT_DIR/.deployment-info"
+else
+  echo "Error: Deployment info not found. Run deploy-infrastructure.sh first."
+  exit 1
+fi
+
+echo "======================================="
+echo "DocumentDB Connectivity Test"
+echo "======================================="
+
+# Get all clusters
+HUB_CLUSTER_NAME="${HUB_CLUSTER_NAME:-hub-${HUB_REGION}}"
+ALL_CLUSTERS="$HUB_CLUSTER_NAME"
+
+IFS=' ' read -ra K3S_REGION_ARRAY <<< "${K3S_REGIONS:-}"
+for region in "${K3S_REGION_ARRAY[@]}"; do
+  if kubectl config get-contexts "k3s-$region" &>/dev/null; then
+    ALL_CLUSTERS="$ALL_CLUSTERS k3s-$region"
+  fi
+done
+
+CLUSTER_ARRAY=($ALL_CLUSTERS)
+
+echo "Testing ${#CLUSTER_ARRAY[@]} clusters..."
+echo ""
+
+# Test each cluster
+PASSED=0
+FAILED=0
+
+for cluster in "${CLUSTER_ARRAY[@]}"; do
+  echo "======================================="
+  echo "Testing: $cluster"
+  echo "======================================="
+
+  if ! kubectl config get-contexts "$cluster" &>/dev/null; then
+    echo "  ✗ Context not found"
+    # Use $((...)) assignment instead of ((FAILED++)): the latter exits
+    # non-zero (and aborts under `set -e`) when the counter is still 0.
+    FAILED=$((FAILED + 1))
+    continue
+  fi
+
+  # Check namespace
+  echo -n "  Namespace: "
+  if kubectl --context "$cluster" get namespace documentdb-preview-ns &>/dev/null; then
+    echo "✓"
+  else
+    echo "✗ Not found"
+    FAILED=$((FAILED + 1))
+    continue
+  fi
+
+  # Check DocumentDB resource
+  echo -n "  DocumentDB resource: "
+  if kubectl --context "$cluster" get documentdb documentdb-preview -n documentdb-preview-ns &>/dev/null; then
+    STATUS=$(kubectl --context "$cluster" get documentdb documentdb-preview -n documentdb-preview-ns -o jsonpath='{.status.phase}' 2>/dev/null || echo "Unknown")
+    echo "✓ (Status: $STATUS)"
+  else
+    echo "✗ Not found"
+    FAILED=$((FAILED + 1))
+    continue
+  fi
+
+  # Check pods
+  echo -n "  Pods: "
+  PODS=$(kubectl --context "$cluster" get pods -n documentdb-preview-ns --no-headers 2>/dev/null | wc -l | tr -d ' ')
+  # grep -c already prints 0 on no match; `|| true` only swallows its exit
+  # status (a trailing `|| echo "0"` would emit a second line, giving "0\n0").
+  READY_PODS=$(kubectl --context "$cluster" get pods -n documentdb-preview-ns --no-headers 2>/dev/null | grep -c "Running" || true)
+  echo "$READY_PODS/$PODS running"
+
+  # Check service (try common naming patterns)
+  echo -n "  Service: "
+  SVC_NAME=""
+  for name in "documentdb-preview" "documentdb-service-documentdb-preview"; do
+    if kubectl --context "$cluster" get svc "$name" -n documentdb-preview-ns &>/dev/null; then
+      SVC_NAME="$name"
+      break
+    fi
+  done
+  if [ -n "$SVC_NAME" ]; then
+    SVC_TYPE=$(kubectl --context "$cluster" get svc "$SVC_NAME" -n documentdb-preview-ns -o jsonpath='{.spec.type}')
+    if [ "$SVC_TYPE" = "LoadBalancer" ]; then
+      EXTERNAL_IP=$(kubectl --context "$cluster" get svc "$SVC_NAME" -n documentdb-preview-ns -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null || echo "")
+      if [ -n "$EXTERNAL_IP" ]; then
+        echo "✓ ($SVC_TYPE: $EXTERNAL_IP)"
+      else
+        echo "✓ ($SVC_TYPE: IP pending)"
+      fi
+    else
+      echo "✓ ($SVC_TYPE)"
+    fi
+  else
+    echo "✗ Not found"
+    FAILED=$((FAILED + 1))
+  fi
+
+  # Check secret
+  echo -n "  Credentials secret: "
+  if kubectl --context "$cluster" get secret documentdb-credentials -n documentdb-preview-ns &>/dev/null; then
+    echo "✓"
+  else
+    echo "✗ Not found"
+    FAILED=$((FAILED + 1))
+  fi
+
+  # Check operator
+  echo -n "  Operator: "
+  OPERATOR_READY=$(kubectl --context "$cluster" get deploy documentdb-operator -n documentdb-operator -o jsonpath='{.status.readyReplicas}' 2>/dev/null || echo "0")
+  OPERATOR_DESIRED=$(kubectl --context "$cluster" get deploy documentdb-operator -n documentdb-operator -o jsonpath='{.spec.replicas}' 2>/dev/null || echo "0")
+  if [ "$OPERATOR_READY" = "$OPERATOR_DESIRED" ] && [ "$OPERATOR_READY" != "0" ]; then
+    echo "✓ ($OPERATOR_READY/$OPERATOR_DESIRED)"
+    PASSED=$((PASSED + 1))
+  else
+    echo "✗ ($OPERATOR_READY/$OPERATOR_DESIRED)"
+    FAILED=$((FAILED + 1))
+  fi
+
+  echo ""
+done
+
+# Summary
+echo "======================================="
+echo "Summary"
+echo "======================================="
+echo "Total clusters: ${#CLUSTER_ARRAY[@]}"
+echo "Passed: $PASSED"
+echo "Failed: $FAILED"
+echo ""
+
+if [ "$FAILED" -eq 0 ]; then
+  echo "✅ All tests passed!"
+  exit 0
+else
+  echo "⚠️ Some tests failed. Check the output above."
+  exit 1
+fi