feat(clickhouse): enhance addon with HA, backup, security, ops and observability #2550
Open
realzyy wants to merge 1 commit into apecloud:main
…servability improvements

High Availability:
- Add Pod Anti-Affinity and TopologySpreadConstraints for CH Server and Keeper
- Add PodDisruptionBudget for CH Server and Keeper
- Add livenessProbe and startupProbe for CH Server
- Expose nodeSelector and priorityClassName via values.yaml
- Set podManagementPolicy to OrderedReady (make-before-break rolling update)

Backup & Restore:
- Fix TLS mode restore: remove hardcoded exit-1 block in incremental-restore.sh
- Enable backup schedules by default with configurable cron via values.yaml
- Decouple full (weekly) and incremental (daily Mon-Sat) backup schedules
- Enable UNDROP TABLE support (allow_experimental_undrop_table_query)

Security:
- Add Transparent Data Encryption (TDE) config option (AES-256-GCM-SIV)
- Tighten default IP allowlist to cluster-internal CIDRs only
- Add readonly/monitoring/ingest user role profiles
- Add rotate-password OpsDefinition with K8s Secret patching via in-cluster curl

Operations:
- Add vscale-check OpsDefinition (pre/post memory safety check)
- Add pre-scale-in-shard OpsDefinition (data migration before shard removal)
- Add diagnose OpsDefinition (replica sync, merge queue, Keeper health, slow queries)
- Add RBAC for OpsDefinition workload pods to patch Secrets

Observability:
- Add PrometheusRule template for common ClickHouse alerts
- Upgrade Grafana dashboard: Query Insights panels (slow queries, P99 latency)
- Add Keeper health panels to Grafana dashboard
- Add AsyncMetrics system resource panels (memory, disk, CPU, MergeTree bytes)

Read-Write Separation:
- Add clickhouse-readonly ComponentDefinition (readonly=2 profile, no DDL/DML)
- Add standalone-with-readonly and cluster-with-readonly cluster topologies
- Register readonly component in ComponentVersion with compatibility rules

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
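For readers unfamiliar with the PodDisruptionBudget item above, the following is a minimal sketch of what such an object can look like. It is not copied from this PR: the object name, the label selector, and the maxUnavailable threshold are all assumptions chosen for illustration, and the addon's actual template may differ.

```yaml
# Illustrative sketch only. The name, labels, and threshold below are
# assumptions and are not taken from the addon's templates.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: clickhouse-server-pdb            # hypothetical name
spec:
  # Allow at most one ClickHouse server pod to be evicted at a time
  # during voluntary disruptions such as node drains.
  maxUnavailable: 1                      # assumed value
  selector:
    matchLabels:
      app.kubernetes.io/name: clickhouse # hypothetical label
```

A budget like this only limits voluntary evictions; it does not protect against node failures, which is why the commit pairs it with anti-affinity and topology spread constraints.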
Summary
This PR brings the ClickHouse addon closer to feature parity with ClickHouse Cloud, covering 21 items across HA, backup/restore, security, operations, observability, and read-write separation.
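High Availability
- Pod Anti-Affinity and TopologySpreadConstraints for CH Server and Keeper
- PodDisruptionBudget for CH Server and Keeper
- livenessProbe + startupProbe for CH Server (deadlock recovery, slow-start protection)
- nodeSelector and priorityClassName exposed via values.yaml
- podManagementPolicy: OrderedReady for make-before-break rolling updates

Backup & Restore
- Removed the hardcoded exit 1 in incremental-restore.sh that blocked TLS-mode restores
- Backup schedules enabled by default, with decoupled full (weekly) and incremental (daily Mon-Sat) cron expressions configurable via values.yaml
- UNDROP TABLE support (allow_experimental_undrop_table_query)

Security
- Transparent Data Encryption (TDE) config option (AES-256-GCM-SIV)
- Default IP allowlist tightened to cluster-internal CIDRs only
- User role profiles: readonly, monitoring, ingest
- rotate-password OpsDefinition: updates ClickHouse SQL users + patches K8s Secret via in-cluster curl

Operations (OpsDefinitions)
- vscale-check: pre/post memory safety check before vertical scaling
- pre-scale-in-shard: migrates data off a shard before removal (dry-run supported)
- diagnose: cluster health report covering replica sync, merge queue, Keeper status, and top-10 slow queries
- RBAC: kb-<cmpdName> service account Secret patch permission for password rotation

Observability
- PrometheusRule template for common ClickHouse alerts
- Grafana dashboard: Query Insights panels (slow queries, P99 latency) and Keeper health panels
- AsyncMetrics system resource panels (memory, disk, CPU, MergeTree bytes)

Read-Write Separation
- clickhouse-readonly ComponentDefinition with readonly=2 profile (SELECT allowed, DDL/DML blocked)
- New cluster topologies: standalone-with-readonly, cluster-with-readonly

Test plan
- helm template renders without errors on all modified templates
- OpsDefinitions (diagnose, vscale-check, rotate-password, pre-scale-in-shard) reach Succeed status in k3d
- Writes to the readonly component are rejected with READONLY Code 164
- values.yaml overrides are reflected in the rendered BackupPolicyTemplate

🤖 Generated with Claude Code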
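As a rough illustration of the decoupled backup schedules described above, a values.yaml override could look like the sketch below. The key names (backup, schedules, full, incremental, cron) are hypothetical and are not taken from this PR's chart; only the weekly full and daily Mon-Sat incremental split comes from the description, and the 02:00 time of day is an assumption.

```yaml
# Hypothetical values.yaml override. Key names are assumptions for
# illustration; check the chart's own values.yaml for the real ones.
backup:
  schedules:
    full:
      enabled: true
      # Standard 5-field cron: 02:00 every Sunday (weekly full backup).
      cron: "0 2 * * 0"
    incremental:
      enabled: true
      # 02:00 Monday through Saturday (daily incremental backup).
      cron: "0 2 * * 1-6"
```

Keeping the two expressions on disjoint days of the week means a full and an incremental backup never start at the same time.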