Introduction
This proposal introduces a backup and restore capability for ClickHouse using a plugin-based architecture (#1792).
The solution is divided into two main parts:
-
Operator Part
- Extends the ClickHouse Operator with new Custom Resource Definitions (CRDs) for managing backups.
- Reconciles these resources and delegates execution to an external plugin.
- Handles scheduling, lifecycle management, and integration with the ClickHouseInstallation (CHI) object.
-
Plugin Part
- A standalone Go-based gRPC service that implements backup and restore logic.
- Receives serialized CRD definitions from the operator and performs actual backup operations.
- Provides well-defined APIs for Backup and Restore actions, returning status and metadata.
Separating orchestration (operator) from execution (plugin) gives a clean separation of concerns, makes the system easier to extend, and allows different backup implementations without modifying operator core code.
1. Operator Part
The operator will be extended to support Backup and Restore functionality via a plugin-based architecture. This is aligned with the ClickHouse Operator Plugin Interface (COP-I), which introduces modular gRPC-based extensions for auxiliary features.
New Custom Resources (CRDs)
Two new CRDs will be introduced:
-
ClickhouseBackup (CHB)
- Represents a single backup request.
- Defines scope (`dbTable` whitelist/blacklist), destination (e.g., S3), credentials, and metadata.
- Operator responsibility:
  - Serialize CHB into JSON and send to the backup plugin.
  - Monitor backup status and update CR status (`running`, `completed`, `failed`).
-
ClickhouseScheduledBackup (CHSB)
- Represents scheduled backups.
- Supports cron-like schedules (`schedule` field).
- Options: `immediate` (trigger immediately), `suspend` (pause).
- Operator responsibility:
  - Manage recurring backup triggers.
  - Ensure backup CRs are created as per schedule.
  - Route definitions to the plugin.
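As a rough illustration of the two resources above, the following Go sketch models a CHB spec, a CHSB spec that wraps it, and the three status phases, then serializes the CHB spec to JSON the way the operator is described as doing. Every type and field name here (`ClusterRef`, `DBTable`, `Destination`, `SecretRef`, `Template`) is a hypothetical placeholder, not the proposed schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BackupPhase mirrors the three status values named in the proposal.
type BackupPhase string

const (
	PhaseRunning   BackupPhase = "running"
	PhaseCompleted BackupPhase = "completed"
	PhaseFailed    BackupPhase = "failed"
)

// DBTableFilter is an assumed shape for the dbTable whitelist/blacklist scope.
type DBTableFilter struct {
	Whitelist []string `json:"whitelist,omitempty"`
	Blacklist []string `json:"blacklist,omitempty"`
}

// ClickhouseBackupSpec is a hypothetical CHB spec; field names are illustrative.
type ClickhouseBackupSpec struct {
	ClusterRef  string         `json:"clusterRef"`
	DBTable     *DBTableFilter `json:"dbTable,omitempty"`
	Destination string         `json:"destination"`
	SecretRef   string         `json:"secretRef,omitempty"`
}

// ClickhouseScheduledBackupSpec is a hypothetical CHSB spec wrapping a CHB template.
type ClickhouseScheduledBackupSpec struct {
	Schedule  string               `json:"schedule"`
	Immediate bool                 `json:"immediate,omitempty"`
	Suspend   bool                 `json:"suspend,omitempty"`
	Template  ClickhouseBackupSpec `json:"template"`
}

// ClickhouseBackupStatus holds the phase the operator writes back to the CR.
type ClickhouseBackupStatus struct {
	Phase BackupPhase `json:"phase"`
}

// serializeBackup turns a CHB spec into the JSON payload sent to the plugin.
func serializeBackup(spec ClickhouseBackupSpec) (string, error) {
	b, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	scheduled := ClickhouseScheduledBackupSpec{
		Schedule: "0 2 * * *", // every day at 02:00
		Template: ClickhouseBackupSpec{
			ClusterRef:  "demo",
			DBTable:     &DBTableFilter{Whitelist: []string{"sales.*"}},
			Destination: "s3://bucket/path",
		},
	}
	payload, err := serializeBackup(scheduled.Template)
	if err != nil {
		fmt.Println("serialize failed:", err)
		return
	}
	status := ClickhouseBackupStatus{Phase: PhaseRunning}
	fmt.Println(payload)
	fmt.Println("status:", status.Phase)
}
```

Modelling the scheduled resource as a schedule plus a CHB template is one possible design; it would let the operator create ordinary CHB objects on each trigger and reuse the same reconciliation path.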
Operator Responsibilities
-
CR Lifecycle Management
Ensure CHB/CHSB resources are reconciled, status updated, and cleanup performed.
-
Plugin Discovery
Detect backup plugin services via:
- `altinity.com/pluginName` label (e.g., `clickhouse.backup.altinity.com`)
- `altinity.com/pluginPort` annotation
-
gRPC Invocation
Marshal CHI + Backup specs into JSON, invoke plugin APIs, and update CR status.
-
Restore Flow
Extend the `ClickHouseInstallation` (CHI) CR with a `bootstrap.recovery` section pointing to a `backupRef`.
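The plugin discovery step could be sketched as below. The label and annotation keys come from the proposal; the `Service` struct, the `discoverPlugin` helper, and the choice to build the address as `<name>.<namespace>.svc:<port>` are assumptions made for illustration, and a real operator would read Service objects through the Kubernetes API rather than from a slice.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

const (
	pluginNameLabel      = "altinity.com/pluginName"
	pluginPortAnnotation = "altinity.com/pluginPort"
)

// Service is a minimal stand-in for the metadata of a Kubernetes Service.
type Service struct {
	Name        string
	Namespace   string
	Labels      map[string]string
	Annotations map[string]string
}

var errNoPlugin = errors.New("no service carries the requested plugin label")

// discoverPlugin returns the in-cluster address of the first service whose
// pluginName label matches, using the port from the pluginPort annotation.
func discoverPlugin(services []Service, pluginName string) (string, error) {
	for _, s := range services {
		if s.Labels[pluginNameLabel] != pluginName {
			continue
		}
		raw, ok := s.Annotations[pluginPortAnnotation]
		if !ok {
			return "", fmt.Errorf("service %s/%s: missing %s annotation",
				s.Namespace, s.Name, pluginPortAnnotation)
		}
		port, err := strconv.Atoi(raw)
		if err != nil || port < 1 || port > 65535 {
			return "", fmt.Errorf("service %s/%s: invalid port %q",
				s.Namespace, s.Name, raw)
		}
		// Assumed address form: the standard in-cluster Service DNS name.
		return fmt.Sprintf("%s.%s.svc:%d", s.Name, s.Namespace, port), nil
	}
	return "", errNoPlugin
}

func main() {
	services := []Service{{
		Name:        "backup-plugin",
		Namespace:   "clickhouse",
		Labels:      map[string]string{pluginNameLabel: "clickhouse.backup.altinity.com"},
		Annotations: map[string]string{pluginPortAnnotation: "9090"},
	}}
	addr, err := discoverPlugin(services, "clickhouse.backup.altinity.com")
	if err != nil {
		fmt.Println("discovery failed:", err)
		return
	}
	fmt.Println("plugin endpoint:", addr)
}
```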
2. Plugin Part
The backup plugin will be implemented as a gRPC service deployed independently from the operator.
The operator communicates with it using the defined protobuf contracts.
Exposed gRPC APIs
1. Backup API
BackupRequest
- `chi_definition`: JSON of the target ClickHouseInstallation
- `backup_definition`: JSON of the ClickhouseBackup / ClickhouseScheduledBackup
- `parameters`: Optional overrides (compression, retention policy, etc.)
BackupResult
- `backup_id`, `backup_name`
- `started_at`, `stopped_at`
- `metadata`: Plugin-specific info (S3 path, compression type, etc.)
2. Restore API
RestoreRequest
- `chi_definition`: JSON of the cluster to restore
- `backup_definition`: JSON of the ClickhouseBackup / ClickhouseScheduledBackup
RestoreResponse
- `restore_id`, `restore_name`
- `started_at`, `stopped_at`
- `metadata`: Additional info (restored tables, PITR info, etc.)
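To make the contract above concrete, here is a minimal Go sketch that mirrors the four messages as plain structs and the two APIs as an interface. It is not generated protobuf code: the `BackupPlugin` interface, the `fakePlugin` stand-in, the `runBackup` helper, the `map[string]string` type for `parameters` and `metadata`, and the 30-second timeout are all assumptions for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

type BackupRequest struct {
	ChiDefinition    string            // JSON of the target ClickHouseInstallation
	BackupDefinition string            // JSON of the CHB / CHSB
	Parameters       map[string]string // optional overrides (map type assumed)
}

type BackupResult struct {
	BackupID, BackupName string
	StartedAt, StoppedAt time.Time
	Metadata             map[string]string
}

type RestoreRequest struct {
	ChiDefinition    string
	BackupDefinition string
}

type RestoreResponse struct {
	RestoreID, RestoreName string
	StartedAt, StoppedAt   time.Time
	Metadata               map[string]string
}

// BackupPlugin is the operator-side view of the two plugin APIs.
type BackupPlugin interface {
	Backup(ctx context.Context, req BackupRequest) (BackupResult, error)
	Restore(ctx context.Context, req RestoreRequest) (RestoreResponse, error)
}

// fakePlugin is an in-memory stand-in; a real plugin would sit behind gRPC.
type fakePlugin struct{}

func (fakePlugin) Backup(ctx context.Context, req BackupRequest) (BackupResult, error) {
	if req.ChiDefinition == "" || req.BackupDefinition == "" {
		return BackupResult{}, errors.New("chi_definition and backup_definition are required")
	}
	start := time.Now()
	return BackupResult{
		BackupID: "backup-1", BackupName: "demo",
		StartedAt: start, StoppedAt: time.Now(),
		Metadata: map[string]string{"compression": req.Parameters["compression"]},
	}, nil
}

func (fakePlugin) Restore(ctx context.Context, req RestoreRequest) (RestoreResponse, error) {
	if req.ChiDefinition == "" || req.BackupDefinition == "" {
		return RestoreResponse{}, errors.New("chi_definition and backup_definition are required")
	}
	start := time.Now()
	return RestoreResponse{
		RestoreID: "restore-1", RestoreName: "demo",
		StartedAt: start, StoppedAt: time.Now(),
	}, nil
}

// runBackup invokes the plugin and maps the outcome to a CR status phase.
func runBackup(p BackupPlugin, req BackupRequest) (BackupResult, string) {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	res, err := p.Backup(ctx, req)
	if err != nil {
		return res, "failed"
	}
	return res, "completed"
}

func main() {
	req := BackupRequest{
		ChiDefinition:    `{"kind":"ClickHouseInstallation"}`,
		BackupDefinition: `{"kind":"ClickhouseBackup"}`,
		Parameters:       map[string]string{"compression": "gzip"},
	}
	res, phase := runBackup(fakePlugin{}, req)
	fmt.Println(res.BackupID, phase, res.Metadata["compression"])
}
```

Keeping the operator's dependency behind an interface like this would also let reconciliation logic be tested against a fake without a running plugin.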
Features
-
Backup Types
- Cluster-wide or per-db/table (with whitelist/blacklist)
- Default: back up all except `system` schemas
-
Storage
- S3-compatible destinations
-
Scheduling
- Cron-based recurring backups
-
Restore
- Support bootstrap recovery from a defined `backupRef`
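The per-db/table scoping listed under Backup Types could be evaluated along the following lines. The proposal does not define pattern syntax or precedence, so this sketch assumes `db.table` names matched with shell-style globs (`path.Match`), a blacklist that always wins, and a default that skips only the `system` database when no whitelist is given; `shouldBackup` and `matchesAny` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"path"
)

// matchesAny reports whether name matches at least one glob pattern.
// Malformed patterns are treated as non-matching.
func matchesAny(patterns []string, name string) bool {
	for _, p := range patterns {
		if ok, err := path.Match(p, name); err == nil && ok {
			return true
		}
	}
	return false
}

// shouldBackup decides whether db.table falls inside the backup scope.
// Assumed precedence: blacklist first, then whitelist, then the default.
func shouldBackup(db, table string, whitelist, blacklist []string) bool {
	name := db + "." + table
	if matchesAny(blacklist, name) {
		return false
	}
	if len(whitelist) > 0 {
		return matchesAny(whitelist, name)
	}
	// Default from the proposal: back up everything except system schemas.
	return db != "system"
}

func main() {
	whitelist := []string{"sales.*"}
	blacklist := []string{"*.tmp_*"}
	for _, t := range [][2]string{
		{"sales", "orders"},
		{"sales", "tmp_import"},
		{"hr", "people"},
	} {
		fmt.Printf("%s.%s -> %v\n", t[0], t[1],
			shouldBackup(t[0], t[1], whitelist, blacklist))
	}
	fmt.Println("system.query_log (no filters) ->",
		shouldBackup("system", "query_log", nil, nil))
}
```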
Benefits
- Separation of Concerns: Operator focuses on orchestration, plugin handles backup mechanics
- Modularity: Backup logic can evolve independently
- Extensibility: Community or vendors can build custom backup plugins without forking operator code
- Consistency: Standard gRPC-based interface ensures compatibility