[TEST ONLY] feat: Enable ROSA cluster support in HCP backup/restore tests #1
Open

mgencur wants to merge 25 commits into hcp_oadp_full from
Conversation
mgencur force-pushed from ca418fe to 835de49
mgencur force-pushed from 2765acd to cfd509b
mgencur force-pushed from cfd509b to e7ce16e
…penshift#1921)

* Add HCP full backup/restore test suite for clusters with data plane

  This commit introduces a complete HCP (Hosted Control Plane) backup and restore testing framework with support for both newly created and existing HostedCluster environments.

  - Add `hcp_full_backup_restore_suite_test.go`: complete test suite for full HCP backup/restore scenarios
  - Support two operational modes:
    - `create`: creates a new HostedCluster for testing (existing behavior)
    - `existing`: uses a pre-existing HostedCluster with a data plane
  - Add Makefile variables for HCP test configuration:
    - `HC_BACKUP_RESTORE_MODE`: controls the test execution mode (create/existing)
    - `HC_NAME`: specifies the HostedCluster name for existing mode
    - `HC_KUBECONFIG`: path to the guest cluster kubeconfig for existing mode
  - Pass HCP configuration parameters to e2e test execution
  - Refactor `runHCPBackupAndRestore()` for unified handling of both modes
  - Add guest cluster verification functions (`PreBackupVerifyGuest`, `PostRestoreVerifyGuest`)
  - Separate log gathering and DPA resource cleanup into reusable functions
  - Improve error handling and validation for both the control plane and the guest cluster
  - Add support for kubeconfig-based guest cluster operations
  - Implement pre/post backup verification for guest cluster resources
  - Add namespace creation/validation tests for guest cluster functionality
  - Add a `GetHostedCluster()` method to retrieve existing HostedCluster objects
  - Add a `ClientGuest` field to `HCHandler` for guest cluster operations
  - Improve error message formatting in the DPA helpers
  - Add testing documentation for HCP scenarios, including examples for running tests against an existing HostedControlPlane and the environment variable configuration options
  - Add a conditional must-gather build based on the `SKIP_MUST_GATHER` flag

  The implementation supports testing both scenarios where OADP needs to:
  1. Create a new HostedCluster and test backup/restore (existing functionality)
  2. Work with an existing HostedCluster that already has workloads and a data plane

  This enables testing of HCP backup/restore functionality in realistic, production-like environments where clusters already exist and contain user workloads.

  🤖 Generated with [Claude Code](https://claude.ai/code)

* Add hcp label to full HCP tests

* Fix panic by constructing crClientForHC only when hcKubeconfig is defined

* Refactor HCP test configuration to use external cluster mode

  - Replace HC_BACKUP_RESTORE_MODE with the TEST_HCP_EXTERNAL flag
  - Rename the "existing" mode to "external" for clarity
  - Move HCP external test args to a separate HCP_EXTERNAL_ARGS variable
  - Rename hcp_full_backup_restore_suite_test.go to hcp_external_cluster_backup_restore_suite_test.go
  - Update test labels from "hcp" to "hcp_external" for external cluster tests
  - Simplify the Makefile by removing unused HC mode variables from the main test-e2e target
  - Update documentation to reflect the new external cluster test configuration

* Refactor HCP test client initialization to use dynamic kubeconfig retrieval

  - Remove the HC_KUBECONFIG flag and related global variables from the test suite
  - Remove the hardcoded crClientForHC global client initialization
  - Add a GetHostedClusterKubeconfig() method to dynamically retrieve the kubeconfig from HostedCluster status
  - Update pre/post backup verification to create the client on demand using the retrieved kubeconfig
  - Clean up the Makefile to remove HC_KUBECONFIG parameter handling
  - Simplify HCHandler by removing the ClientGuest field

  This improves test reliability by ensuring the guest cluster client is always created with the current kubeconfig rather than relying on potentially stale configuration passed via flags.

* Wait for client to be ready after restore

* Better error messages when building kubeconfig

Co-authored-by: Claude <noreply@anthropic.com>
mgencur force-pushed from 78ef121 to d8296b4
…kupStorageLocations (openshift#1930)

- Add a processCACertForBSLs() function to extract CA certificates from BSL configurations
- Add a processCACertificatesForVelero() function to mount CA certificates and set the AWS_CA_BUNDLE environment variable
- AWS_CA_BUNDLE triggers the AWS SDK's native CA certificate handling for S3 operations
- Support both Velero and CloudStorage BSL configurations with custom CA certificates
- Comprehensive unit tests for the CA certificate processing logic
- Tests migrated to the Ginkgo BDD framework for better integration

This enables imagestream backup operations and other S3-based operations to work correctly with custom CA certificates from BackupStorageLocation configurations, particularly in air-gapped environments with custom Certificate Authorities.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-authored-by: Claude <noreply@anthropic.com>
…hift#1941)

- Add a comprehensive performance testing guide in docs/performance_testing.md
- Link to the velero-performance-testing GitHub repository for toolkit access
- Include OADP-specific testing guidance for Data Mover and CSI snapshots
- Add a performance testing section to the main README table of contents
- Provide resource requirements and performance expectations
- Integrate with the existing OADP documentation structure

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-authored-by: Claude <noreply@anthropic.com>
…enshift#1936)

* Fix unnecessary secret updates and logging in the STS flow

  The operator was repeatedly logging "Secret already exists, updating" and "Following standardized STS workflow, secret created successfully" even when the secret content hadn't changed. This happened because the CloudStorage controller calls STSStandardizedFlow() on every reconciliation, which always attempted to create the secret first, then caught the AlreadyExists error and performed an update.

  Changed the approach to:
  - First check whether the secret exists
  - Compare the existing data with the desired data
  - Only update when there are actual differences
  - Skip updates and avoid logging when the content is identical
  - Change the CloudStorage controller to use Debug level and a more accurate message when the STS secret is available (not necessarily created)

  This eliminates unnecessary API calls to the Kubernetes cluster and reduces noise in the operator logs.

  🤖 Generated with [Claude Code](https://claude.ai/code)

* refactor: Use constants for STS secret labels and error messages

  Replace hardcoded strings with constants from the stsflow package:
  - Add constants for secret operation verbs (created, updated, unchanged)
  - Add constants for the STS secret label key/value
  - Add constants for error messages
  - Update all files using "oadp.openshift.io/secret-type" to use STSSecretLabelKey
  - Update test files to use the new constants

  This improves maintainability and reduces the risk of typos in label names and error messages across the codebase.

Co-authored-by: Claude <noreply@anthropic.com>
…penshift#1937)

* Exponential backoff for the CloudStorage reconciler

  - Add a Conditions field to CloudStorageStatus for better observability
  - Implement exponential backoff by returning errors on bucket operations
  - Controller-runtime automatically handles retries (5ms base up to a 1000s cap)
  - Add condition constants for type-safe reason strings
  - Create a mock bucket client for improved testing
  - Add comprehensive tests for backoff behavior and conditions

  Key improvements:
  - Standard Kubernetes pattern using the built-in workqueue backoff
  - Self-healing: continues retrying with increasing delays
  - Better observability through status conditions
  - Per-item backoff: each CloudStorage CR gets independent retry timing

  🤖 Generated with [Claude Code](https://claude.ai/code)

* Add exponential backoff for CloudStorage status update failures (openshift#124)

  - Initial plan
  - Return an error instead of just logging when the final status update fails
  - Add a documentation test explaining the change
  - Ensures controller-runtime's exponential backoff is triggered for status update failures

  Addresses PR comment openshift#1937 discussion_r2330918689

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: kaovilai <11228024+kaovilai@users.noreply.github.com>
… BSL (openshift#1942)

* feat: Use CloudStorage creationSecret as fallback for BSL credentials

  When a DataProtectionApplication references a CloudStorage CR without providing explicit credentials, the BSL controller now uses the CloudStorage's creationSecret as a fallback for authentication.

  Changes:
  - Enhanced BSL reconciliation to fall back to the CloudStorage's creationSecret when the DPA doesn't specify credentials
  - Moved the fallback logic into the centralized getSecretNameAndKeyFromCloudStorage function for better code organization
  - Updated validation to allow nil credentials when a CloudStorage is referenced
  - Fixed related test cases to handle the new fallback behavior

  This allows users to avoid duplicating credential configuration between CloudStorage and DataProtectionApplication resources.

  🤖 Generated with [Claude Code](https://claude.ai/code)

* feat: Add CloudStorage config and region fallback to BSL

  When a DataProtectionApplication references a CloudStorage CR, the BSL now inherits configuration values from the CloudStorage CR as a fallback, similar to the credential fallback mechanism.

  Changes:
  - The BSL now uses the CloudStorage CR's Config field as the base configuration
  - The CloudStorage CR's Region field is automatically added to the BSL config
  - The DPA's CloudStorageLocation.Config values override CloudStorage values
  - Added comprehensive test coverage for the config fallback behavior

  This enhancement allows users to define provider-specific settings once in the CloudStorage CR without needing to duplicate them in the DPA, while still maintaining the ability to override specific values at the DPA level when needed.

Co-authored-by: Claude <noreply@anthropic.com>
Adds 'oadp-qe-aws-sno' and 'oadp-qe-azure' to the QE Test Runs table.

Generated by: Claude (AI Assistant)
mgencur force-pushed from 3773521 to db120c9
…eam backups (openshift#1974)

* docs: add CA Certificate Bundle documentation for ImageStream backups
* docs: clarify CA certificate handling for ImageStream backups in velero-plugin-for-aws
* docs: update configuration to include the openshift plugin for ImageStream backups
* docs: enhance clarity and formatting in the CA Certificate Bundle documentation for ImageStream backups
* docs: add detailed component relationship and flow for ImageStream backups
* docs: clarify the distinction between the Velero BSL spec and S3 driver parameters for CA certificate handling
* docs: clarify distinction between Velero BSL spec and S3 driver parameters for CA certificate handling
* docs: enhance documentation on external BSLs for ImageStream backups and the CA certificate collection process

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
…penshift#1976)

* Remove support for Restic in the Data Protection Application and update related tests
* Refactor pointer utilities to use the new ptr package and improve error messages in various controllers and tests
* `make bundle`
* Update CRDs from velero:main (99f12b8) — updated CRDs from Velero oadp-dev
* UPSTREAM: <drop>: Updating go modules
* fix `go mod/vet ./...` && `make bundle`
* `make generate`
* Implement manual DeepCopy for NodeAgentConfigMapSettings and remove the autogenerated version
* Use privileged fs-backup pods if fs-backup is enabled (author: Scott Seago <sseago@redhat.com>)
* Add IfNotPresent for the mongo image in the tests (author: Tiger Kaovilai <passawit.kaovilai@gmail.com>)

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Signed-off-by: Michal Pryc <mpryc@redhat.com>
Co-authored-by: Tiger Kaovilai <tkaovila@redhat.com>
* Stub out rebase, CLI status and cleanup
* Update README.md: add the 4.19 CLI periodic test badge

Signed-off-by: Wesley Hayutin <weshayutin@gmail.com>
Co-authored-by: Joseph Antony Vaikath <jvaikath@redhat.com>
… BSLs and include system defaults (openshift#1972)

* feat(bsl): concatenate all CA certificates from BSLs and include system defaults

  Instead of a "first one wins" approach, collect and concatenate all unique CA certificates from BackupStorageLocations. System default CA certificates are also included when custom certificates are present.

  Changes:
  - Modified processCACertForBSLs() to collect all unique CA certificates
  - Added deduplication logic to avoid including the same certificate multiple times
  - Added a getSystemCACertificates() helper to retrieve system CA bundles
  - System defaults are only included when custom CAs are present
  - Updated tests to verify concatenation and deduplication behavior

  This allows for more flexible multi-cloud/multi-endpoint configurations where different BSLs may require different CA certificates.

  🤖 Generated with [Claude Code](https://claude.ai/code)

* feat(bsl): ensure BSL reconciliation preserves the default field to avoid conflicts with Velero management
* feat(bsl): enhance CA certificate processing for multiple BSLs and add tests for validation
* PEM verify + ``podman run -v `pwd`:`pwd` -w `pwd` quay.io/konveyor/builder:ubi9-v1.23 sh -c "make lint-fix"``
* refactor(nginx): reorganize the deployment YAML structure to be compatible with e2e
* feat(e2e): add CA certificate handling for the default e2e BSL in multiple test files

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Co-authored-by: Tiger Kaovilai <tkaovila@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
* Improve documentation for custom plugin images usage

  Updated the documentation for custom plugin images in Velero, correcting formatting and providing clearer examples for the unsupportedOverrides field.

* Update docs/config/custom_plugin_images.md
* Update docs/config/custom_plugin_images.md
* DNM-TESTING: update mongo-todo
* Add liveness checks to deployments
* Comment out the dc check
* Remove hooks
* Remove hooks from all manifests
* Update to a multiarch manifest and fix the Makefile
* Add retry for logs via the CLI
* For kopia/restic we do not delete PVCs explicitly
* Remove the Mongo application for KOPIA, too flaky
Adds the statuses and waves to the relevant README.md section.

Signed-off-by: Michal Pryc <mpryc@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
…shift#1989)

Add three standard Kubernetes labels to the openshift-adp-controller-manager deployment to enable more specific pod selection in NetworkPolicies:

- app.kubernetes.io/name: oadp-operator
- app.kubernetes.io/component: controller-manager
- app.kubernetes.io/part-of: oadp-operator

The current label 'control-plane: controller-manager' is too generic and could match unintended pods when used in NetworkPolicy selectors. These labels follow the recommended Kubernetes labeling conventions and match the existing standard used in config/prometheus/monitor.yaml.

Fixes: openshift#1988

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Joseph <jvaikath@redhat.com>
- Add a comprehensive S3CERTIFY.md guide for S3-compatible storage certification
- Include access requirements, credentials, and endpoint configuration
- Add support and communication guidelines
- Provide a certification timeline and process overview
- Include a quick checklist and contact information
- Document applies to OADP 1.3+ and OpenShift 4.12+
This design presents a Kubernetes-native solution for recovering individual files from KubeVirt Virtual Machine backups created with OADP in OpenShift. Signed-off-by: Michal Pryc <mpryc@redhat.com>
…ift#1993)

* parks-app-oadp
* Add the manifest.yaml
* blah
* Fixes
* All working
* Cleanup
* Fix curl for todolist
* Fix lint issue
…nto oadp-dev (openshift#1990)

* UPSTREAM: <drop>: Updating go modules
* UPSTREAM: <drop>: update Velero CRDs @ oadp-dev
* UPSTREAM: <drop>: make bundle update

Co-authored-by: oadp-team-rebase-bot <oadp-maintainers@redhat.com>
Remove the local must-gather directory and build process in favor of the external quay.io/konveyor/oadp-must-gather:latest image used via `oc adm must-gather`. This eliminates architecture mismatch issues and keeps the must-gather code in its dedicated repository.

Changes:
- Updated RunMustGather() in tests/e2e/lib/apps.go to use `oc adm must-gather`
- Added a MUST_GATHER_IMAGE env var (defaults to quay.io/konveyor/oadp-must-gather:latest)
- Removed the build-must-gather target from the Makefile
- Removed the entire must-gather/ directory (3,174 lines deleted)
- Updated documentation in TESTING.md

The SKIP_MUST_GATHER flag is preserved for skipping must-gather collection. Version-specific images can be used by setting the MUST_GATHER_IMAGE env var.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Revert "refactor(e2e): migrate to external oadp-must-gather container image" — this reverts commit 09a2a49.
Revert "fix(e2e): derive must-gather directory pattern from image name" — this reverts commit 2ae6d45.

fix(e2e): update mongo image version and resource limits in the mongo-persistent deployment

fix(e2e): derive must-gather directory pattern from image name

The directory pattern was hardcoded to 'quay-io-konveyor-oadp-must-gather-*', which breaks when using custom images via the MUST_GATHER_IMAGE env var. The pattern is now derived dynamically from the actual image name by replacing registry separators (. / :) with hyphens to match `oc adm must-gather`'s directory naming convention.

Examples:
- quay.io/konveyor/oadp-must-gather:latest -> quay-io-konveyor-oadp-must-gather-latest-*
- docker.io/myuser/custom:v1 -> docker-io-myuser-custom-v1-*

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: oadp-team-rebase-bot <oadp-maintainers@redhat.com>
mgencur force-pushed from aac86d1 to e31b5f7
- Add an external-rosa mode for HC_BACKUP_RESTORE_MODE to support existing ROSA clusters
- Introduce the HC_NAMESPACE parameter for configurable cluster namespace management
- Add service cluster kubeconfig support via the SC_KUBECONFIG parameter for ROSA ManifestWork operations
- Implement ManifestWork backup/deletion functionality for ROSA cluster lifecycle management
- Add the open-cluster-management.io/api dependency to support ManifestWork operations
- Create separate OADP deployment operations for default vs ROSA scenarios
- Skip DPA HCP plugin modification for ROSA, where the DPA is managed via ManifestWork
- Add the VSL_AWS_PROFILE parameter for volume snapshot location AWS profile configuration
- Refactor the backup/restore suite to use pluggable deployment strategies
- Update test configuration to handle both regular HCP and ROSA cluster workflows

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-authored-by: Claude <noreply@anthropic.com>
mgencur force-pushed from e31b5f7 to c9e9437
* Also guard against nil errors
Why the changes were made
How to test the changes made