refactor(hostedcluster): segregate reconcile loop into error-collecting blocks #7908

muraee wants to merge 1 commit into openshift:main
Conversation
Pipeline controller notification: this repository is configured in LGTM mode.
Important: Review skipped. Auto reviews are limited based on label configuration; only excluded labels are configured. Please check the settings in the CodeRabbit UI or the run configuration. Configuration used: Repository YAML (base), Organization UI (inherited). Review profile: CHILL. Plan: Pro.
📝 Walkthrough

This pull request introduces a phased, modular refactoring of the HostedCluster reconciliation loop. It adds a design document analyzing reconciliation segregation, restructures the controller into nine sequential phases with independent error aggregation, and introduces multiple helper functions to isolate functionality. It also includes a signature change.
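The core idea, collecting errors per phase instead of returning on the first failure, can be sketched in plain Go. This toy uses the stdlib `errors.Join` in place of `utilerrors.NewAggregate`, and the helper names (`syncSSHKey`, `deployCPO`) are illustrative stand-ins, not the controller's real methods:

```go
package main

import (
	"errors"
	"fmt"
)

// Illustrative stand-ins for the real sub-reconcilers (not HyperShift code).
var cpoDeployed bool

func syncSSHKey() error { return errors.New("ssh key secret not found") }
func deployCPO() error  { cpoDeployed = true; return nil }

func reconcile() error {
	var errs []error
	// Phase 6: independent sync. Collect the error instead of returning early.
	if err := syncSSHKey(); err != nil {
		errs = append(errs, fmt.Errorf("failed to sync ssh key: %w", err))
	}
	// Phase 8: unrelated component work still runs despite the phase-6 failure.
	if err := deployCPO(); err != nil {
		errs = append(errs, err)
	}
	// errors.Join stands in for utilerrors.NewAggregate in this sketch.
	return errors.Join(errs...)
}

func main() {
	err := reconcile()
	fmt.Println(cpoDeployed) // the CPO was still deployed
	fmt.Println(err != nil)  // and the ssh failure is still surfaced
}
```

Under the old short-circuiting style, `deployCPO` would never have run once `syncSSHKey` failed; here both outcomes are visible in the aggregated error.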
Estimated code review effort: 🎯 4 (Complex), ⏱️ ~50 minutes.

🚥 Pre-merge checks: ✅ 3 passed, ❌ 2 failed (2 warnings).
/test e2e-aws
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/design/hostedcluster-reconcile-segregation-analysis.md`:
- Around line 112-190: The markdown fenced block containing the ASCII diagram
(the block that begins with
"+-----------------------------------------------------+" and includes "CRITICAL
PREREQUISITES (must succeed first)") needs a language label to satisfy
markdownlint: change the opening fence from ``` to ```text so the diagram is
fenced as a text block; update the single fenced block in the file (the ASCII
diagram between the backticks) accordingly.
In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:
- Around line 1726-1728: The code currently checks and then removes the
HostedClusterRestoredFromBackupAnnotation from hcluster before writing the
durable status (ReconciliationSucceeded/HostedClusterRestoredFromBackup
condition), which can lose the trigger if the status update fails; change the
flow so you do not consume/remove HostedClusterRestoredFromBackupAnnotation
until after the status write is confirmed: first set the
HostedClusterRestoredFromBackup condition on the HostedCluster status and
perform the status update (updateStatus on hcluster), retrying on conflict as
needed, and only after the status update succeeds remove the
HostedClusterRestoredFromBackupAnnotation (or perform the annotation removal in
a separate patch/update with proper conflict handling) so the reconcile will
retry if the status write failed.
- Around line 1456-1483: If reconcileCoreHCPChain failed and hcp is nil, phase‑8
helpers will dereference hcp and panic; guard the entire phase‑8 block by
checking if hcp == nil and, if so, append a recoverable error to componentErrs
(e.g. fmt.Errorf("skipping phase 8: HostedControlPlane is nil due to earlier
error")) and skip calling reconcileKubeconfigAndPasswordSync,
reconcileOperatorDeployments, reconcileRBACAndPolicies,
reconcilePlatformSpecific, and reconcileAuxiliary; otherwise run the existing
calls as before.
- Around line 2161-2174: The status fields holding secret references
(hcluster.Status.CustomKubeconfig and hcluster.Status.KubeadminPassword) are
only being cleared in memory; after deleting the Secrets you must also persist
those changes to the API by clearing the fields on the HostedCluster status and
calling the Status().Update (or Client.Status().Update) to save them. Modify the
branch that deletes the custom kubeconfig (and the other branch mentioned around
the KubeadminPassword) to set hcluster.Status.CustomKubeconfig = nil and/or
hcluster.Status.KubeadminPassword = nil as appropriate and then call
r.Status().Update(ctx, hcluster) (handling and returning any error) so the API
no longer holds dangling secret refs; use the existing DeleteIfNeeded flow and
ensure both branches behave the same way.
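The phase-8 nil guard suggested in the findings above can be sketched as follows. The type and helper are simplified stand-ins, not the controller's actual signatures:

```go
package main

import "fmt"

// Simplified stand-in for hyperv1.HostedControlPlane.
type HostedControlPlane struct{ Name string }

// Illustrative phase-8 helper that dereferences hcp.
func reconcileOperatorDeployments(hcp *HostedControlPlane) []error {
	_ = hcp.Name // would panic if hcp were nil
	return nil
}

// runPhase8 skips the whole block when the core HCP chain left hcp nil,
// recording a recoverable error instead of panicking.
func runPhase8(hcp *HostedControlPlane, componentErrs []error) []error {
	if hcp == nil {
		return append(componentErrs,
			fmt.Errorf("skipping phase 8: HostedControlPlane is nil due to earlier error"))
	}
	return append(componentErrs, reconcileOperatorDeployments(hcp)...)
}

func main() {
	errs := runPhase8(nil, nil) // no panic: the guard records the skip
	fmt.Println(len(errs))
	fmt.Println(errs[0])
}
```

The guard sits in front of the whole block, so every phase-8 helper (not just the one shown) is skipped together when the prerequisite is missing.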
ℹ️ Review info

- Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
- Review profile: CHILL
- Plan: Pro
- Run ID: 33f6ef88-9d24-48cc-8609-666d2cca5d82
📒 Files selected for processing (3)

- docs/design/hostedcluster-reconcile-segregation-analysis.md
- hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
- hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go
```text
+-----------------------------------------------------+
| CRITICAL PREREQUISITES (must succeed first)         |
| P1: Get HCP                                         |
| S2: ICSP/IDMS registry                              |
| S3: Pull secret bytes                               |
| S4+S5: CPO image + labels                           |
| S27: Lookup release image -> S26: payload arch      |
| R22: Parse release image version                    |
| R10: Get platform                                   |
+------------------------+----------------------------+
                         |
      +------------------+---------------------------+
      |                  |                           |
      v                  v                           v
+-----------------+ +--------------------------+ +--------------------------+
| BLOCK A:        | | BLOCK B:                 | | BLOCK C:                 |
| Status Updates  | | Namespace & Foundation   | | Validation Gates         |
| (S1-S30)        | | R1: Finalizer            | | R7: Block on invalid     |
| Mostly reads,   | | R2: CAPI finalizers      | |     config/release       |
| independent     | | R4: Cluster IDs          | |                          |
| from each other | | R5: CLI secrets          | | (depends on S18,S19,S25) |
|                 | | R6: AWS resource tags    | |                          |
|                 | | R9: CP namespace --------+->|                        |
+-----------------+ +-----------+--------------+ +-----------+--------------+
                                |                            |
                                | (R9: CP namespace exists)  |
                                v                            |
            +------------------------------------------+     |
            | BLOCK D: Secret & ConfigMap Sync         |     |
            | R11: Platform credentials                |     |
            | R13: Pull secret sync                    |     |
            | R14: Secret encryption                   |     |
            | R15: Audit webhook                       |     |
            | R16: SSH key                             |     |
            | R17: AdditionalTrustBundle               |     |
            | R18: SA signing key                      |     |
            | R19: Unmanaged etcd MTLS                 |     |
            | R21: Global config CMs/secrets           |     |
            |                                          |     |
            | *** ALL INDEPENDENT OF EACH OTHER ***    |     |
            +---------------------+--------------------+     |
                                  |                          |
                                  v                          |
            +------------------------------------------+     |
            | BLOCK E: Core Control Plane              |<----+
            | R23: Reconcile HCP object                |
            | R24: Reconcile CAPI Infra CR (-> R23)    |
            | R25: Reconcile AWS subnets (-> R24)      |
            | R27: Reconcile CAPI Cluster (-> R23,R24) |
            |                                          |
            | *** STRICTLY SEQUENTIAL ***              |
            +---------------------+--------------------+
                                  |
            +--------------+------+--------+---------------+
            v              v               v               v
+--------------+   +------------+   +-------------+   +--------------+
| BLOCK F:     |   | BLOCK G:   |   | BLOCK H:    |   | BLOCK I:     |
| Operators    |   | RBAC &     |   | Kubeconfig  |   | Platform-    |
|              |   | Policies   |   | & Secrets   |   | specific     |
| R36: CPO     |   | R26: Prom  |   | R29: kube-  |   | R41: OIDC/   |
| R37: CAPI    |   |      RBAC  |   |      config |   |  SecretProv  |
|     manager  |   | R39: PKI   |   | R30: custom |   |  KubeVirt    |
| R38: CAPI    |   |      RBAC  |   |    kubeconf |   |              |
|     provider |   | R40: Net   |   | R31: admin  |   |              |
| R42: Karpent |   |      Pol.  |   |    password |   |              |
+--------------+   +------------+   +-------------+   +--------------+
                                  |
                                  v
+-------------------------------------------------+
| BLOCK J: Auxiliary / Monitoring                 |
| R12: Restored-from-backup condition             |
| R20: ETCD member recovery                       |
| R28: Monitoring dashboard                       |
| R33: SRE metrics config                         |
| R34: OpenShift trusted CAs                      |
|                                                 |
| *** LARGELY INDEPENDENT ***                     |
+-------------------------------------------------+
```
Add a language to this fenced block.
The diagram fence is currently unlabeled, which trips markdownlint and makes rendering/tooling less predictable. Use a language like text for this block.
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 112-112: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
```go
fakeClient := fake.NewClientBuilder().
	WithScheme(api.Scheme).
	WithObjects(hcluster, hcp).
	Build()

r := &HostedClusterReconciler{
	Client:                        fakeClient,
	EnableOCPClusterMonitoring:    true,
	ManagementClusterCapabilities: &fakecapabilities.FakeSupportAllCapabilities{},
}
createOrUpdate := upsert.New(false).CreateOrUpdate
log := zap.New(zap.UseDevMode(true), zap.Level(zapcore.InfoLevel))
releaseVersion := semver.MustParse("4.16.0")

_ = r.reconcileRBACAndPolicies(t.Context(), log, createOrUpdate, hcluster, hcp,
	true, // controlPlanePKIOperatorSignsCSRs — enables PKI RBAC
	false, releaseVersion)
```
This test never actually exercises the PKI-RBAC failure path.
Nothing in this setup forces reconcilePKIOperatorRBAC to fail, and the collected errors are discarded, so the test still passes when every sub-reconcile succeeds. That means it doesn't prove the isolation behavior in the test name. Make the PKI RBAC branch fail deterministically and assert that failure is returned while the Prometheus Role/RoleBinding are still created.
```go
if _, exists := hcluster.Annotations[hyperv1.HostedClusterRestoredFromBackupAnnotation]; !exists {
	return nil
}
```
Don't consume the restore annotation before the condition is durable.
Line 1757 removes HostedClusterRestoredFromBackupAnnotation before the status write. If that update conflicts/fails and ReconciliationSucceeded is unchanged, the next reconcile exits at Line 1726 and never retries publishing HostedClusterRestoredFromBackup.
Also applies to: 1756-1770
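The persist-then-consume ordering this comment asks for can be sketched with a toy cluster object. Everything here (the map-based types, the `failWrite` knob) is illustrative, not the controller's real API:

```go
package main

import "fmt"

// Toy cluster: annotations are triggers; conditions are durable status.
type cluster struct {
	annotations map[string]bool
	conditions  map[string]bool
}

// statusUpdate simulates the API status write; fail controls the outcome.
func statusUpdate(c *cluster, cond string, fail bool) error {
	if fail {
		return fmt.Errorf("conflict: status update failed")
	}
	c.conditions[cond] = true
	return nil
}

// Persist the condition first; only consume the annotation after the write
// succeeded, so a failed write leaves the trigger in place for retry.
func reconcileRestore(c *cluster, failWrite bool) error {
	if !c.annotations["restored-from-backup"] {
		return nil // nothing to do
	}
	if err := statusUpdate(c, "HostedClusterRestoredFromBackup", failWrite); err != nil {
		return err // annotation untouched, so the next reconcile retries
	}
	delete(c.annotations, "restored-from-backup")
	return nil
}

func main() {
	c := &cluster{
		annotations: map[string]bool{"restored-from-backup": true},
		conditions:  map[string]bool{},
	}
	_ = reconcileRestore(c, true) // simulated write failure
	fmt.Println(c.annotations["restored-from-backup"]) // trigger preserved
	_ = reconcileRestore(c, false)
	fmt.Println(c.conditions["HostedClusterRestoredFromBackup"])
}
```

Deleting the annotation before the status write inverts this: a conflict would drop the trigger and the condition would never be published.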
```go
} else {
	// Delete the custom external kubeconfig secret if it exists and the external name is not set
	if hcluster.Status.CustomKubeconfig != nil {
		customKubeconfig := &corev1.Secret{
			ObjectMeta: metav1.ObjectMeta{
				Namespace: hcluster.Namespace,
				Name:      hcluster.Status.CustomKubeconfig.Name,
			},
		}
		if _, err := hyperutil.DeleteIfNeeded(ctx, r.Client, customKubeconfig); err != nil {
			return fmt.Errorf("failed to delete custom external kubeconfig secret %q: %w", client.ObjectKeyFromObject(customKubeconfig), err)
		}
		hcluster.Status.CustomKubeconfig = nil
	}
```
Clear and persist the HostedCluster secret refs when these synced secrets are removed.
These branches run after phase 4 has already written status. CustomKubeconfig is only nilled in memory, and KubeadminPassword is never cleared at all, so the API can keep dangling references to deleted secrets indefinitely.
Also applies to: 2213-2225
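Why the in-memory clear is not enough can be shown with a toy API server. The `api` struct and `statusUpdate` method are illustrative stand-ins for the real client's `Status().Update`:

```go
package main

import "fmt"

type secretRef struct{ Name string }

type status struct{ CustomKubeconfig *secretRef }

// api stands in for the API server's stored copy of the status.
type api struct{ persisted status }

func (a *api) statusUpdate(s status) error { a.persisted = s; return nil }

func main() {
	srv := &api{persisted: status{CustomKubeconfig: &secretRef{Name: "custom-kubeconfig"}}}
	local := srv.persisted // controller's in-memory copy

	// ... the Secret itself has been deleted via DeleteIfNeeded ...

	// Clearing only the local copy leaves a dangling ref on the server:
	local.CustomKubeconfig = nil
	fmt.Println(srv.persisted.CustomKubeconfig != nil) // still dangling

	// The fix: persist the cleared field with a status update.
	if err := srv.statusUpdate(local); err == nil {
		fmt.Println(srv.persisted.CustomKubeconfig == nil) // ref removed
	}
}
```

The same pattern applies to `KubeadminPassword`: nil the field, then write status, so the API never keeps a reference to a deleted Secret.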
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: muraee. The full list of commands accepted by this bot can be found here; the pull request process is described here.

Needs approval from an approver in each of these files.
> - AWS resource tags error -> HCP never updated
>
> **After (proposed)**: Only genuinely dependent operations block each other. Examples:
> - SSH key secret missing -> SSH key not synced (reported via condition), but CPO still
As of today, I see SSH as a dependency for the data plane, since changing it would trigger a rollout. Is this still the case?
No behavior change; this only lets other control-plane operations that are not affected by the missing SSH key proceed.
This should also help decouple the data-plane/control-plane upgrades we are working on. Otherwise, a control-plane upgrade will not proceed if the SSH key (a data-plane input) is missing.
> **After (proposed)**: Only genuinely dependent operations block each other. Examples:
> - SSH key secret missing -> SSH key not synced (reported via condition), but CPO still deploys, HCP still reconciles, CAPI still works
> - Monitoring dashboard error -> reported, but karpenter and network policies proceed normally
Can we include more examples and have test cases for all of them?
Was there a Jira bug we can reference that reports the scenario where this was problematic for managed?
> ## Key Problems Identified
>
> 1. **Secret sync operations block each other** (R13-R21): A missing SSH key secret prevents pull secret sync, secret encryption setup, ETCD MTLS, and global config sync. These are
Why would we want to proceed with the CPO if the pull secret is invalid?
Two reasons:

- The image could already be available in the cluster, so an invalid pull secret doesn't necessarily block the deployment.
- The image pull error would be reported in the CPO deployment, making it easier to discover than going through the HO logs.
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Codecov Report

```text
@@            Coverage Diff             @@
##             main    #7908      +/-  ##
=========================================
+ Coverage   26.85%   27.03%   +0.17%
=========================================
  Files        1090     1091       +1
  Lines      105254   105800     +546
=========================================
+ Hits        28263    28598     +335
- Misses      74563    74767     +204
- Partials     2428     2435       +7
```
```text
refactor(hostedcluster): segregate reconcile loop into error-collecting blocks

The reconcile() method executes ~50 sequential operations where every error
causes an early return, short-circuiting all remaining work. An unrelated
failure (e.g., a missing SSH key secret) prevents critical operations like
deploying the CPO or reconciling the HCP object.

This refactoring:
- Extracts 12 inline blocks into named methods (reconcilePullSecretSync,
  reconcileSSHKeySync, reconcileSecretEncryptionSync, etc.)
- Groups operations into phased error-collecting blocks:
  - Phase 6: Independent secret/config sync (syncErrs)
  - Phase 7: Core HCP chain (coreErrs)
  - Phase 8: Component groups via wrapper methods (componentErrs)
- Introduces 5 wrapper methods returning []error:
  reconcileKubeconfigAndPasswordSync, reconcileOperatorDeployments,
  reconcileRBACAndPolicies, reconcileAuxiliary, reconcilePlatformSpecific
- Aggregates all errors with utilerrors.NewAggregate at the end
- Moves reconcileCLISecrets into reconcileAuxiliary (error-collecting)

After this change, failures in one phase no longer block unrelated phases.
For example, a missing SSH key no longer prevents CPO deployment or HCP
object creation.

Includes the analysis document and integration tests that verify
non-blocking behavior across phases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```
@muraee: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Summary
- Refactors `reconcile()` in the HostedCluster controller to use error-collecting blocks instead of sequential short-circuiting. Previously, any single failure among ~50 operations would block all subsequent work; for example, a missing SSH key secret prevented CPO deployment and HCP creation.
- Extracts inline logic into named helper methods and five wrapper methods (`reconcileOperatorDeployments`, `reconcileRBACAndPolicies`, `reconcileKubeconfigAndPasswordSync`, `reconcileAuxiliary`, `reconcilePlatformSpecific`) that collect errors independently.
- Aggregates all collected errors with `utilerrors.NewAggregate()`, ensuring unrelated failures don't cascade. Phases 0-5 (prerequisites, deletion, validation gates) retain short-circuiting since they are genuine dependencies.

Key changes

- Phase 6: independent secret/config sync (`syncErrs`)
- Phase 7: core HCP chain (`coreErrs`)
- Phase 8: component groups via wrapper methods (`componentErrs`)
See
docs/design/hostedcluster-reconcile-segregation-analysis.mdfor the full dependency analysis and operation map.Test plan
Test plan

- `go test -count=1 -race ./hypershift-operator/controllers/hostedcluster/` passes
- `make verify` passes (the codespell failure is pre-existing, caused by untracked files)
- New unit tests for the wrapper methods (`TestReconcileKubeconfigAndPasswordSync_*`, `TestReconcileRBACAndPolicies_*`)
- An integration test (`TestReconcile_WhenPhase6SSHKeyFails_ItShouldStillCreateHCPAndCPODeployment`) runs the full `reconcile()` with a broken Phase 6 and verifies that the HCP object (Phase 7) and the CPO Deployment (Phase 8) are still created

🤖 Generated with Claude Code
Summary by CodeRabbit

- Documentation
- Refactor
- Tests