SPLAT-2410: Introduce Cluster API for OpenShift Installations on vSphere#455
SPLAT-2410: Introduce Cluster API for OpenShift Installations on vSphere#455AnnaZivkovic wants to merge 5 commits intoopenshift:mainfrom
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@AnnaZivkovic: This pull request references SPLAT-2410 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Important Review skippedAuto reviews are limited based on label configuration. 🚫 Review skipped — only excluded labels are configured. (1)
Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository YAML (base), Organization UI (inherited) Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds vSphere platform support and feature‑gate driven startup: registers the VSphere API scheme; implements bidirectional MAPI↔CAPI vSphere converters, controller platform branches, v1beta2 condition helpers, and extensive unit + fuzz tests; feature gates now control vSphere controller startup. Changes
Sequence Diagram(s)sequenceDiagram
participant Admin as Administrator
participant CAPI as CAPI (VSphereMachine / Template)
participant MigCtrl as Migration Controller
participant C2M as CAPI→MAPI Converter
participant MAPI as MAPI (Machine / ProviderSpec)
participant Cluster as Target Cluster
Admin->>CAPI: Create/Modify VSphere CAPI resources
CAPI->>MigCtrl: Controller watch / reconcile
MigCtrl->>C2M: FromMachineAndVSphereMachineAndVSphereCluster(...)
C2M->>C2M: Parse CAPV spec (networks, disks, clone, workspace)
C2M->>MAPI: Emit MAPI Machine + ProviderSpec
MigCtrl->>Cluster: Apply MAPI Machine
Cluster->>Admin: Provisioning outcome
sequenceDiagram
participant Admin as Administrator
participant Cluster as MAPI (Machine with vSphere ProviderSpec)
participant SyncCtrl as Sync Controller
participant Parser as ProviderSpec Parser
participant M2C as MAPI→CAPI Converter
participant CAPI as CAPI (VSphereMachine / Template)
Admin->>Cluster: Create MAPI Machine with vSphere ProviderSpec
Cluster->>SyncCtrl: Controller reconcile
SyncCtrl->>Parser: ProviderSpecFromRawExtension(...)
Parser->>M2C: vSphere provider parsed
M2C->>M2C: Build VSphereMachine and Template, validate
M2C->>CAPI: Return CAPI Machine + Template
SyncCtrl->>CAPI: Create/Update CAPI resources
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@AnnaZivkovic: This pull request references SPLAT-2410 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@pkg/controllers/machinesetsync/machineset_sync_controller.go`:
- Around line 395-400: The loop that appends pointers uses &template (range
variable) which causes pointer aliasing so every entry in outdatedTemplates
points to the same item; change the iteration to index-based access (e.g., for i
:= range list.Items { tmpl := &list.Items[i]; if tmpl.GetName() !=
newInfraMachineTemplateName { outdatedTemplates = append(outdatedTemplates,
tmpl) } }) for the VSphereMachineTemplateList case and make the identical change
in the AWS, IBMPowerVS, and OpenStack cases so you take the address of the slice
element (list.Items[i]) instead of the range variable.
In `@pkg/conversion/mapi2capi/vsphere.go`:
- Around line 130-135: The code dereferences v.infrastructure in the error path;
change the block to compute a nil-safe infraName before calling field.Invalid
(e.g. infraName := ""; if v.infrastructure != nil { infraName =
v.infrastructure.Status.InfrastructureName }) and then call errs = append(errs,
field.Invalid(field.NewPath("infrastructure", "status", "infrastructureName"),
infraName, "infrastructure cannot be nil and
infrastructure.Status.InfrastructureName cannot be empty")); keep the existing
else branch that sets capiMachine.Spec.ClusterName and
capiMachine.Labels[clusterv1.ClusterNameLabel] unchanged.
- Around line 197-203: The code dereferences v.infrastructure without checking
for nil when building the field.Invalid error and when assigning cluster fields;
change the logic to first check nil before accessing Status.InfrastructureName
(e.g., evaluate infraName only after confirming v.infrastructure != nil) and use
that safe value for the field.Invalid call and in the assignments to
capiMachineSet.Spec.Template.Spec.ClusterName, capiMachineSet.Spec.ClusterName,
and capiMachineSet.Labels[clusterv1.ClusterNameLabel]; ensure the nil branch
appends the error and the non-nil branch performs the assignments.
In `@pkg/util/conditions.go`:
- Around line 147-159: The docstring for GetConditionStatusFromInfraObject
contradicts the implementation: the code (and inline comment) call
GetConditionStatus (v1beta1) first, then fall back to GetV1Beta2ConditionStatus
(v1beta2); update the function comment to state it tries v1beta1 first (AWS,
Azure, GCP, etc.) and then falls back to v1beta2 (vSphere/newer providers), and
adjust any inline comment to match this behavior so documentation and
implementation are consistent.
🧹 Nitpick comments (4)
pkg/conversion/capi2mapi/vsphere_test.go (1)
348-399: Consider expanding MachineSet test coverage.The MachineSet conversion test table has only a single base configuration entry, while the Machine conversion tests cover many scenarios (clone modes, data disks, network configurations, etc.). Since
ToMachineSetinternally calls the Machine conversion logic, the core paths are tested, but adding a few more MachineSet-specific scenarios (e.g., with different replica counts, failure domains, or template variations) could improve confidence in the integration.pkg/conversion/capi2mapi/vsphere_fuzz_test.go (1)
142-189: Consider extracting shared spec normalization logic.The normalization logic in the
VSphereMachineSpecfuzzer (lines 142-189) andVSphereMachineTemplateSpecfuzzer (lines 210-254) is nearly identical. The only difference is that the template fuzzer accesses fields viaspec.Template.Spec.*instead ofspec.*.While acceptable for fuzz tests, extracting a shared helper function could reduce maintenance burden if fields need to be added or removed in the future.
Also applies to: 210-254
pkg/conversion/capi2mapi/vsphere.go (1)
202-206: Remove commented-out code artifacts.These lines appear to be leftover development artifacts that should be cleaned up before merging:
- Line 202-203: Commented TODO about delete policy
- Line 205: Incomplete comment about labels
🧹 Proposed fix
mapiMachineSet.Spec.Template.ObjectMeta.Annotations = mapiMachine.ObjectMeta.Annotations mapiMachineSet.Spec.Template.ObjectMeta.Labels = mapiMachine.ObjectMeta.Labels - //// todo: jcallen: afaik the default delete policy is empty - //mapiMachineSet.Spec.DeletePolicy = "" - - //mapiMachineSet.Spec.Template.ObjectMeta.Labels[] - return mapiMachineSet, warnings, nilpkg/conversion/mapi2capi/vsphere.go (1)
399-408: Consider warning on unknown clone mode values.The function silently defaults to
FullClonefor any unrecognized clone mode value. While defaulting is reasonable, this could mask configuration errors or unexpected values.Consider logging a warning or returning it in the warnings slice when an unexpected value is encountered.
💡 Example approach
// To add warning capability, change the function signature: func convertMAPICloneModeToCAPI(mapiMode mapiv1beta1.CloneMode) (vspherev1.CloneMode, []string) { switch mapiMode { case mapiv1beta1.FullClone: return vspherev1.FullClone, nil case mapiv1beta1.LinkedClone: return vspherev1.LinkedClone, nil case "": return vspherev1.FullClone, nil default: return vspherev1.FullClone, []string{fmt.Sprintf("unknown clone mode %q, defaulting to FullClone", mapiMode)} } }
a0cd7fd to
4a280fc
Compare
|
@AnnaZivkovic: This pull request references SPLAT-2410 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
4a280fc to
b7f603d
Compare
|
@AnnaZivkovic: This pull request references SPLAT-2410 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/test e2e-vsphere-capi-techpreview |
|
/test lint pod something something failed |
|
@AnnaZivkovic: This pull request references SPLAT-2410 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
48a5a82 to
4846757
Compare
|
@AnnaZivkovic: This pull request references SPLAT-2410 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/conversion/capi2mapi/vsphere_test.go`:
- Around line 348-399: The test suite for MachineSet conversion only contains a
single base Entry; add additional DescribeTable entries using the
vsphereCAPI2MAPIMachinesetConversionInput struct to mirror the Machine tests
(cover unsupported clone mode, invalid data disk provisioning mode, and network
DHCP+static IP mix) so
FromMachineSetAndVSphereMachineTemplateAndVSphereCluster(...).ToMachineSet() is
exercised for those edge cases; create VSphereMachineTemplate.Spec.Template.Spec
variants that set VirtualMachineCloneSpec.CloneMode unsupported value, DataDisks
with invalid ProvisioningMode, and Network interfaces combining DHCP and static
IPs, and assert expectedErrors/expectedWarnings matching the Machine test
expectations.
In `@pkg/util/conditions.go`:
- Around line 135-160: GetConditionStatusFromInfraObject currently falls back to
GetV1Beta2ConditionStatus and returns its error, which causes reconciles to fail
when a v1beta2 status block is simply absent; change the fallback logic in
GetConditionStatusFromInfraObject so that after calling
GetV1Beta2ConditionStatus you check for the sentinel errStatusV1Beta2NotFound
and treat it as a non-error by returning corev1.ConditionUnknown, nil; otherwise
return the status/error from GetV1Beta2ConditionStatus as before. Use the
existing symbols GetConditionStatusFromInfraObject, GetConditionStatus,
GetV1Beta2ConditionStatus, and errStatusV1Beta2NotFound to locate and implement
this conditional handling.
🧹 Nitpick comments (1)
🤖 Fix all nitpicks with AI agents
Verify each finding against the current code and only fix it if needed. In `@pkg/conversion/capi2mapi/vsphere_test.go`: - Around line 348-399: The test suite for MachineSet conversion only contains a single base Entry; add additional DescribeTable entries using the vsphereCAPI2MAPIMachinesetConversionInput struct to mirror the Machine tests (cover unsupported clone mode, invalid data disk provisioning mode, and network DHCP+static IP mix) so FromMachineSetAndVSphereMachineTemplateAndVSphereCluster(...).ToMachineSet() is exercised for those edge cases; create VSphereMachineTemplate.Spec.Template.Spec variants that set VirtualMachineCloneSpec.CloneMode unsupported value, DataDisks with invalid ProvisioningMode, and Network interfaces combining DHCP and static IPs, and assert expectedErrors/expectedWarnings matching the Machine test expectations.pkg/conversion/capi2mapi/vsphere_test.go (1)
348-399: Consider expanding MachineSet test coverage.The MachineSet conversion tests have only one entry (base case), while the Machine conversion tests above cover 14 scenarios including clone modes, data disks, networks, tags, and various error/warning cases.
Since
VSphereMachineTemplatewraps the sameVSphereMachineSpec, similar edge cases and error paths should ideally be validated for MachineSet conversions to ensure parity.Consider adding test entries for at least:
- Unsupported clone mode (error case)
- Invalid provisioning mode in data disks (error case)
- Network configurations with DHCP + static IPs (warning case)
This could be deferred if the underlying conversion logic is shared and already covered by the Machine tests.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/conversion/capi2mapi/vsphere_test.go` around lines 348 - 399, The test suite for MachineSet conversion only contains a single base Entry; add additional DescribeTable entries using the vsphereCAPI2MAPIMachinesetConversionInput struct to mirror the Machine tests (cover unsupported clone mode, invalid data disk provisioning mode, and network DHCP+static IP mix) so FromMachineSetAndVSphereMachineTemplateAndVSphereCluster(...).ToMachineSet() is exercised for those edge cases; create VSphereMachineTemplate.Spec.Template.Spec variants that set VirtualMachineCloneSpec.CloneMode unsupported value, DataDisks with invalid ProvisioningMode, and Network interfaces combining DHCP and static IPs, and assert expectedErrors/expectedWarnings matching the Machine test expectations.
4846757 to
de8ce8d
Compare
|
@damdo Could you give this pr a review? |
|
/lgtm |
|
/assign @damdo |
|
Scheduling tests matching the |
|
/test e2e-vsphere-capi-techpreview |
|
/retest |
58e4c81 to
7c7c6a6
Compare
|
/retest |
7c7c6a6 to
9e84159
Compare
|
/lgtm |
|
Scheduling tests matching the |
|
/retest |
|
@JoelSpeed would you mind giving this another review? |
|
/retest |
1 similar comment
|
/retest |
|
@mdbooth can you take another look at this when you get a chance? Thanks! |
pkg/conversion/capi2mapi/vsphere.go
Outdated
|
|
||
| mapiMachine.Spec.ProviderSpec.Value = vsphereSpecRawExt | ||
| // Note: ProviderStatus is not set during conversion, similar to OpenStack and PowerVS providers. | ||
| // The MAPI controller will manage the status at runtime. |
There was a problem hiding this comment.
Hmm, are we sure this is going to age well? Eventually we intend to turn off the MAPI controllers. What happens when CAPi is authoritative, does the machine provider status just sit stale?
There was a problem hiding this comment.
You're absolutely right when CAPI becomes authoritative the ProviderStatus will go stale. This follows the same pattern as OpenStack and PowerVS providers. I think the stale ProviderStatus is harmless because it's no longer the source of truth since we will be looking at VSphereMachine.Status instead.
Should I look into how aws handles this case?
There was a problem hiding this comment.
I don't know why those other providers are allowing stale status. That is not the expectation of the overall architecture. I would expect this to be kept updated.
Please check how AWS is handling this
| spec.PowerOffMode = "" | ||
| spec.GuestSoftPowerOffTimeout = nil | ||
| spec.FailureDomain = nil | ||
| spec.NamingStrategy = nil | ||
| spec.OS = "" | ||
| spec.HardwareVersion = "" | ||
| spec.StoragePolicyName = "" | ||
| spec.Thumbprint = "" | ||
| spec.Resources = vspherev1.VirtualMachineResources{} | ||
| spec.PciDevices = nil | ||
| spec.CustomVMXKeys = nil | ||
| spec.AdditionalDisksGiB = nil | ||
| spec.ProviderID = nil |
There was a problem hiding this comment.
We will want a VAP to prevent users from setting these fields, speak to @theobarberbany for guidance about this, but please make sure this is tracked
There was a problem hiding this comment.
We have SPLAT-2711 to track it. I touched base with Theo on this, and the plan is to implement it first and gather feedback during the review process.
9e84159 to
2418f99
Compare
| messageExpression: "variables.specPath + '.thumbprint is not supported - TLS validation is handled at cluster level'" | ||
|
|
||
| # CAPI-only runtime configuration | ||
| - expression: "!has(variables.machineSpec.customVMXKeys) || variables.machineSpec.customVMXKeys.size() == 0" |
There was a problem hiding this comment.
This one I am hesitant on, wdyt @vr4manta ?
While we might not have a direct correlation from mapi, in mapi we do use extra configs, so maybe its a good idea to keep this?
There was a problem hiding this comment.
From what I can tell, VAP is checking what CAPI exposes to users, and blocking anything that can't roundtrip through MAPI. My two cence is that customVMXKeys should stay blocked because Machine API doesn't expose this functionality to users, so there's no conversion path. While the MAPI controller uses extraConfig internally, this is controller-managed configuration, not user-configurable fields.
There was a problem hiding this comment.
hmm, so having MAPI conversion add these controller controlled to CAPI would make sense so if you switch from MAPI to CAPI, the state of the VM will be consistent. Adding additional parameters is what we may want to prevent then in this case. We can do that property checking in the capi2mapi if i recall correctly. Else we may want to look into making MAPI support exposing the ability to add these configs.
This enables bidirectional conversion between vSphere Machine API Machines/MachineSets and Cluster API Machines/MachineTemplates, including comprehensive fuzz and unit tests.
Add functions to handle v1beta2 conditions (used by vSphere provider) while maintaining backward compatibility with v1beta1 conditions (used by AWS, Azure, GCP providers).
Gate vSphere migration functionality behind theClusterAPIMachineManagementVSphere feature gate. The migrationcontrollers for vSphere will now only start when both the generalMachineAPIMigration feature gate and the vSphere-specificClusterAPIMachineManagementVSphere feature gate are enabled.
Implements admission-time validation to block unsupported vSphere fieldsin VSphereMachine and VSphereMachineTemplate resources, providing clear error messages before conversion attempts.
Align vSphere conversion with AWS implementation by converting and setting ProviderStatus during CAPI to MAPI conversion to ensure proper status synchronization between the two APIs.
fca2001 to
fd0e98d
Compare
|
@AnnaZivkovic: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/retest |
|
/lgtm |
|
Scheduling tests matching the |
Enables vSphere support in the Machine API to Cluster API conversion framework. It implements bidirectional conversion between vSphere Machine API resources (Machines/MachineSets) and Cluster API resources (Machines/MachineTemplates), including support for vSphere-specific v1beta2 conditions.
Changes
Summary by CodeRabbit
New Features
Bug Fixes
Tests