[release-4.21] OCPBUGS-88295, OCPBUGS-88297, OCPBUGS-82146, OCPBUGS-78330, OCPBUGS-85550, OCPBUGS-88324, OCPBUGS-88322, OCPBUGS-88320: Backport noOLM Gateway API test coverage and upgrade tests#31232
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
Skipping CI for Draft Pull Request. |
6e20b57 to
e14c7a9
Compare
|
Upgrades are actually not working for OLM (only affect 4.20->4.21 right now). https://redhat.atlassian.net/browse/OCPBUGS-86778 Let's test this via: |
|
@gcs278: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command |
|
/payload-job periodic-ci-openshift-release-main-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-upgrade |
|
@gcs278: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/71de0a60-5b7e-11f1-9d30-2d7158c6abc3-0 |
|
@gcs278: This pull request references NE-2286 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target either version "4.21." or "openshift-4.21.", but it targets "openshift-4.22" instead. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
bc7da27 to
a9db8cd
Compare
a9db8cd to
39715ca
Compare
|
/test all |
|
/test ? |
|
Origin 4.21 jobs don't run automatically: /test e2e-aws-csi |
|
Testing is a bit tricky for this one. First, we need to prove that we don't break CI with NO-OLM disabled. The pre-submits will mostly prove this already, but there are more upgrade tests that should be run: OLM to OLM: |
|
@gcs278: trigger 8 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ba79f340-603a-11f1-95a8-8af10d374131-0 |
|
Next, we should validate with openshift/cluster-ingress-operator#1442, which shows OLM to noOLM migration is working. For this, we need to vendor the backport + removed FG annotation on RBAC manifests and then the FG to default to allow NO-OLM to run by default: OLM to noOLM: Then, I will run some /testwith for presubmits with NO-OLM to test the non-upgrade NO-OLM test jobs after this |
|
@gcs278: given command is invalid: at least one of the commands given is only supported on a one-command-per-comment basis, please separate out commands as multiple comments |
|
Ehh I guess I'll do it as a payload: noOLM fresh install (non-upgrade) on AWS, GCP, Azure |
|
/jira refresh |
|
@gcs278: This pull request references Jira Issue OCPBUGS-88324, which is valid. The bug has been moved to the POST state. 7 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@gcs278: This pull request references Jira Issue OCPBUGS-88295, which is valid. 7 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. This pull request references Jira Issue OCPBUGS-88297, which is valid. 7 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker. This pull request references Jira Issue OCPBUGS-82146, which is valid. 7 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker. This pull request references Jira Issue OCPBUGS-78330, which is valid. 7 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker. This pull request references Jira Issue OCPBUGS-85550, which is valid. 7 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. This pull request references Jira Issue OCPBUGS-88324, which is valid. 7 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. This pull request references Jira Issue OCPBUGS-88322, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. This pull request references Jira Issue OCPBUGS-88320, which is valid. The bug has been moved to the POST state. 7 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@gcs278: This pull request references Jira Issue OCPBUGS-88295, which is valid. 7 validation(s) were run on this bug
Requesting review from QA contact: This pull request references Jira Issue OCPBUGS-88297, which is valid. 7 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request. This pull request references Jira Issue OCPBUGS-82146, which is valid. 7 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request. This pull request references Jira Issue OCPBUGS-78330, which is valid. 7 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request. This pull request references Jira Issue OCPBUGS-85550, which is valid. 7 validation(s) were run on this bug
Requesting review from QA contact: This pull request references Jira Issue OCPBUGS-88324, which is valid. 7 validation(s) were run on this bug
Requesting review from QA contact: This pull request references Jira Issue OCPBUGS-88322, which is valid. The bug has been moved to the POST state. 7 validation(s) were run on this bug
Requesting review from QA contact: This pull request references Jira Issue OCPBUGS-88320, which is valid. 7 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
This is ready to go |
| e2e.Logf("GatewayAPI resources successfully created with OLM-based provisioning") | ||
| } else { | ||
| g.By("Validating CIO-based (NO-OLM) provisioning before upgrade") | ||
| t.validateCIOProvisioning(ctx, false) // false = no migration occurred |
There was a problem hiding this comment.
is this set to false because on any type of upgrade NO-OLM is always enabled?
otherwise, seems like it is hardcoded
There was a problem hiding this comment.
nevermind, I think I see what you are doing here, maybe need to make the comment a bit more clear:
| t.validateCIOProvisioning(ctx, false) // false = no migration occurred | |
| t.validateCIOProvisioning(ctx, false) // false = no migration occurred yet, since this is a pre-upgrade check |
There was a problem hiding this comment.
on another thought we could use !t.startedWithNoOLM because this will always be false if NO-OLM is enabled and it'll be true if it isn't but then it would never make it into this else conditional.
Maybe I am overthinking it but I think it's better than the hardcoded false
There was a problem hiding this comment.
ah yea the no migration occured is a bit confusing - I agree with your suggestion for the comment.
Technically, we could use !t.startedWithNoOLM and it'd work, but I think hardcoding false is more explicit. It directly says "no migration occurred" rather than deriving it from another field that happens to have the same value.
Valid points - I think we can save for a follow up in a future PR on main.
There was a problem hiding this comment.
understandable, then in that case we can just update the comment to make it more clear?
There was a problem hiding this comment.
yes sure - i'll take note of this for a future PR we can merge into main so it's on the code branch where we are actively developing.
| } else { | ||
| g.By("GatewayAPIWithoutOLM is disabled - validating OLM-based provisioning") | ||
| // A shorter timeout here is because the resources should already exist post-upgrade state. | ||
| validateOLMBasedOSSM(t.oc, 2*time.Minute) |
There was a problem hiding this comment.
I think we possibly discussed this during 4.22, but just want to bring it up again. Having a longer timeout should not affect anything right? it might be a fail safe in case things take some time to get back to normal after the upgrade?
There was a problem hiding this comment.
Right, we discussed this before. The timeout is just to handle brief get failures, not to wait for OLM resources to appear. They should already exist at this point, otherwise we should actually fail. A longer timeout would risk masking a real problem.
But that said, it's a bit of a shortcut. Ideally we'd have a separate function that asserts the resources exist immediately, with retries only for transient API errors rather than waiting for them to appear. Rewriting all that logic felt messy, so I reused the existing function with a lower timeout as a compromise.
There was a problem hiding this comment.
I took another look and all each resource timeout is 2 minutes so I guess thats fine :)
| if migrationOccurred { | ||
| e2e.Logf("Gateway API successfully migrated from OLM to CIO (NO-OLM) during upgrade") | ||
| } else if endsWithNoOLM { | ||
| e2e.Logf("Gateway API using CIO-based (NO-OLM) provisioning - no migration occurred") |
There was a problem hiding this comment.
| e2e.Logf("Gateway API using CIO-based (NO-OLM) provisioning - no migration occurred") | |
| e2e.Logf("Gateway API using is using CIO-based (NO-OLM) provisioning pre/post upgrade - no migration occurred") |
kind of a nit but makes it a little more clear for someone reading it first time
There was a problem hiding this comment.
Yea some logs could be clearer - actually this one is specifically for post upgrade not. I'll keep a note of that for a follow up 👍
| g.By("Waiting for upgrade to complete") | ||
| <-done | ||
|
|
||
| g.By("Verifying Gateway still exists and is programmed") |
There was a problem hiding this comment.
any reason we dont check the status of the gatewayClass first using checkGatewayClassCondition(t.oc, gatewayClassName, string(gatewayv1.GatewayClassConditionStatusAccepted), metav1.ConditionTrue)
There was a problem hiding this comment.
No particular reason except that we call it later in validateCIOProvisioning because it's specific to sail library / noOLM. The OLM path doesn't set these conditions.
You'd just need to have extra if statements here to call it otherwise, or reorder everything and pull it out of that function. But I see your point - I generally like to start at the "top" and work my way down the hierarchy.
There was a problem hiding this comment.
do we want to update it then in main followup?
There was a problem hiding this comment.
Possibly - I don't feel so strongly as it still correctly checks everything - the test is providing appropriate coverage either way. I'd say it'd be a consideration if we were tweaking/refactoring the upgrade test in a future PR.
| } | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
do we skip httproute cleanup?
There was a problem hiding this comment.
Oh nice catch! We orphan a httproute. Not the end of the world, but yes, not desirable. We should follow up with this in the next PR for updating this test in main.
| configv1.IBMCloudPlatformType, | ||
| configv1.VSpherePlatformType, | ||
| configv1.BareMetalPlatformType, | ||
| configv1.EquinixMetalPlatformType: |
There was a problem hiding this comment.
do we have CI tests for all of these supported platform types?
There was a problem hiding this comment.
I took a look at the currently ran tests and it seems like there is nothing running Azure, Equinix, or IBM
There was a problem hiding this comment.
When you say "currently ran tests" - do you mean pre-submits? We definitely have periodic azure and equinix (I think equinix==metal), and ibmcloud (but not sure how often it runs if ever).
Either way, it's here for a these platforms if it's needed - these platforms should be supported.
gcs278
left a comment
There was a problem hiding this comment.
Nice thanks for the review @rhamini3 - good comments and good catch with the orphaned HTTPRoute.
Sorry to punt on these but I think it'd be better to address in the main branch in a future PR (otherwise it'd just be for 4.21) - I'll be sure to refer back to here when a future PR touches these files.
| e2e.Logf("GatewayAPI resources successfully created with OLM-based provisioning") | ||
| } else { | ||
| g.By("Validating CIO-based (NO-OLM) provisioning before upgrade") | ||
| t.validateCIOProvisioning(ctx, false) // false = no migration occurred |
There was a problem hiding this comment.
ah yea the no migration occured is a bit confusing - I agree with your suggestion for the comment.
Technically, we could use !t.startedWithNoOLM and it'd work, but I think hardcoding false is more explicit. It directly says "no migration occurred" rather than deriving it from another field that happens to have the same value.
Valid points - I think we can save for a follow up in a future PR on main.
| } else { | ||
| g.By("GatewayAPIWithoutOLM is disabled - validating OLM-based provisioning") | ||
| // A shorter timeout here is because the resources should already exist post-upgrade state. | ||
| validateOLMBasedOSSM(t.oc, 2*time.Minute) |
There was a problem hiding this comment.
Right, we discussed this before. The timeout is just to handle brief get failures, not to wait for OLM resources to appear. They should already exist at this point, otherwise we should actually fail. A longer timeout would risk masking a real problem.
But that said, it's a bit of a shortcut. Ideally we'd have a separate function that asserts the resources exist immediately, with retries only for transient API errors rather than waiting for them to appear. Rewriting all that logic felt messy, so I reused the existing function with a lower timeout as a compromise.
| if migrationOccurred { | ||
| e2e.Logf("Gateway API successfully migrated from OLM to CIO (NO-OLM) during upgrade") | ||
| } else if endsWithNoOLM { | ||
| e2e.Logf("Gateway API using CIO-based (NO-OLM) provisioning - no migration occurred") |
There was a problem hiding this comment.
Yea some logs could be clearer - actually this one is specifically for post upgrade not. I'll keep a note of that for a follow up 👍
| g.By("Waiting for upgrade to complete") | ||
| <-done | ||
|
|
||
| g.By("Verifying Gateway still exists and is programmed") |
There was a problem hiding this comment.
No particular reason except that we call it later in validateCIOProvisioning because it's specific to sail library / noOLM. The OLM path doesn't set these conditions.
You'd just need to have extra if statements here to call it otherwise, or reorder everything and pull it out of that function. But I see your point - I generally like to start at the "top" and work my way down the hierarchy.
| } | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Oh nice catch! We orphan a httproute. Not the end of the world, but yes, not desirable. We should follow up with this in the next PR for updating this test in main.
| configv1.IBMCloudPlatformType, | ||
| configv1.VSpherePlatformType, | ||
| configv1.BareMetalPlatformType, | ||
| configv1.EquinixMetalPlatformType: |
There was a problem hiding this comment.
When you say "currently ran tests" - do you mean pre-submits? We definitely have periodic azure and equinix (I think equinix==metal), and ibmcloud (but not sure how often it runs if ever).
Either way, it's here for a these platforms if it's needed - these platforms should be supported.
|
/lgtm |
|
/approve |
|
@gcs278: This pull request references Jira Issue OCPBUGS-88295, which is valid. 7 validation(s) were run on this bug
Requesting review from QA contact: This pull request references Jira Issue OCPBUGS-88297, which is valid. 7 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request. This pull request references Jira Issue OCPBUGS-82146, which is valid. 7 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request. This pull request references Jira Issue OCPBUGS-78330, which is valid. 7 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request. This pull request references Jira Issue OCPBUGS-85550, which is valid. 7 validation(s) were run on this bug
Requesting review from QA contact: This pull request references Jira Issue OCPBUGS-88324, which is valid. 7 validation(s) were run on this bug
Requesting review from QA contact: This pull request references Jira Issue OCPBUGS-88322, which is valid. 7 validation(s) were run on this bug
Requesting review from QA contact: This pull request references Jira Issue OCPBUGS-88320, which is valid. 7 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: gcs278, neisw, rhamini3 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
OCPBUGS-88324 OCPBUGS-88322, OCPBUGS-88295 OCPBUGS-88320 /verified by e2e |
|
@rhamini3: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@gcs278: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
9a35db2
into
openshift:release-4.21
|
@gcs278: Jira Issue OCPBUGS-88295: Some pull requests linked via external trackers have merged: The following pull request, linked via external tracker, has not merged: All associated pull requests must be merged or unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with Jira Issue OCPBUGS-88295 has not been moved to the MODIFIED state. This PR is marked as verified. If the remaining PRs listed above are marked as verified before merging, the issue will automatically be moved to VERIFIED after all of the changes from the PRs are available in an accepted nightly payload. Jira Issue OCPBUGS-88297: Some pull requests linked via external trackers have merged: The following pull request, linked via external tracker, has not merged: All associated pull requests must be merged or unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with Jira Issue OCPBUGS-88297 has not been moved to the MODIFIED state. This PR is marked as verified. If the remaining PRs listed above are marked as verified before merging, the issue will automatically be moved to VERIFIED after all of the changes from the PRs are available in an accepted nightly payload. Jira Issue OCPBUGS-82146: Some pull requests linked via external trackers have merged: The following pull request, linked via external tracker, has not merged: All associated pull requests must be merged or unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with Jira Issue OCPBUGS-82146 has not been moved to the MODIFIED state. This PR is marked as verified. If the remaining PRs listed above are marked as verified before merging, the issue will automatically be moved to VERIFIED after all of the changes from the PRs are available in an accepted nightly payload. Jira Issue OCPBUGS-78330: Some pull requests linked via external trackers have merged: The following pull request, linked via external tracker, has not merged: All associated pull requests must be merged or unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with Jira Issue OCPBUGS-78330 has not been moved to the MODIFIED state. This PR is marked as verified. If the remaining PRs listed above are marked as verified before merging, the issue will automatically be moved to VERIFIED after all of the changes from the PRs are available in an accepted nightly payload. Jira Issue OCPBUGS-85550: Some pull requests linked via external trackers have merged: The following pull request, linked via external tracker, has not merged: All associated pull requests must be merged or unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with Jira Issue OCPBUGS-85550 has not been moved to the MODIFIED state. This PR is marked as verified. If the remaining PRs listed above are marked as verified before merging, the issue will automatically be moved to VERIFIED after all of the changes from the PRs are available in an accepted nightly payload. Jira Issue Verification Checks: Jira Issue OCPBUGS-88324 Jira Issue OCPBUGS-88324 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓 Jira Issue Verification Checks: Jira Issue OCPBUGS-88322 Jira Issue OCPBUGS-88322 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓 Jira Issue Verification Checks: Jira Issue OCPBUGS-88320 Jira Issue OCPBUGS-88320 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Summary
Backport of Gateway API noOLM (Sail Library) test coverage and upgrade tests to release-4.21, as part of the Sail Library backport (NE-2286). This provides full test coverage for the
GatewayAPIWithoutOLMfeature gate, including OLM-to-Sail-Library migration upgrade testing, test flake fixes, and parallel worker cleanup fixes.Depends on #31139
Cherry-picked PRs
Early CI Test Results (GatewayAPIWithoutOLM FG Disabled)
This PR backports the Gateway API upgrade test and bug fixes to release-4.21. The
GatewayAPIWithoutOLMfeature gate is disabled on this branch, so all default CI jobs test the existing OLM-based Gateway API path only. To validate the noOLM (Sail Library) path, we run/testwithincluding openshift/cluster-ingress-operator#1462 and openshift/api#2865, which enable the feature gate.OLM Tests (FG Disabled)
These tests run with the
GatewayAPIWithoutOLMfeature gate disabled, so Gateway API uses the existing OLM-based installation path. They confirm the backported test code does not regress existing behavior.Pre-submits
Pre-submits are passing, including the
e2e-gcp-ovn-upgradejob which runs the Gateway API upgrade test against the OLM-based path.Full PR test history
Payload Upgrade Jobs
Additional upgrade coverage across platforms beyond what pre-submits test.
Example Payload run
GatewayAPIWithoutOLMfeature gate (openshift/api#2864) — latest candidate includes the FG but previous candidate predates it, so the test sees a missing FG post-upgrade. Will self-resolve once both candidates include the API change.required-scc-annotation-checkeracross many namespaces (etcd, kube-apiserver, dns, multus, ovn-kubernetes, etc.) andterminationMessagePolicyon istiod/servicemesh pods — not related to this PRnoOLM Tests (FG Enabled via /testwith with CIO openshift/cluster-ingress-operator#1462 + API openshift/api#2865)
These tests include the CIO and API PRs that enable the
GatewayAPIWithoutOLMfeature gate, so Gateway API uses the Sail Library installation path instead of OLM.e2e-gcp-ovnvalidates fresh install with Sail Librarye2e-gcp-ovn-upgradevalidates migration from OLM to Sail Library during upgradeMigration log from the upgrade test confirms successful OLM-to-Sail-Library transition:
🤖 Generated with Claude Code