OCPBUGS-87905: Process rebuild annotation on machine-os-builder restart #6160

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:main from isabella-janssen:ocpbugs-87905 on Jun 15, 2026
Conversation

@isabella-janssen

@isabella-janssen isabella-janssen commented Jun 9, 2026

Member

Closes: OCPBUGS-87905

- What I did
This adds a check to the main function that determines whether a MachineOSBuild should be created: the check looks for the rebuild annotation on the MachineOSConfig (MOSC). If we reach this function and the MOSC already has the rebuild annotation, something such as a pod restart interrupted the MOSC build, so the MOSC should be rebuilt. This is especially important on single-node OpenShift (SNO) because of how pod restarts happen on that platform.
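
For illustration only, the entry check described above might look roughly like the following Go sketch. It is not the code in pkg/controller/build/reconciler.go: the machineOSConfig and reconciler types, the seedAndSync helper, and the annotation key are hypothetical stand-ins, and only the function names addMachineOSConfig and rebuildMachineOSConfig come from this PR.

```go
package main

import "fmt"

// rebuildAnnotationKey is an assumed annotation name, used only for illustration.
const rebuildAnnotationKey = "machineconfiguration.openshift.io/rebuild"

// machineOSConfig is a hypothetical stand-in for the real MachineOSConfig type.
type machineOSConfig struct {
	Name        string
	Annotations map[string]string
}

// hasRebuildAnnotation reports whether the rebuild annotation is set.
// Reading from a nil map is safe in Go and simply yields ok == false.
func hasRebuildAnnotation(mosc *machineOSConfig) bool {
	_, ok := mosc.Annotations[rebuildAnnotationKey]
	return ok
}

// reconciler records which path each call took so the sketch can be inspected.
type reconciler struct {
	calls []string
}

// rebuildMachineOSConfig stands in for the real rebuild path.
func (r *reconciler) rebuildMachineOSConfig(mosc *machineOSConfig) error {
	r.calls = append(r.calls, "rebuild:"+mosc.Name)
	return nil
}

// seedAndSync is a hypothetical placeholder for the normal seeding/sync path.
func (r *reconciler) seedAndSync(mosc *machineOSConfig) error {
	r.calls = append(r.calls, "sync:"+mosc.Name)
	return nil
}

// addMachineOSConfig short-circuits to a rebuild when the annotation is
// already present, for example after the builder pod restarted mid-build.
func (r *reconciler) addMachineOSConfig(mosc *machineOSConfig) error {
	if hasRebuildAnnotation(mosc) {
		return r.rebuildMachineOSConfig(mosc)
	}
	return r.seedAndSync(mosc)
}

func main() {
	r := &reconciler{}
	annotated := &machineOSConfig{
		Name:        "worker",
		Annotations: map[string]string{rebuildAnnotationKey: ""},
	}
	plain := &machineOSConfig{Name: "infra"}

	_ = r.addMachineOSConfig(annotated)
	_ = r.addMachineOSConfig(plain)
	fmt.Println(r.calls) // prints "[rebuild:worker sync:infra]"
}
```

Checking at entry means a MOSC that was annotated before the restart is handled the same way as one annotated while the controller was running.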

- How to verify it
The issue this fixes surfaced in the SNO test suites and appears to be largely specific to SNO because of the way pods are restarted there. The test "[sig-mco][Suite:openshift/machine-config-operator/disruptive][Serial][Disruptive] MCO ocb [PolarionID:77781][OTP] A successfully built MachineOSConfig can be re-build" should therefore continue passing on all platforms and start passing in the SNO suite.

- Description for the changelog
OCPBUGS-87905: Process rebuild annotation on machine-os-builder restart

Summary by CodeRabbit

  • Bug Fixes

    • Rebuild annotation is now processed with immediate priority, ensuring rebuilds are triggered before normal seeding/sync paths.
  • Tests

    • Rebuild verification timeout increased (2m → 5m) to reduce flakiness when confirming reuse and new image build behavior.

@openshift-merge-bot

Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller automatically detects which contexts are required and uses /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress label Jun 9, 2026
@coderabbitai

coderabbitai Bot commented Jun 9, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 67548525-0f1d-44d0-a42f-f1d8c6447180

📥 Commits

Reviewing files that changed from the base of the PR and between 7ced116 and f88d97c.

📒 Files selected for processing (2)
  • pkg/controller/build/reconciler.go
  • test/extended-priv/mco_ocb.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • test/extended-priv/mco_ocb.go
  • pkg/controller/build/reconciler.go

Walkthrough

The reconciler now short-circuits in addMachineOSConfig when a MachineOSConfig has the rebuild annotation, calling rebuildMachineOSConfig and exiting; the rebuild verification test wait was increased to 5 minutes.

Changes

MachineOSConfig rebuild annotation handling

| Layer / File(s) | Summary |
| --- | --- |
| Rebuild annotation short-circuit in addMachineOSConfig (pkg/controller/build/reconciler.go) | The addMachineOSConfig function now checks for the rebuild annotation at entry and calls rebuildMachineOSConfig directly if present, skipping the pre-built image seeding and subsequent sync workflow. |
| Rebuild verification timing in test (test/extended-priv/mco_ocb.go) | RebuildImageAndCheck now waits up to 5 minutes (was 2m) for mosb.GetJob to appear when verifying rebuild behavior. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

jira/severity-important, jira/valid-bug, jira/valid-reference

Suggested reviewers

  • RishabhSaini
  • cheesesashimi
🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Test Structure And Quality | ⚠️ Warning | Test file mco_ocb.go has 6 assertions without meaningful failure messages (e.g., o.Expect(err).NotTo(o.HaveOccurred()) at lines 642, 660, 667, 685, 706, 728) and multiple tests test multiple unrela... | Add meaningful failure messages to all assertions per the check requirements (e.g., 'failed to check RPM files'). Refactor tests to focus on single behaviors per test case. |
✅ Passed checks (14 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and clearly describes the main change: processing the rebuild annotation when machine-os-builder restarts, which aligns with the core modification in the reconciler. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | All Ginkgo test names in the PR are static string literals with no dynamic values, timestamps, UUIDs, or generated identifiers. Test titles are descriptive and will remain stable across runs. |
| Microshift Test Compatibility | ✅ Passed | No new Ginkgo e2e tests are added in this PR. Changes only include reconciler logic (+7 lines) and existing test timeout adjustment (+1/-1 lines). The MicroShift compatibility check is not applicable. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No new Ginkgo e2e tests are added in this PR. Only an existing test timeout is increased from 2m to 5m. The test file uses SNO-aware helper functions (GetCompactCompatiblePool, IsCompactOrSNOCluste... |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR introduces only reconciler annotation-checking logic and test timeout adjustment. No scheduling constraints, pod affinity, topology spread, nodeSelector, or PodDisruptionBudget changes present. |
| Ote Binary Stdout Contract | ✅ Passed | No OTE Binary Stdout Contract violations detected. Changes are in production controller code (using standard klog) and within Ginkgo test blocks (explicitly excluded from the check). |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | No new Ginkgo e2e tests added; PR modifies existing code in reconciler and test files. Existing tests use cluster-internal resources without IPv4 assumptions or external connectivity. |
| No-Weak-Crypto | ✅ Passed | The PR introduces no weak cryptography. Changes consist of: (1) annotation checking logic in reconciler.go, (2) test timeout adjustment in mco_ocb.go. No crypto imports, weak algorithms, or insecur... |
| Container-Privileges | ✅ Passed | PR changes only Go source code files (reconciler.go and mco_ocb.go), not K8s container manifests. No privileged container specs, host access, or privilege escalation settings are introduced. |
| No-Sensitive-Data-In-Logs | ✅ Passed | PR adds rebuild annotation check in addMachineOSConfig (7 lines, no new logging) and adjusts test timeout (1 line). No new logging statements that expose passwords, tokens, API keys, PII, session I... |


@openshift-ci

openshift-ci Bot commented Jun 9, 2026

Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot added the approved label Jun 9, 2026
@isabella-janssen

Member Author

/payload-job periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-1of3 periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-2of3 periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-3of3

@openshift-ci

openshift-ci Bot commented Jun 10, 2026

Contributor

@isabella-janssen: trigger 4 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview
  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-1of3
  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-2of3
  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-3of3

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/f2077300-64c7-11f1-9b85-aada3486fa48-0

@isabella-janssen

Member Author

/payload-job periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-1of3 periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-2of3 periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-3of3

@openshift-ci

openshift-ci Bot commented Jun 10, 2026

Contributor

@isabella-janssen: trigger 4 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview
  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-1of3
  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-2of3
  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-3of3

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/9e39e170-64ce-11f1-8da7-028d145b8b1d-0

@isabella-janssen

Member Author

/payload-job periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview

@openshift-ci

openshift-ci Bot commented Jun 10, 2026

Contributor

@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-single-node-disruptive-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e3b95200-64fe-11f1-80b9-c93395452f6c-0

…estart & increase test timeout

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@isabella-janssen isabella-janssen changed the title from "(WIP) OCPBUGS-87905" to "OCPBUGS-87905: Process rebuild annotation on machine-os-builder restart" Jun 11, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference label Jun 11, 2026
@openshift-ci-robot

Contributor

@isabella-janssen: This pull request references Jira Issue OCPBUGS-87905, which is invalid:

  • expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug label Jun 11, 2026
@isabella-janssen

Member Author

/jira refresh

@isabella-janssen isabella-janssen marked this pull request as ready for review June 11, 2026 12:14
@openshift-ci-robot openshift-ci-robot added the jira/valid-bug label and removed the jira/invalid-bug label Jun 11, 2026
@openshift-ci-robot

Contributor

@isabella-janssen: This pull request references Jira Issue OCPBUGS-87905, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress label Jun 11, 2026
@openshift-ci openshift-ci Bot requested review from djoshy and umohnani8 June 11, 2026 12:15
@openshift-ci-robot

Contributor

@isabella-janssen: This pull request references Jira Issue OCPBUGS-87905, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

@isabella-janssen

Member Author

/test unit

@dkhater-redhat

Contributor

/test e2e-gcp-op-ocl-part1
/test e2e-gcp-op-ocl-part2

@dkhater-redhat

Contributor

/lgtm

@openshift-ci openshift-ci Bot added the lgtm label Jun 11, 2026
@openshift-ci

openshift-ci Bot commented Jun 11, 2026

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dkhater-redhat, isabella-janssen

Needs approval from an approver in each of these files:
  • OWNERS [dkhater-redhat,isabella-janssen]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot

Contributor

Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage.

@isabella-janssen

Member Author

/test e2e-aws-ovn
/test e2e-aws-ovn-upgrad
/test e2e-gcp-op-part1
/test e2e-gcp-op-part2
/test e2e-gcp-op-single-node
/test e2e-hypershift

@isabella-janssen

Member Author

/retest-required

1 similar comment
@isabella-janssen

Member Author

/retest-required

@isabella-janssen

Member Author

/verified by payloads

The "A successfully built MachineOSConfig can be re-build" test is now passing on SNO (see #6160 (comment)), and all other OCL-related CI looks healthy.

@openshift-ci-robot openshift-ci-robot added the verified label Jun 12, 2026
@openshift-ci-robot

Contributor

@isabella-janssen: This PR has been marked as verified by payloads.


@isabella-janssen

Member Author

/cherrypick release-4.22 release-4.21 release-4.20 release-4.19 release-4.18

@openshift-cherrypick-robot


@isabella-janssen: once the present PR merges, I will cherry-pick it on top of release-4.22 in a new PR and assign it to you.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@isabella-janssen

Member Author

/test e2e-aws-ovn-upgrade

@openshift-ci

openshift-ci Bot commented Jun 15, 2026

Contributor

@isabella-janssen: all tests passed!


@openshift-merge-bot openshift-merge-bot Bot merged commit 49eaf75 into openshift:main Jun 15, 2026
17 checks passed
@openshift-ci-robot

Contributor

@isabella-janssen: Jira Issue Verification Checks: Jira Issue OCPBUGS-87905
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-87905 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓


@openshift-cherrypick-robot


@isabella-janssen: new pull request created: #6190


@openshift-merge-robot

Contributor

Fix included in release 5.0.0-0.nightly-2026-06-16-065742

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • verified: Signifies that the PR passed pre-merge verification criteria

5 participants