MCO: fix arm64 arm64-periodics jobs failing due to incompatible image…#80454

Closed
ptalgulk01 wants to merge 3 commits into openshift:main from ptalgulk01:ppt/fix-mco-arm64-upi-installer-image

Conversation

@ptalgulk01

@ptalgulk01 ptalgulk01 commented Jun 12, 2026

Contributor

… arch

The mco-conf-day2-add-mcoqe-robot-to-pull-secret step used `from: upi-installer`, which resolves to ocp_5.0_upi-installer, an x86_64-only image. On arm64 periodic jobs, the multiarch-tuning-operator finds no supported architectures in common and sets supportedArchitectures:{}, which makes the pod permanently unschedulable. As a result, all MCO arm64 periodic jobs have been failing at this step before any tests run.

Switch to origin/centos:8, which is a multi-arch manifest list (already used by other steps in the same workflow) and provides the jq/bash/base64 tools the step script needs. oc is already injected via cli:latest.
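
To make "multi-arch manifest list" concrete, here is a minimal sketch of how one could check which architectures an image advertises. It assumes you already have the raw manifest JSON (for example from `skopeo inspect --raw docker://IMAGE`). The function name `manifest_architectures` and the sample document are hypothetical and not part of this PR.

```python
import json


def manifest_architectures(raw_manifest: str) -> set[str]:
    """Return the architectures listed in a manifest list / OCI image index.

    Returns an empty set when the document is a plain single-image manifest,
    because such a manifest carries no per-platform entries.
    """
    doc = json.loads(raw_manifest)
    entries = doc.get("manifests")
    if entries is None:
        return set()
    return {
        entry["platform"]["architecture"]
        for entry in entries
        if "architecture" in entry.get("platform", {})
    }


# Hand-written sample index, for illustration only (digests are placeholders).
EXAMPLE_INDEX = json.dumps({
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "manifests": [
        {"digest": "sha256:aaa", "platform": {"architecture": "amd64", "os": "linux"}},
        {"digest": "sha256:bbb", "platform": {"architecture": "arm64", "os": "linux"}},
    ],
})

if __name__ == "__main__":
    archs = manifest_architectures(EXAMPLE_INDEX)
    print("arm64 supported:", "arm64" in archs)
```

An image whose manifest has no `manifests` array is a single-architecture image. In that case the architecture is recorded in the image config rather than in the manifest, so this sketch deliberately reports an empty set instead of guessing.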

Summary by CodeRabbit

This PR fixes Machine Config Operator (MCO) arm64 periodic CI jobs by making the failing ci-operator step multi-arch and removing an x86-only tool dependency.

What changed

  • The mco/conf/day2/add-mcoqe-robot-to-pull-secret step now uses from: cli (with cli: latest) in the step ref, so the base image resolves to the job's release payload at the job's architecture instead of an x86-only upi-installer image.
  • The step's merge logic was changed from using jq to an inline Python3 deep-merge routine (python3 is available in the cli image). The script reads the cluster pull-secret and the mcoqe credentials, deep-merges the JSON, and updates the pull-secret as before. All oc-based operations remain unchanged (oc continues to be provided via cli: latest).
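
The deep-merge described above can be sketched as follows. This is an illustrative reimplementation, not the code from the PR. It mirrors the documented behaviour of jq's `*` operator on objects (nested objects are merged recursively, otherwise the right-hand value wins). The registry names and auth strings are made-up placeholders.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into a copy of `base`.

    When both sides hold a dict under the same key, the dicts are merged;
    in every other case the value from `override` replaces the one in `base`.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Placeholder pull-secret-shaped documents (not real credentials).
cluster_secret = {
    "auths": {
        "quay.io": {"auth": "Y2x1c3Rlcg=="},
        "registry.example.com": {"auth": "b2xk"},
    }
}
extra_credentials = {
    "auths": {
        "registry.example.com": {"auth": "bmV3"},
        "extra.example.com": {"auth": "ZXh0cmE="},
    }
}

merged = deep_merge(cluster_secret, extra_credentials)
shallow = {**cluster_secret, **extra_credentials}  # for comparison only
```

The shallow variant replaces the whole `auths` object and would silently drop the `quay.io` entry, which is why a recursive merge is needed for pull secrets.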

Why this matters

  • The previous base image was x86_64-only, which caused the multiarch-tuning-operator to detect no common supported architectures on arm64 runs and left the pod unschedulable, failing MCO arm64 periodic jobs before tests started. Using the cli multi-arch image and Python merge avoids that scheduling blockage and works across architectures and release streams.
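
The scheduling failure can be pictured with a simplified model. This is not the multiarch-tuning-operator's actual code. It assumes a pod can only be placed if some node architecture is supported by every image in the pod, and the architecture sets below are invented for illustration.

```python
def common_architectures(image_archs: dict[str, set[str]]) -> set[str]:
    """Architectures supported by every image in the pod (set intersection)."""
    if not image_archs:
        return set()
    return set.intersection(*image_archs.values())


def is_schedulable(image_archs: dict[str, set[str]], node_archs: set[str]) -> bool:
    """True if at least one node architecture is common to all images."""
    return bool(common_architectures(image_archs) & node_archs)


# Assumed values: an x86_64-only step image next to an arm64 image
# resolved from the job's release payload.
before_fix = {"step-image": {"amd64"}, "cli": {"arm64"}}
# Assumed values after the fix: both images come from the arm64 payload.
after_fix = {"step-image": {"arm64"}, "cli": {"arm64"}}

if __name__ == "__main__":
    print(common_architectures(before_fix))       # empty set
    print(is_schedulable(after_fix, {"arm64"}))
```

In this model the empty intersection corresponds to the `supportedArchitectures:{}` state described in the PR: no node architecture can satisfy the pod, whatever the cluster contains.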

Affected area

  • CI configuration for MCO: ci-operator step-registry mco/conf/day2/add-mcoqe-robot-to-pull-secret

… arch

The mco-conf-day2-add-mcoqe-robot-to-pull-secret step used
`from: upi-installer` which resolves to ocp_5.0_upi-installer — an
x86_64-only image. On arm64 periodic jobs the multiarch-tuning-operator
detects no supported architectures in common and sets
supportedArchitectures:{}, making the pod permanently unschedulable.
This has been causing all MCO arm64 periodic jobs to fail at this step
before any tests run.

Switch to origin/centos:8 which is a multi-arch manifest list (already
used by other steps in the same workflow) and provides the jq/bash/base64
tools the step script needs. oc is already injected via cli:latest.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: a58929ac-9327-4245-b029-8eff83a26835

📥 Commits

Reviewing files that changed from the base of the PR and between 1a1581b and 037c491.

📒 Files selected for processing (2)
  • ci-operator/step-registry/mco/conf/day2/add-mcoqe-robot-to-pull-secret/mco-conf-day2-add-mcoqe-robot-to-pull-secret-commands.sh
  • ci-operator/step-registry/mco/conf/day2/add-mcoqe-robot-to-pull-secret/mco-conf-day2-add-mcoqe-robot-to-pull-secret-ref.yaml

Walkthrough

Change the MCO day2 pull-secret step: switch the step's base image (the `from` field in the step ref) from upi-installer to cli, and replace the script’s jq merge with an inline Python recursive deep-merge that writes the merged pull-secret JSON.

Changes

MCO day2 pull-secret update

| Layer / File(s) | Summary |
| --- | --- |
| Step base image: ci-operator/step-registry/mco/conf/day2/add-mcoqe-robot-to-pull-secret/mco-conf-day2-add-mcoqe-robot-to-pull-secret-ref.yaml | Updated the step reference: changed the `from` value from upi-installer to cli. |
| Pull-secret merge implementation: ci-operator/step-registry/mco/conf/day2/add-mcoqe-robot-to-pull-secret/mco-conf-day2-add-mcoqe-robot-to-pull-secret-commands.sh | Replaced the jq-based JSON merge with an inline Python recursive deep-merge that loads both JSON files, merges nested objects, and writes merged JSON to the merged pull-secret file. |
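
The load, merge and write flow summarized above could look roughly like this. It is a sketch under assumptions rather than the script from the PR: the function and file names are hypothetical, and the demo uses a temporary directory with placeholder data instead of a real cluster pull-secret.

```python
import json
import tempfile
from pathlib import Path


def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into a copy of `base`; `override` wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def merge_json_files(base_path: Path, extra_path: Path, out_path: Path) -> None:
    """Load two JSON files, deep-merge them, and write the result to out_path."""
    base = json.loads(base_path.read_text())
    extra = json.loads(extra_path.read_text())
    out_path.write_text(json.dumps(deep_merge(base, extra)))


def demo() -> dict:
    """Run the flow on placeholder files and return the merged document."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        base_path = tmp_dir / "cluster-pull-secret.json"    # hypothetical name
        extra_path = tmp_dir / "extra-credentials.json"     # hypothetical name
        out_path = tmp_dir / "merged-pull-secret.json"      # hypothetical name
        base_path.write_text(json.dumps({"auths": {"quay.io": {"auth": "ZHVtbXk="}}}))
        extra_path.write_text(
            json.dumps({"auths": {"registry.example.com": {"auth": "ZHVtbXk="}}})
        )
        merge_json_files(base_path, extra_path, out_path)
        return json.loads(out_path.read_text())
```

Keeping the merge in files and never printing the merged document is consistent with the step's stated goal of not logging secret data.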

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

lgtm, rehearsals-ack

Suggested reviewers

  • pruan-rht
  • sadasu
  • dis016
🚥 Pre-merge checks | ✅ 15
✅ Passed checks (15 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly identifies the main change: fixing MCO arm64 periodic jobs that fail due to incompatible image architecture, directly matching the PR's primary objective. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | PR updates CI step registry YAML/bash (pull-secret source + merge logic) and contains no Ginkgo It/Describe/Context/When titles in those changed files. |
| Test Structure And Quality | ✅ Passed | No Ginkgo test code exists in this repo checkout (no ginkgo imports; only *_test.go under tools), and the PR changes are limited to MCO step YAML/Shell, so quality rules are not applicable. |
| Microshift Test Compatibility | ✅ Passed | PR changes only MCO day2 pull-secret step YAML and commands.sh; git diff main..HEAD shows no _test.go or ginkgo/e2e test files, so MicroShift compatibility check is not triggered. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | PR #80454 only updates an MCO step-registry YAML (no Ginkgo e2e test files added), so there are no new multi-node/HA assumptions to flag for SNO. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | Inspected step-registry ref+commands: only switches pull-secret source to from: cli and deep-merges JSON pull secrets via Python/oc set data; no affinity/topology/spread/nodeSelector or other s... |
| Ote Binary Stdout Contract | ✅ Passed | PR #80454 commit a314a1e shows only ci-operator mco-conf-day2-add-mcoqe-robot-to-pull-secret-ref.yaml changed; no OTE/Go main/TestMain code affected, so stdout contract unchanged. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | This PR does not add new Ginkgo e2e tests. It modifies CI operator configuration and a shell script for pull secret handling. The check is not applicable. |
| No-Weak-Crypto | ✅ Passed | Scanned the PR’s step directory files for MD5/SHA1/DES/RC4/3DES/Blowfish/ECB/openssl/bcrypt/hmac/crypto/constant-time and found none; updated script only deep-merges JSON. |
| Container-Privileges | ✅ Passed | Fetched PR step registry ref YAML and commands script; none contain privileged/hostPID/hostNetwork/hostIPC/SYS_ADMIN/allowPrivilegeEscalation/runAsUser/securityContext fields. |
| No-Sensitive-Data-In-Logs | ✅ Passed | In commands.sh, secret-bearing data from oc/echo/base64/python is redirected to temp files; logs only emit status messages (no set -x/printing of secrets). |


@openshift-ci openshift-ci Bot requested review from bandrade and sergiordlr June 12, 2026 05:35
@openshift-ci

openshift-ci Bot commented Jun 12, 2026

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ptalgulk01

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Jun 12, 2026
…r image

The mco-conf-day2-add-mcoqe-robot-to-pull-secret step used
`from: upi-installer` which resolves to ocp_5.0_upi-installer — an
x86_64-only image. On arm64 periodic jobs the multiarch-tuning-operator
detects no supported architectures in common and sets
supportedArchitectures:{}, making the pod permanently unschedulable.
This has caused all MCO arm64 periodic jobs to fail before any tests run.

The same step runs fine on amd64 GCP jobs (e.g. e2e-gcp-mco-disruptive-
techpreview-3of3 succeeded in 3m22s with the same image).

Switch to ocp/4.18:upi-installer which is a multi-arch image with the
same toolset (jq, bash, base64) and is already used successfully in the
same arm64 workflow (ipi-conf-telemetry step).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/step-registry/mco/conf/day2/add-mcoqe-robot-to-pull-secret/mco-conf-day2-add-mcoqe-robot-to-pull-secret-ref.yaml`:
- Around line 4-6: The ref hardcodes the installer image via from_image.name:
"4.18" and tag: upi-installer for ref
mco-conf-day2-add-mcoqe-robot-to-pull-secret; change this to derive the image
from the target release (e.g., use the release variable or templated name rather
than "4.18") or switch to a stable multi-arch unversioned image (e.g., a
canonical upi-installer repository/tag) so jobs for 4.19/4.20/etc. don’t run
with 4.18; alternatively, if you must keep "4.18", add a clear comment in the
ref explaining why all streams should use 4.18 and include a link or short
verification showing the image contains jq, bash, base64, and oc. Ensure you
update references to from_image.name, tag, and the ref identifier
mco-conf-day2-add-mcoqe-robot-to-pull-secret accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 8a9c653f-3bcf-4a42-868c-b285cf0e7b23

📥 Commits

Reviewing files that changed from the base of the PR and between a314a1e and 1a1581b.

📒 Files selected for processing (1)
  • ci-operator/step-registry/mco/conf/day2/add-mcoqe-robot-to-pull-secret/mco-conf-day2-add-mcoqe-robot-to-pull-secret-ref.yaml

The mco-conf-day2-add-mcoqe-robot-to-pull-secret step used
`from: upi-installer` which resolves to ocp_5.0_upi-installer (x86_64-only).
On arm64 jobs, multiarch-tuning-operator sets supportedArchitectures:{}
making the pod permanently unschedulable — no tests ever ran.

Fix:
- Switch from `from: upi-installer` to `from: cli` so the image is
  resolved from each job's own release payload at the correct architecture.
  This works for all release streams (4.19/4.20/4.21/4.22/5.0/5.1/main)
  without version pinning.
- Replace `jq -s '.[0] * .[1]'` with a python3 deep-merge (python3 is
  available in the cli image) to remove the jq dependency.

Confirmed: the same step runs fine in amd64 GCP jobs (e2e-gcp-mco-
disruptive-techpreview-3of3 succeeded in 3m22s), proving the script
logic is correct. Only the base image lacks arm64 support.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@openshift-merge-bot

Contributor

[REHEARSALNOTIFIER]
@ptalgulk01: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| pull-ci-openshift-machine-config-operator-main-e2e-aws-mco-disruptive | openshift/machine-config-operator | presubmit | Registry content changed |
| pull-ci-openshift-machine-config-operator-main-e2e-gcp-mco-disruptive | openshift/machine-config-operator | presubmit | Registry content changed |
| pull-ci-openshift-machine-config-operator-release-5.1-e2e-aws-mco-disruptive | openshift/machine-config-operator | presubmit | Registry content changed |
| pull-ci-openshift-machine-config-operator-release-5.1-e2e-gcp-mco-disruptive | openshift/machine-config-operator | presubmit | Registry content changed |
| pull-ci-openshift-machine-config-operator-release-5.0-e2e-aws-mco-disruptive | openshift/machine-config-operator | presubmit | Registry content changed |
| pull-ci-openshift-machine-config-operator-release-5.0-e2e-gcp-mco-disruptive | openshift/machine-config-operator | presubmit | Registry content changed |
| pull-ci-openshift-machine-config-operator-release-4.23-e2e-aws-mco-disruptive | openshift/machine-config-operator | presubmit | Registry content changed |
| pull-ci-openshift-machine-config-operator-release-4.23-e2e-gcp-mco-disruptive | openshift/machine-config-operator | presubmit | Registry content changed |
| pull-ci-openshift-machine-config-operator-release-4.22-e2e-aws-mco-disruptive | openshift/machine-config-operator | presubmit | Registry content changed |
| pull-ci-openshift-machine-config-operator-release-4.22-e2e-gcp-mco-disruptive | openshift/machine-config-operator | presubmit | Registry content changed |
| periodic-ci-openshift-openshift-tests-private-release-4.21-amd64-nightly-aws-ipi-tp-ocl-f14 | N/A | periodic | Registry content changed |
| periodic-ci-openshift-openshift-tests-private-release-4.20-amd64-nightly-aws-ipi-longrun-mco-tp-ocl-p1-f14 | N/A | periodic | Registry content changed |
| periodic-ci-openshift-openshift-tests-private-release-4.20-amd64-nightly-gcp-ipi-longduration-tp-mco-p2-f14 | N/A | periodic | Registry content changed |
| periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-metal-ipi-ovn-dualstack-mco-disruptive-techpreview-3of3 | N/A | periodic | Registry content changed |
| periodic-ci-openshift-openshift-tests-private-release-4.21-amd64-nightly-vsphere-ipi-longduration-mco-fips-proxy-p3-f14 | N/A | periodic | Registry content changed |
| periodic-ci-openshift-openshift-tests-private-release-4.21-amd64-nightly-metal-ds-ipi-ovn-f28-longrun-mco-p3 | N/A | periodic | Registry content changed |
| periodic-ci-openshift-openshift-tests-private-release-4.19-amd64-nightly-gcp-ipi-longduration-tp-mco-p3-f14 | N/A | periodic | Registry content changed |
| periodic-ci-openshift-openshift-tests-private-release-4.21-arm64-nightly-aws-ipi-longrun-mco-tp-proxy-fips-p2-f28 | N/A | periodic | Registry content changed |
| periodic-ci-openshift-openshift-tests-private-release-4.20-amd64-nightly-metal-ds-ipi-ovn-f14-longrun-mco-p1 | N/A | periodic | Registry content changed |
| periodic-ci-openshift-openshift-tests-private-release-4.19-amd64-nightly-metal-ds-ipi-ovn-f28-longrun-mco-p1 | N/A | periodic | Registry content changed |
| periodic-ci-openshift-openshift-tests-private-release-5.0-amd64-nightly-metal-ds-ipi-ovn-f14-longrun-mco-p2 | N/A | periodic | Registry content changed |
| periodic-ci-openshift-openshift-tests-private-release-4.21-amd64-nightly-vsphere-ipi-longduration-mco-fips-proxy-p2-f14 | N/A | periodic | Registry content changed |
| periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-metal-ipi-ovn-dualstack-mco-disruptive-techpreview-1of3 | N/A | periodic | Registry content changed |
| periodic-ci-openshift-openshift-tests-private-release-5.0-amd64-nightly-aws-ipi-longrun-mco-tp-ocl-p2-f7 | N/A | periodic | Registry content changed |
| periodic-ci-openshift-openshift-tests-private-release-4.19-amd64-nightly-gcp-ipi-longduration-tp-mco-p1-f14 | N/A | periodic | Registry content changed |

A total of 328 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs.

A full list of affected jobs can be found here

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@ptalgulk01

Contributor Author

/pj-rehearse periodic-ci-openshift-machine-config-operator-release-5.0-arm64-periodics-e2e-aws-mco-disruptive-1of2
periodic-ci-openshift-machine-config-operator-release-5.0-arm64-periodics-e2e-aws-mco-disruptive-2of2 periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-gcp-mco-disruptive-techpreview-3of3 periodic-ci-openshift-machine-config-operator-release-5.0-arm64-periodics-e2e-aws-mco-disruptive-techpreview-1of3

@openshift-merge-bot

Contributor

@ptalgulk01: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci

openshift-ci Bot commented Jun 12, 2026

Contributor

@ptalgulk01: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/rehearse/periodic-ci-openshift-machine-config-operator-release-5.0-arm64-periodics-e2e-aws-mco-disruptive-1of2 | 037c491 | link | unknown | /pj-rehearse periodic-ci-openshift-machine-config-operator-release-5.0-arm64-periodics-e2e-aws-mco-disruptive-1of2 |


@ptalgulk01 ptalgulk01 closed this Jun 12, 2026

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files.
