Add SBR disconnected periodic job config (ODF/NFS) #80427

Open
maximunited wants to merge 3 commits into openshift:main from maximunited:rhwa-1039-sbr-disconnected-periodic

maximunited commented Jun 11, 2026

Summary

  • Adds the medik8s-sbr-nfs-bastion step, which configures an NFS server on the disconnected bastion host and creates a StorageClass and PersistentVolume in the cluster. It uses a soft NFS mount (soft,timeo=50) so that storage loss surfaces as I/O errors rather than indefinite kernel retries, which is required for SBR fault detection to trigger correctly.
  • Adds medik8s-system-tests-main__4.22-disconnected.yaml with one weekly periodic job (e2e-sbr-weekly-aws-disconnected-nfs) that uses the openshift-e2e-aws-disconnected workflow and the medik8s-disconnected-catalogsource step (merged in #79687, "Add medik8s disconnected CatalogSource step for air-gapped operator testing").
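
To make the soft-mount setting concrete: the NFS timeo option is expressed in tenths of a second, so timeo=50 is a 5-second timeout per attempt. The following is a minimal sketch, not code from the PR; the helper names timeo_from_opts and timeo_to_seconds are hypothetical. The total delay before an I/O error is returned also depends on the retrans option, which the description does not mention.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Extract the timeo value from a comma-separated NFS mount option string.
# (Hypothetical helper, for illustration only.)
timeo_from_opts() {
  local rest="$1," opt
  while [ -n "$rest" ]; do
    opt="${rest%%,*}"
    rest="${rest#*,}"
    case "$opt" in
      timeo=*) printf '%s\n' "${opt#timeo=}"; return 0 ;;
    esac
  done
  return 1
}

# timeo is given in tenths of a second; convert it to seconds.
timeo_to_seconds() {
  local timeo="$1"
  printf '%d.%d\n' "$((timeo / 10))" "$((timeo % 10))"
}

opts="soft,timeo=50"
timeo="$(timeo_from_opts "$opts")"
echo "timeo=${timeo} is $(timeo_to_seconds "$timeo") s per attempt"
```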

Jira: https://issues.redhat.com/browse/RHWA-1039

Test plan

  • make validate-step-registry passes with new step
  • make jobs generates the periodic job entry in medik8s-system-tests-main-periodics.yaml
  • /rehearse the generated periodic job once CI is green

Summary by CodeRabbit

This PR updates the medik8s CI configuration to add a disconnected NFS-backed periodic job and a reusable CI step to provision NFS on a bastion for SBR (Storage-Based Remediation) tests.

What changed in practical terms

  • Adds a new weekly periodic test job e2e-sbr-weekly-aws-disconnected-nfs (cron: 0 8 * * 0 — Sundays 08:00 UTC) in ci-operator/config/medik8s/system-tests/medik8s-system-tests-main__4.22-disconnected.yaml. The job:

    • Runs the openshift-e2e-aws-disconnected workflow on OCP 4.22 with cluster_profile medik8s-aws.
    • Executes medik8s-disconnected-catalogsource and medik8s-operator-subscribe refs, then the new medik8s-sbr-nfs-bastion step, and finally runs make run-tests for the e2e test.
    • Sets test env (BASE_DOMAIN, COMPUTE_NODE_TYPE, ECO_TEST_FEATURES=sbr-operator, OO_CHANNEL, OPERATORS=storage-based-remediation) and per-test resource requests.
  • Adds a new reusable step medik8s-sbr-nfs-bastion (ci-operator/step-registry/medik8s/sbr/nfs-bastion/*):

    • medik8s-sbr-nfs-bastion-commands.sh: SSHs to the disconnected bastion using addresses/readme from SHARED_DIR and the cluster SSH key, configures /srv/nfs/sbr (777), writes an export with no_root_squash/no_subtree_check, enables nfs-server, and prints readiness.
    • Creates a StorageClass nfs-sbr (kubernetes.io/no-provisioner, Immediate binding, marked default) and a PersistentVolume nfs-sbr-pv (10Gi, ReadWriteMany, Retain) pointing to the bastion private IP and exported path.
    • Uses NFS mount options vers=4.1, soft and timeo=50 (soft,timeo=50 yields ~5s EIO on server loss) so storage outages surface as I/O errors instead of indefinite kernel retries — necessary for SBR fault detection.
  • Registers the step in the step registry (medik8s-sbr-nfs-bastion-ref.yaml) with 100m/200Mi requests and documentation; includes metadata JSON and OWNERS entries for review/approval.
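
The StorageClass and PersistentVolume described above could look roughly like the manifests rendered by this sketch. It is an illustration under stated assumptions, not the script from the PR: the function name render_manifests and the IP 10.0.0.10 are hypothetical, and setting storageClassName on the PV is an assumption about how the two objects are linked.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print a StorageClass and an NFS-backed PersistentVolume as YAML.
# Arguments: NFS server address, exported path.
render_manifests() {
  local server="$1" path="$2"
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-sbr
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: Immediate
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-sbr-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs-sbr   # assumption: PV is tied to the class by name
  mountOptions:
    - vers=4.1
    - soft
    - timeo=50
  nfs:
    server: ${server}
    path: ${path}
EOF
}

# 10.0.0.10 stands in for the bastion private IP (hypothetical value).
render_manifests 10.0.0.10 /srv/nfs/sbr
```

In a real step the output would presumably be piped to `oc apply -f -` and checked with `oc get storageclass nfs-sbr` and `oc get pv nfs-sbr-pv`.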

Practical impact

  • Enables automated weekly SBR e2e testing in disconnected AWS environments where ODF is unavailable by providing a reproducible NFS RWX backend.
  • The NFS bastion step is reusable by other workflows needing a disconnected NFS export.
  • OWNERS files added to ensure appropriate reviewers/approvers for the new step.

Files of interest (added/updated)

  • ci-operator/config/medik8s/system-tests/medik8s-system-tests-main__4.22-disconnected.yaml
  • ci-operator/step-registry/medik8s/sbr/nfs-bastion/medik8s-sbr-nfs-bastion-commands.sh
  • ci-operator/step-registry/medik8s/sbr/nfs-bastion/medik8s-sbr-nfs-bastion-ref.yaml
  • ci-operator/step-registry/medik8s/sbr/nfs-bastion/medik8s-sbr-nfs-bastion-ref.metadata.json
  • ci-operator/step-registry/medik8s/sbr/nfs-bastion/OWNERS
  • ci-operator/step-registry/medik8s/sbr/OWNERS

Test checklist (author-provided)

  • make validate-step-registry passes with the new step
  • make jobs generates the periodic entry
  • Rehearse the generated periodic once CI is green

Adds a weekly periodic CI job that runs SBR (Storage-Based Remediation)
end-to-end tests in a disconnected AWS environment.

New step: medik8s-sbr-nfs-bastion
- Configures an NFS server on the disconnected bastion host
- Creates a StorageClass and PersistentVolume backed by the bastion NFS export
- Uses soft mount (soft,timeo=50) so storage loss surfaces as I/O errors
  rather than indefinite kernel retries, which is required for SBR fault
  detection to trigger correctly
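
As a rough illustration of the bastion-side NFS configuration, the sketch below renders an exports(5) line for the shared directory. It is not the PR's script: render_export is a hypothetical helper, the client CIDR is made up, and the rw and sync options are assumptions (the PR discussion only names no_root_squash and no_subtree_check).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Render one exports(5) line for the given path and client specification.
# "rw" and "sync" are assumed; only no_root_squash and no_subtree_check
# are named in the PR discussion.
render_export() {
  local path="$1" clients="$2"
  printf '%s %s(rw,sync,no_root_squash,no_subtree_check)\n' "$path" "$clients"
}

# 10.0.0.0/16 is a hypothetical VPC CIDR used for illustration.
render_export /srv/nfs/sbr 10.0.0.0/16

# On the bastion, the line might then be installed roughly like this
# (sbr.exports is a hypothetical file name):
#   sudo mkdir -p /srv/nfs/sbr && sudo chmod 777 /srv/nfs/sbr
#   render_export /srv/nfs/sbr 10.0.0.0/16 | sudo tee /etc/exports.d/sbr.exports
#   sudo exportfs -ra
#   sudo systemctl enable --now nfs-server
```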

New config: medik8s-system-tests-main__4.22-disconnected.yaml
- Workflow: openshift-e2e-aws-disconnected (disconnected VPC + bastion mirror)
- Cluster profile: medik8s-aws, m5.xlarge (no ODF overhead)
- Cron: Sunday 08:00 UTC, offset from connected job at 06:00
- Prerequisite: medik8s-disconnected-catalogsource (RHWA-840, merged PR openshift#79687)
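
For readers less familiar with cron syntax, this small sketch decodes the schedule mentioned elsewhere in this PR (0 8 * * 0) into the "Sunday 08:00 UTC" reading and computes the offset from the 06:00 connected job. The function describe_weekly_cron is hypothetical and handles only plain numeric minute, hour, and day-of-week fields.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Describe a weekly cron expression of the form "M H * * DOW".
# Only plain numeric minute, hour, and day-of-week fields are handled.
describe_weekly_cron() {
  local minute hour dom month dow
  read -r minute hour dom month dow <<< "$1"
  local days=(Sunday Monday Tuesday Wednesday Thursday Friday Saturday)
  # 10# forces base 10 so values such as "08" are not parsed as octal;
  # "% 7" maps the alternative Sunday value 7 onto index 0.
  printf '%s %02d:%02d UTC\n' \
    "${days[$((10#$dow % 7))]}" "$((10#$hour))" "$((10#$minute))"
}

describe_weekly_cron "0 8 * * 0"

# Gap between the connected job (06:00) and this job (08:00), in hours.
echo "offset from connected job: $((8 - 6))h"
```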

RHWA-1039


coderabbitai Bot commented Jun 11, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: a5e0ad36-10b1-403b-a809-5acda838bb37

📥 Commits

Reviewing files that changed from the base of the PR and between 75a0ae4 and 0df3b6e.

📒 Files selected for processing (1)
  • ci-operator/step-registry/medik8s/sbr/nfs-bastion/medik8s-sbr-nfs-bastion-ref.metadata.json
✅ Files skipped from review due to trivial changes (1)
  • ci-operator/step-registry/medik8s/sbr/nfs-bastion/medik8s-sbr-nfs-bastion-ref.metadata.json

Walkthrough

Adds a medik8s SBR NFS-bastion step (ref, metadata, OWNERS), a Bash provisioning script to configure NFS on a disconnected bastion and create StorageClass/PersistentVolume, and registers a weekly 4.22-disconnected system test that uses the step.

Changes

NFS Bastion Storage Infrastructure for SBR

| Layer | File(s) | Summary |
| --- | --- | --- |
| Step registry contract and ownership | ci-operator/step-registry/medik8s/sbr/nfs-bastion/medik8s-sbr-nfs-bastion-ref.yaml, ci-operator/step-registry/medik8s/sbr/nfs-bastion/medik8s-sbr-nfs-bastion-ref.metadata.json, ci-operator/step-registry/medik8s/sbr/nfs-bastion/OWNERS, ci-operator/step-registry/medik8s/sbr/OWNERS | Defines the medik8s-sbr-nfs-bastion step registry entry with source, command script, grace period, resource requests, metadata JSON, and OWNERS entries. |
| Bash script: env and SSH helper | ci-operator/step-registry/medik8s/sbr/nfs-bastion/medik8s-sbr-nfs-bastion-commands.sh | Adds strict Bash settings, reads bastion/SSH details from environment paths, and provides ssh_bastion to execute non-interactive SSH commands on the bastion. |
| Bash script: create exports and enable NFS | ci-operator/step-registry/medik8s/sbr/nfs-bastion/medik8s-sbr-nfs-bastion-commands.sh | Creates the NFS export directory, writes an /etc/exports.d entry with no_root_squash/no_subtree_check, enables and starts the NFS server, and logs readiness. |
| Bash script: create StorageClass and PersistentVolume | ci-operator/step-registry/medik8s/sbr/nfs-bastion/medik8s-sbr-nfs-bastion-commands.sh | Applies a StorageClass (nfs-sbr) with kubernetes.io/no-provisioner and an nfs-sbr-pv PersistentVolume pointing at the bastion export and verifies them via oc get. |
| Weekly disconnected AWS test registration | ci-operator/config/medik8s/system-tests/medik8s-system-tests-main__4.22-disconnected.yaml | Registers e2e-sbr-weekly-aws-disconnected-nfs for OCP 4.22 with intranet capability, weekly cron, medik8s-aws profile, environment variables, component refs, and resource overrides. |
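
The ssh_bastion helper summarized in the table could be shaped like the sketch below. This is an assumption-laden illustration, not the PR's code: the variables SSH_KEY, BASTION_USER, BASTION_HOST, and DRY_RUN and their default values are hypothetical, and the actual script reads its connection details from files under SHARED_DIR whose names are not shown here. The ssh options used (BatchMode, StrictHostKeyChecking, UserKnownHostsFile, ConnectTimeout) are standard OpenSSH client options.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical connection settings; the real step derives these from
# files in SHARED_DIR and the cluster SSH key.
SSH_KEY="${SSH_KEY:-/tmp/ssh-privatekey}"
BASTION_USER="${BASTION_USER:-ec2-user}"
BASTION_HOST="${BASTION_HOST:-bastion.example.com}"

# Run a command on the bastion without any interactive prompts.
# With DRY_RUN=1 the command line is printed instead of executed.
ssh_bastion() {
  local cmd=(
    ssh -i "$SSH_KEY"
    -o BatchMode=yes
    -o StrictHostKeyChecking=no
    -o UserKnownHostsFile=/dev/null
    -o ConnectTimeout=10
    "${BASTION_USER}@${BASTION_HOST}"
    "$@"
  )
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Show what would be run to check the NFS service on the bastion.
DRY_RUN=1 ssh_bastion sudo systemctl is-active nfs-server
```

BatchMode=yes makes ssh fail rather than prompt for a password, which is usually what a CI step wants.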

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

lgtm, rehearsals-ack

🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (14 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add SBR disconnected periodic job config (ODF/NFS)' accurately summarizes the main changes: adding a new disconnected periodic job configuration for SBR testing with NFS bastion setup. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | PR adds only CI configuration, shell scripts, and metadata files—no Ginkgo test definitions. All test step names are static and deterministic (e.g., 'e2e-sbr-weekly-aws-disconnected-nfs', 'medik8s-... |
| Test Structure And Quality | ✅ Passed | PR contains CI/CD configuration (YAML) and bash infrastructure scripts, not Ginkgo test code. The custom check is not applicable to this PR. |
| Microshift Test Compatibility | ✅ Passed | PR adds no Ginkgo e2e tests—only CI config, bash infrastructure scripts, and metadata files. Check for MicroShift-incompatible test APIs is not applicable. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No Ginkgo e2e test code is added in this PR. It only adds CI configuration and infrastructure setup scripts to run existing tests from the medik8s/system-tests repository. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR adds CI configuration and a step that only creates StorageClass and PersistentVolume resources (no workloads), which don't support pod scheduling constraints. |
| Ote Binary Stdout Contract | ✅ Passed | PR adds only YAML CI config, bash script, and metadata files—no Go code or OTE binaries. The check applies to Go test executables, which this PR doesn't introduce. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | This PR does not add any Ginkgo e2e tests. It adds CI configuration, infrastructure setup scripts (Bash), and step registry references. The actual e2e tests run from the source repository via "make... |
| No-Weak-Crypto | ✅ Passed | No weak cryptography patterns detected. PR adds CI configuration files and NFS setup script with no cryptographic implementations, custom crypto, or timing-sensitive comparisons. |
| Container-Privileges | ✅ Passed | No privileged container settings found. Files contain CI config YAMLs, OWNERS metadata, and a bash script for remote NFS setup using sudo on bastion host, not in the test container. |
| No-Sensitive-Data-In-Logs | ✅ Passed | No passwords, tokens, API keys, PII, session IDs, or customer data are logged. SSH keys are properly protected. Only necessary infrastructure metadata (IP addresses, paths) are logged for CI operat... |

openshift-ci Bot commented Jun 11, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: maximunited

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci Bot requested review from beekhof and razo7 on June 11, 2026 16:01
openshift-ci Bot added the "approved" label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Jun 11, 2026

maximunited commented

/pj-rehearse periodic-ci-medik8s-system-tests-main-4.22-disconnected-e2e-sbr-weekly-aws-disconnected-nfs

openshift-merge-bot commented

@maximunited: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

openshift-merge-bot commented

[REHEARSALNOTIFIER]
@maximunited: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| periodic-ci-medik8s-system-tests-main-4.22-disconnected-e2e-sbr-weekly-aws-disconnected-nfs | N/A | periodic | Periodic changed |

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

maximunited commented

/pj-rehearse periodic-ci-medik8s-system-tests-main-4.22-disconnected-e2e-sbr-weekly-aws-disconnected-nfs

openshift-merge-bot commented

@maximunited: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

openshift-ci Bot commented Jun 11, 2026

@maximunited: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/rehearse/periodic-ci-medik8s-system-tests-main-4.22-disconnected-e2e-sbr-weekly-aws-disconnected-nfs | 0df3b6e | link | unknown | /pj-rehearse periodic-ci-medik8s-system-tests-main-4.22-disconnected-e2e-sbr-weekly-aws-disconnected-nfs |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
