Skip to content

net, virt: RHCOS9 and RHCOS10 Worker Nodes Support#65

Open
azhivovk wants to merge 1 commit intoRedHatQE:mainfrom
azhivovk:rhel9_rhel10_nodes_arch
Open

net, virt: RHCOS9 and RHCOS10 Worker Nodes Support#65
azhivovk wants to merge 1 commit intoRedHatQE:mainfrom
azhivovk:rhel9_rhel10_nodes_arch

Conversation

@azhivovk
Copy link
Copy Markdown
Contributor

@azhivovk azhivovk commented Mar 30, 2026

Network STP aspect for virt's feature

STP Metadata

VEP issue: -

What this PR does

Network STP aspect for virt's feature for planning the necessary network tests for the feature

Special notes for your reviewer

Main virt STP is not ready but network aspect can already start

Summary by CodeRabbit

  • Documentation
    • Added a Software Test Plan for network-focused QE of heterogeneous OpenShift Virtualization clusters with mixed RHCOS 9 and RHCOS 10 workers.
    • Defines P0/P1 goals (bidirectional VM live migration without network disruption; Tier 2 validation on all-RHCOS10), scope boundaries, required topologies/environments, CI lane expectations, known limitations, risks, and traceable test scenarios.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 30, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a7813bdc-24d9-4174-8f34-8f6838bbb4be

📥 Commits

Reviewing files that changed from the base of the PR and between 2da6b42 and 98362cb.

📒 Files selected for processing (1)
  • stps/sig-virt/heterogeneous-rhcos9-rhcos10/network.md

📝 Walkthrough

Walkthrough

A new Software Test Plan document was added for OpenShift Virtualization network testing on heterogeneous RHCOS 9/10 clusters, specifying QE metadata, P0/P1 goals for bidirectional VM live migration across multiple network configurations, scope boundaries, required environments, CI lane expectations, risks, and traceable CNV-77027 scenarios.
Severity: LOW — documentation-only change; no production code modified.

Changes

Cohort / File(s) Summary
OpenShift Virtualization Network STP
stps/sig-virt/heterogeneous-rhcos9-rhcos10/network.md
Added a 297-line STP describing QE metadata and tracking links, shared terminology, known limitations, explicit P0/P1 test goals (bidirectional VM live migration and Tier 2 validation), scoped test boundaries, required mixed vs all-RHCOS10 environments, software/hardware assumptions (VT-x/AMD-V, SR-IOV NICs, OVN-K, Primary UDN, localnet, SR-IOV, passt binding, Linux bridge), CI lane/entry criteria notes, risks/dependency gaps, and CNV-77027-mapped migration scenarios across pod network, Primary UDN, localnet, SR-IOV, passt binding, and Linux bridge.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'net, virt: RHCOS9 and RHCOS10 Worker Nodes Support' is clear and directly related to the changeset, which adds a Network STP document for testing heterogeneous clusters with mixed RHCOS 9 and RHCOS 10 worker nodes.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-virtualization-qe-bot-3
Copy link
Copy Markdown

Report bugs in Issues

Welcome! 🎉

This pull request will be automatically processed with the following features:

🔄 Automatic Actions

  • Reviewer Assignment: Reviewers are automatically assigned based on the OWNERS file in the repository root
  • Size Labeling: PR size labels (XS, S, M, L, XL, XXL) are automatically applied based on changes
  • Issue Creation: A tracking issue is created for this PR and will be closed when the PR is merged or closed
  • Branch Labeling: Branch-specific labels are applied to track the target branch
  • Auto-verification: Auto-verified users have their PRs automatically marked as verified
  • Labels: Enabled categories: branch, can-be-merged, cherry-pick, has-conflicts, hold, needs-rebase, size, verified, wip

📋 Available Commands

PR Status Management

  • /wip - Mark PR as work in progress (adds WIP: prefix to title)
  • /wip cancel - Remove work in progress status
  • /hold - Block PR merging (approvers only)
  • /hold cancel - Unblock PR merging
  • /verified - Mark PR as verified
  • /verified cancel - Remove verification status
  • /reprocess - Trigger complete PR workflow reprocessing (useful if webhook failed or configuration changed)
  • /regenerate-welcome - Regenerate this welcome message

Review & Approval

  • /lgtm - Approve changes (looks good to me)
  • /approve - Approve PR (approvers only)
  • /assign-reviewers - Assign reviewers based on OWNERS file
  • /assign-reviewer @username - Assign specific reviewer
  • /check-can-merge - Check if PR meets merge requirements

Testing & Validation

  • /retest tox - Run Python test suite with tox
  • /retest all - Run all available tests

Cherry-pick Operations

  • /cherry-pick <branch> - Schedule cherry-pick to target branch when PR is merged
    • Multiple branches: /cherry-pick branch1 branch2 branch3

Label Management

  • /<label-name> - Add a label to the PR
  • /<label-name> cancel - Remove a label from the PR

✅ Merge Requirements

This PR will be automatically approved when the following conditions are met:

  1. Approval: /approve from at least one approver
  2. LGTM Count: Minimum 2 /lgtm from reviewers
  3. Status Checks: All required status checks must pass
  4. No Blockers: No wip, hold, has-conflicts labels and PR must be mergeable (no conflicts)

📊 Review Process

Approvers and Reviewers

Approvers:

  • EdDev

Reviewers:

  • Anatw
  • EdDev
  • azhivovk
  • servolkov
  • yossisegev
Available Labels
  • hold
  • verified
  • wip
  • lgtm
  • approve
AI Features
  • Cherry-Pick Conflict Resolution: Enabled (claude/claude-opus-4-6[1m])

💡 Tips

  • WIP Status: Use /wip when your PR is not ready for review
  • Verification: The verified label is automatically removed on each new commit
  • Cherry-picking: Cherry-pick labels are processed when the PR is merged
  • Permission Levels: Some commands require approver permissions
  • Auto-verified Users: Certain users have automatic verification and merge privileges

For more information, please refer to the project documentation or contact the maintainers.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@stps/sig-network/rhcos9-rhcos10-nodes-net-support.md`:
- Line 123: Update the inconsistent "Cluster Topology: Agnostic" label to
reflect the actual required topologies used by the plan: replace the "Cluster
Topology: Agnostic" occurrences (including the instances around the lines
referencing 68-69, 88-90, and 123) with explicit wording such as "Cluster
Topology: Mixed RHCOS9 + RHCOS10, and All-RHCOS10" (or two separate entries) so
the document matches the stated test requirements; search for the exact string
"Cluster Topology:" in the file and change the value accordingly wherever it
currently reads "Agnostic".
- Line 194: The mitigation sentence "- **Mitigation:** Proceed with the
network-scoped STP independently; sync the main STP is available to identify and
resolve any overlaps or gaps." is grammatically broken; update the line (the
mitigation string) to read clearly, e.g. "Proceed with the network-scoped STP
independently; synchronize with the main STP when it becomes available to
identify and resolve any overlaps or gaps." Replace the existing mitigation
sentence in the document (the line starting with "- **Mitigation:** Proceed with
the network-scoped STP independently;") with this corrected phrasing.
- Line 7: The "Enhancement(s): -" metadata entry should be replaced with a
concrete reference to the related HLD or enhancement PR; edit the
"Enhancement(s)" field in stps/sig-network/rhcos9-rhcos10-nodes-net-support.md
(the line containing **Enhancement(s):**) and change "-" to a markdown link to
the HLD or enhancement PR (or the HLD if no PR exists) so the STP metadata is
traceable and auditable.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6675e2f8-988e-48f0-b314-79b2acdde433

📥 Commits

Reviewing files that changed from the base of the PR and between 9985ce4 and 122e6bf.

📒 Files selected for processing (1)
  • stps/sig-network/rhcos9-rhcos10-nodes-net-support.md

Comment thread stps/sig-network/rhcos9-rhcos10-nodes-net-support.md Outdated
Comment thread stps/sig-network/rhcos9-rhcos10-nodes-net-support.md Outdated
Comment thread stps/sig-network/rhcos9-rhcos10-nodes-net-support.md Outdated
@azhivovk
Copy link
Copy Markdown
Contributor Author

azhivovk commented Mar 30, 2026

Change: Apply bot's suggestions

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
stps/sig-network/rhcos9-rhcos10-nodes-net-support.md (1)

173-181: Normalize terminology formatting for consistency.

The document mixes Tier2/Tier 2 and RHCOS10/RHCOS 10. Please standardize these terms to avoid ambiguity in test planning docs.

Suggested wording cleanup
- - [x] **Risk:** Not testing Tier2 network tests on heterogeneous cluster.
+ - [x] **Risk:** Not testing Tier 2 network tests on heterogeneous cluster.
...
-   - *Missing resources or infrastructure:* No RHCOS10 clusters available for network Tier2 regression.
+   - *Missing resources or infrastructure:* No RHCOS 10 clusters available for network Tier 2 regression.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-network/rhcos9-rhcos10-nodes-net-support.md` around lines 173 - 181,
Standardize terminology formatting across the document by replacing inconsistent
instances of "Tier2" with "Tier 2" and "RHCOS10" with "RHCOS 10" (e.g., within
the "Risk" and "Test Environment" sections and checklist items) so all
references use the spaced form; update every occurrence of these tokens
(including checklist lines like "Not testing Tier2 network tests" and "Dedicated
all-RHCOS 10 cluster" wording) to the chosen normalized forms for consistency.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@stps/sig-network/rhcos9-rhcos10-nodes-net-support.md`:
- Around line 173-181: Standardize terminology formatting across the document by
replacing inconsistent instances of "Tier2" with "Tier 2" and "RHCOS10" with
"RHCOS 10" (e.g., within the "Risk" and "Test Environment" sections and
checklist items) so all references use the spaced form; update every occurrence
of these tokens (including checklist lines like "Not testing Tier2 network
tests" and "Dedicated all-RHCOS 10 cluster" wording) to the chosen normalized
forms for consistency.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 04eac364-7067-415b-b792-ee8b4cc7728a

📥 Commits

Reviewing files that changed from the base of the PR and between 122e6bf and 92a8885.

📒 Files selected for processing (1)
  • stps/sig-network/rhcos9-rhcos10-nodes-net-support.md

Copy link
Copy Markdown
Collaborator

@EdDev EdDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new revised rules have been merged recently. I suggest you rebase and post an updated content based on it. Coderabbit should kick in as well (but you can push it in a god share already with local treatment in advance).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a sig-virt owned feature, please relocate under their folder.
Also, I would start with placing it under a folder, like heterogeneous-rhcos9-rhcos10 folder (the file name being something like network.md).

- *List the key D/S requirements reviewed:* VM live migration must work without network disruption in both directions on a mixed-node (RHEL 9 + RHEL 10) cluster, and overall network functionality must work on an all-RHEL 10 cluster.

- [x] **Understand Value and Customer Use Cases**
- *Describe the feature's value to customers:* Customers can migrate their clusters from RHEL 9 to RHEL 10 nodes without service disruption, maintaining VM network connectivity and live migration capability throughout the transition.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- *Describe the feature's value to customers:* Customers can migrate their clusters from RHEL 9 to RHEL 10 nodes without service disruption, maintaining VM network connectivity and live migration capability throughout the transition.
- *Describe the feature's value to customers:* Customers can upgrade their clusters from RHEL 9 to RHEL 10 nodes, gradually, without service disruption, maintaining VM network connectivity and live migration capability throughout the transition.

@azhivovk
Copy link
Copy Markdown
Contributor Author

azhivovk commented Apr 13, 2026

Change: rebase as new STP format and review were merged, and fix all STP accordingly

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@stps/sig-virt/heterogeneous-rhcos9-rhcos10/network.md`:
- Line 10: Update the "QE Owner(s)" metadata entry to include contact
information for the listed owner—replace or augment the existing line "QE
Owner(s): Asia Khromov" with a name plus at least one contact (e.g., GitHub
handle, email, or Slack handle). Ensure the change is made in the "QE Owner(s)"
field in network.md so it reads like "QE Owner(s): Asia Khromov (github:
`@username`, email: name@example.com)" or similar per STP Metadata guidelines.
- Line 174: Update Section II.3 to replace the generic "Storage: Agnostic" with
the specific storage class(es) used by the test environments (e.g., "Storage:
gp2, ebs-sc", or "StorageClass: local-path, ceph-rbd" as appropriate) and also
state the platform type (e.g., "Platform: Bare metal" or "Platform: AWS");
locate the Section II.3 Test Environment block that currently contains "Storage:
Agnostic" and modify the storage line to list exact StorageClass names and add a
Platform line so the Test Environment follows the STP guideline requiring
specific storage class and platform.
- Line 187: Remove the non-actionable line "**Test Framework:** Standard" in
section II.3.1 (the literal string from the diff) and either delete the field
entirely or replace it with a concrete non-standard/new testing tool name(s)
used (e.g., a specific proprietary harness or custom framework); ensure II.3.1
lists only NEW or non‑standard tools per the STP guideline.
- Around line 95-97: The P1 testing goal ("P1 Validate that all existing network
features work correctly for VMs running on an all-RHCOS 10 cluster") in the
goals list lacks traceable scenario coverage in Section III; either add explicit
scenario(s) in Section III that map this P1 goal to a Jira requirement (include
Jira ID, Tier, Priority and a short scenario title) or tighten the P1 wording to
only cover scenarios already listed; update the scenarios referenced around
"Section III" and the block covering lines 261-283 to include the new
all-RHCOS10 scenario(s) with their Jira IDs so every Testing Goal/Acceptance
Criterion is traceable.
- Around line 95-97: Add at least one explicit negative/edge-case migration
scenario to the existing VM migration goals (the **[P0]** and **[P1]** bullets)
that covers failure/partial-failure and concurrency behavior — for example:
describe a forced interruption during live migration from RHCOS 9 → RHCOS 10
(and back), expected rollback semantics, expected network connectivity
degradation thresholds, and steps to reproduce including simultaneous VM
migrations and induced packet loss; ensure the new scenario references the
original **[P0]** goal and the rollback/interrupt expectations are mirrored for
the reverse migration and noted to apply across “all supported network
configurations” (also add similar negative-case entries near lines 261-283 as
indicated).
- Around line 17-20: The definitions use internal-implementation terms; update
the lines referencing UDN, SR-IOV, and OVN-K to be user-facing: describe UDN as
a project-level "User Defined Network" feature that lets teams define custom
networks, SR-IOV as a hardware capability that exposes multiple virtual NICs
from a single physical NIC for improved performance, and OVN-K as the cluster
network solution used to provide pod-to-pod networking — remove mentions of
"CRD", "CNI provider", and other API/component names so the text focuses on
user-visible behavior and outcomes.
- Line 7: Replace the literal line "Enhancement(s): N/A — no VEP exists for this
feature." in the document (the line beginning with "Enhancement(s):") with a
proper VEP/design/HLD link: if no VEP exists, insert the canonical HLD URL (full
absolute link) and a short title (e.g., "HLD: <Title> — <URL>") so the field
points to a concrete design document rather than "N/A".
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 20a80727-33d4-4c73-9358-9b8948920738

📥 Commits

Reviewing files that changed from the base of the PR and between 92a8885 and 833f2d3.

📒 Files selected for processing (1)
  • stps/sig-virt/heterogeneous-rhcos9-rhcos10/network.md

Comment thread stps/sig-virt/heterogeneous-rhcos9-rhcos10/network.md Outdated
Comment thread stps/sig-virt/heterogeneous-rhcos9-rhcos10/network.md Outdated
Comment thread stps/sig-virt/heterogeneous-rhcos9-rhcos10/network.md Outdated
Comment thread stps/sig-virt/heterogeneous-rhcos9-rhcos10/network.md
Comment thread stps/sig-virt/heterogeneous-rhcos9-rhcos10/network.md Outdated
Comment thread stps/sig-virt/heterogeneous-rhcos9-rhcos10/network.md Outdated
@azhivovk
Copy link
Copy Markdown
Contributor Author

Change: apply bot's suggestions

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@stps/sig-virt/heterogeneous-rhcos9-rhcos10/network.md`:
- Around line 216-226: Replace all occurrences of the string "Tier2" with "Tier
2" in the network.md content (notably in the bullet lines containing "**Risk:**
Not testing Tier2 network tests" and "*Missing resources or infrastructure:* No
RHCOS10 clusters available for network Tier2 regression") so the document
consistently uses "Tier 2" throughout; update any related inline occurrences in
the Risk/Mitigation/Sign-off bullets so searching and formatting are uniform.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4d127397-ddfd-4e83-b738-b40fb1440cac

📥 Commits

Reviewing files that changed from the base of the PR and between 833f2d3 and 2da6b42.

📒 Files selected for processing (1)
  • stps/sig-virt/heterogeneous-rhcos9-rhcos10/network.md

Comment thread stps/sig-virt/heterogeneous-rhcos9-rhcos10/network.md Outdated
Network STP aspect for virt's feature

Signed-off-by: Asia Khromov <azhivovk@redhat.com>
@azhivovk azhivovk force-pushed the rhel9_rhel10_nodes_arch branch from 2da6b42 to 98362cb Compare April 13, 2026 11:29
@azhivovk
Copy link
Copy Markdown
Contributor Author

Change: Tier2 -> Tier 2

Copy link
Copy Markdown

@nirdothan nirdothan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @azhivovk. Thw plan looks good to me.
This is my first STP review and I'm not sure what is important vs nits. Please see my comments below.

**Document Conventions (if applicable):**
RHCOS = Red Hat CoreOS, the immutable container-optimized OS used for OpenShift worker nodes
dual-stream cluster = a heterogeneous cluster running both RHCOS 9 and RHCOS 10 worker nodes
UDN = User Defined Network, a feature that lets teams define custom overlay networks at the project level
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is a standard definition for UDN, but if not, I would not call it a "feature" and say that it lets "teams" do things. Please respond if you'd like a suggestion and if it's standard then you can ignore.

- *Note any gaps or missing criteria:* None

- [x] **Non-Functional Requirements (NFRs)**
- *List applicable NFRs and their targets:* Coverage: all existing Tier 2 network tests pass on an all-RHCOS 10 cluster.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that it's a non functional requirement.

- [x] **Non-Functional Requirements (NFRs)**
- *List applicable NFRs and their targets:* Coverage: all existing Tier 2 network tests pass on an all-RHCOS 10 cluster.
- *Note any NFRs not covered and why:*
- Performance: no new performance requirements introduced; migration timing is validated qualitatively via connectivity checks. Deferred to main STP.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"migration timing is validated qualitatively via connectivity checks"
I'm not sure what this means but it seems functional.

#### **3. Technology and Design Review**

- [x] **Developer Handoff/QE Kickoff**
- *Key takeaways and concerns:* New migration tests are needed for the heterogeneous (RHCOS 9 + RHCOS 10) cluster topology; existing Tier 2 network regression must run against an all-RHCOS 10 cluster to validate compatibility.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: separate the 2 takeways into separate bullets.

- *Test Scenario:* [Tier 2] Verify VM with SR-IOV network live migrates between RHCOS 9 and RHCOS 10 nodes (in both directions) without connectivity loss.
- *Priority:* P0

- **[CNV-77027]** — As a cluster admin, I want VMs using passt binding to live migrate across RHCOS 9 and RHCOS 10 nodes without network disruption.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passt is still TP, consider deferring this use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants