Skip to content

net: NAD change ref STP#74

Open
azhivovk wants to merge 1 commit intoRedHatQE:mainfrom
azhivovk:nad_change_stp
Open

net: NAD change ref STP#74
azhivovk wants to merge 1 commit intoRedHatQE:mainfrom
azhivovk:nad_change_stp

Conversation

@azhivovk
Copy link
Copy Markdown
Contributor

@azhivovk azhivovk commented Apr 14, 2026

STP Metadata

VEP issue: https://github.com/kubevirt/enhancements/blob/main/veps/sig-network/hotpluggable-nad-ref.md

What this PR does

STP for covering updating network (NAD) reference on live VMs.
This feature enables VM owners to change the secondary network of a running VM without rebooting. When the user updates the network reference in the VM configuration, the VM is reconnected to the new network (e.g., a different VLAN) without requiring a restart, preserving guest interface identity.
The supported scope is bridge binding on live-migratable VMs.

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive test plan for the Live Update NetworkAttachmentDefinition feature: scope, terminology, acceptance criteria (no VM reboot, connectivity on new network, loss on previous network, preserved guest interface identity), known limitations/non-goals, prioritized test strategy (P0–P2), environment prerequisites, traceable test scenarios, QE review checklist, risk/untestable notes, upgrade expectations, and sign-offs.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 92e7ccb0-520f-45ca-983c-157270e1002a

📥 Commits

Reviewing files that changed from the base of the PR and between 4a38937 and 9c490d0.

📒 Files selected for processing (1)
  • stps/sig-network/hotpluggable-nad-ref.md

📝 Walkthrough

Walkthrough

Adds a new OpenShift Virtualization software test plan document for the "Live Update NetworkAttachmentDefinition Reference" initiative. The document defines metadata, terminology, feature behavior, QE checklists, prioritized test goals, environment prerequisites, entry criteria, risks/limitations, traceable test scenarios (VIRTSTRAT-560), and approvals. (301 added lines)

Changes

Cohort / File(s) Summary
New STP Document
stps/sig-network/hotpluggable-nad-ref.md
Introduces a comprehensive software test plan (+301 lines) for live updating a VM's secondary network reference via NAD changes. Covers feature definition and scope (bridge binding only), acceptance criteria (no reboot, connectivity changes, preserved guest interface identity), QE review checklist, prioritized test scenarios (P0–P2), test strategy, bare-metal environment prerequisites, entry criteria, risks/known limitations, and sign-off stakeholders.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'net: NAD change ref STP' directly and specifically summarizes the main change—adding a System Test Plan for NAD reference updates on live VMs, which is precisely what the 301-line document introduces.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-virtualization-qe-bot-5
Copy link
Copy Markdown

Report bugs in Issues

Welcome! 🎉

This pull request will be automatically processed with the following features:

🔄 Automatic Actions

  • Reviewer Assignment: Reviewers are automatically assigned based on the OWNERS file in the repository root
  • Size Labeling: PR size labels (XS, S, M, L, XL, XXL) are automatically applied based on changes
  • Issue Creation: A tracking issue is created for this PR and will be closed when the PR is merged or closed
  • Branch Labeling: Branch-specific labels are applied to track the target branch
  • Auto-verification: Auto-verified users have their PRs automatically marked as verified
  • Labels: Enabled categories: branch, can-be-merged, cherry-pick, has-conflicts, hold, needs-rebase, size, verified, wip

📋 Available Commands

PR Status Management

  • /wip - Mark PR as work in progress (adds WIP: prefix to title)
  • /wip cancel - Remove work in progress status
  • /hold - Block PR merging (approvers only)
  • /hold cancel - Unblock PR merging
  • /verified - Mark PR as verified
  • /verified cancel - Remove verification status
  • /reprocess - Trigger complete PR workflow reprocessing (useful if webhook failed or configuration changed)
  • /regenerate-welcome - Regenerate this welcome message

Review & Approval

  • /lgtm - Approve changes (looks good to me)
  • /approve - Approve PR (approvers only)
  • /assign-reviewers - Assign reviewers based on OWNERS file
  • /assign-reviewer @username - Assign specific reviewer
  • /check-can-merge - Check if PR meets merge requirements

Testing & Validation

  • /retest tox - Run Python test suite with tox
  • /retest all - Run all available tests

Cherry-pick Operations

  • /cherry-pick <branch> - Schedule cherry-pick to target branch when PR is merged
    • Multiple branches: /cherry-pick branch1 branch2 branch3

Label Management

  • /<label-name> - Add a label to the PR
  • /<label-name> cancel - Remove a label from the PR

✅ Merge Requirements

This PR will be automatically approved when the following conditions are met:

  1. Approval: /approve from at least one approver
  2. LGTM Count: Minimum 2 /lgtm from reviewers
  3. Status Checks: All required status checks must pass
  4. No Blockers: No wip, hold, has-conflicts labels and PR must be mergeable (no conflicts)

📊 Review Process

Approvers and Reviewers

Approvers:

  • EdDev

Reviewers:

  • Anatw
  • EdDev
  • azhivovk
  • servolkov
  • yossisegev
Available Labels
  • hold
  • verified
  • wip
  • lgtm
  • approve
AI Features
  • Cherry-Pick Conflict Resolution: Enabled (claude/claude-opus-4-6[1m])

💡 Tips

  • WIP Status: Use /wip when your PR is not ready for review
  • Verification: The verified label is removed on new commits unless the push is detected as a clean rebase
  • Cherry-picking: Cherry-pick labels are processed when the PR is merged
  • Permission Levels: Some commands require approver permissions
  • Auto-verified Users: Certain users have automatic verification and merge privileges

For more information, please refer to the project documentation or contact the maintainers.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@stps/sig-network/hotpluggable-nad-ref.md`:
- Around line 14-17: Remove the standard term definition "VEP = KubeVirt
Enhancement Proposal" from the "Document Conventions" section and leave only
feature-specific abbreviations (keep "NAD = NetworkAttachmentDefinition" and
"DNC = Dynamic Networks Controller"); update the "Document Conventions"
paragraph to define only NAD and DNC and remove the VEP line so the section
contains feature-specific terms only.
- Around line 97-99: Update the "Topology Considerations" section to explicitly
state cluster requirements to match the Technology Challenges and STP I.3
guidance: replace the "No known limitations when considering different
topologies" line with a clear statement that the feature requires a multi-node
cluster with live-migration support (CRI/migration features enabled), explicitly
excludes single-node clusters, and describe the testing impact (tests must run
on clusters with live migration enabled, sufficient nodes/resources for live
migration, and any shared storage or networking requirements). Ensure the
phrasing references the same feature constraints described in "Technology
Challenges" so the document is consistent.
- Line 10: The QE Owner entry "**QE Owner(s):** Asia Khromov" is missing the
required GitHub handle; update this metadata line to include the QE's GitHub
handle (e.g., change the "**QE Owner(s):** Asia Khromov" entry to include the
handle in parentheses or adjacent text such as "**QE Owner(s):** Asia Khromov
(`@github_handle`)") so the QE Owner(s) metadata contains both name and contact
information for traceability.
- Around line 68-78: Move the three items currently labeled "Explicit non-goal"
out of the Known Limitations section (I.2) and add them to the Out of Scope
(Testing Scope Exclusions) section (II.1): "Migrating between CNI types",
"Changing network binding/plugin", and "No limitation of migration retries
because of missing Network Attachment Definition"; for each, remove the bullet
from I.2 and create corresponding bullets under II.1 with a short rationale
"Explicit non-goal of the feature per VEP design" and the sign-off "Ronen
Sde-Or, 04/2026" (use the exact phrasing of each title to locate the items).
- Around line 41-43: The Acceptance Criteria currently has a single checklist
item; split it into multiple independent, verifiable list items under the
"Acceptance Criteria" header so each is a specific pass/fail condition. Replace
the single line "- *List the acceptance criteria:* Change the secondary network
reference on a running VM and verify the VM is connected to the new network and
has connectivity." with separate bullets such as verifying the reference change
applied, confirming the VM is attached to the new network, validating network
connectivity from the VM to an external endpoint, confirming no disruption to
primary network, and documenting rollback steps; ensure each new item is phrased
as an observable outcome and can be tested independently (search for the
"Acceptance Criteria" section / the existing checklist entry to update).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5354581e-a937-452c-985d-1031989697cc

📥 Commits

Reviewing files that changed from the base of the PR and between 424a4f9 and 9d41a3f.

📒 Files selected for processing (1)
  • stps/sig-network/hotpluggable-nad-ref.md

Comment thread stps/sig-network/hotpluggable-nad-ref.md Outdated
Comment thread stps/sig-network/hotpluggable-nad-ref.md Outdated
Comment thread stps/sig-network/hotpluggable-nad-ref.md
Comment thread stps/sig-network/hotpluggable-nad-ref.md
Comment thread stps/sig-network/hotpluggable-nad-ref.md
@azhivovk
Copy link
Copy Markdown
Contributor Author

azhivovk commented Apr 14, 2026

Change: apply bot's suggestions

@servolkov
Copy link
Copy Markdown
Contributor

@azhivovk it looks like it's worth adding the devs from your team to the OWNER list. Consider, please.

Copy link
Copy Markdown

@yossisegev yossisegev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Asia

Comment on lines +15 to +16
NAD = NetworkAttachmentDefinition, a Multus secondary network definition
DNC = Dynamic Networks Controller (Multus dynamic networks controller)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the 2 lines are supposed to be separated, but in preview it turns out

Document Conventions (if applicable): NAD = NetworkAttachmentDefinition, a Multus secondary network definition DNC = Dynamic Networks Controller (Multus dynamic networks controller)

Please search how to do that in markdown (
tag, double-space at the line-end, etc.).
Also note that this occurs in other places in the doc as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

### **Feature Overview**

This feature enables VM owners to change the secondary network of a running VM without rebooting. When the user updates the network reference in the VM configuration, the VM is reconnected to the new network (e.g., a different VLAN) without requiring a restart, preserving guest interface identity.
The supported scope is bridge binding on live-migratable VMs.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new-line issue

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

- The VM remains running (not rebooted) after the NAD reference update
- The VM establishes connectivity on the new secondary network
- The VM no longer has connectivity on the previous secondary network
- The guest interface identity (e.g., MAC address) is preserved
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant comma after "e.g.".

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


### **Feature Overview**

This feature enables VM owners to change the secondary network of a running VM without rebooting. When the user updates the network reference in the VM configuration, the VM is reconnected to the new network (e.g., a different VLAN) without requiring a restart, preserving guest interface identity.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overview doesn't mention the required migration.
On one hand it makes sense, because the migration part is an implementation detail; but OTOH, you do mention the required migration in other places in the STP (in the "Known Limitations", for example), so maybe it makes sense to mention it here, for example

Suggested change
This feature enables VM owners to change the secondary network of a running VM without rebooting. When the user updates the network reference in the VM configuration, the VM is reconnected to the new network (e.g., a different VLAN) without requiring a restart, preserving guest interface identity.
..., the VM is reconnected to the new network (e.g., a different VLAN) without requiring a restart (but rather migrating the VM), preserving guest interface identity.
sdfd

@frenzyfriday WDYT?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it makes sense to add the migration here. Though it is an implementation detail, the user right now has no other way to monitor the change than to check for the migration criteria

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, it's important to mention it in the overview since we mentioned the migration limitations


#### **3. Test Environment**

- **Cluster Topology:** 3-master/3-worker bare-metal (minimum 2 workers)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must it be bare-metal? Why?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, it's the default cluster setup we mention in the STP

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rnetser is this true?
And if so - why? We shouldn't be limited only to BM IMO.


- **OCP & OpenShift Virtualization Version(s):** OCP 4.22 with OpenShift Virtualization version 4.22

- **CPU Virtualization:** VT-x (Intel) or AMD-V enabled
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it won't be supported on ARM-based nodes?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC it's the equivalent of agnostic, as written in the stp template:

- **CPU Virtualization:** VT-x (Intel) or AMD-V enabled
  <!-- Change only if specific CPU requirements exist -->

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the template's default corresponds to the current RHEL nodes we have, which use AMD.
Now that multi-arch is about to be supported, I think that this section should also mention ARM.


- **Required Operators:** NMState Operator (for bridge interface configuration)

- **Platform:** Bare metal
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same - why necessarily BM?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to #74 (comment)

- *Details:* Not applicable; feature requires bare-metal cluster with live migration support.

#### **3. Test Environment**

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the feature is limited only to bridge-binding, as specified in the "Test Coverage" section below, then I think that multi-NIC nodes should also be mentioned here as a test environment requirement.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned it in
- **Network:** OVN-Kubernetes, IPv4, IPv6, secondary bridge network

Do I need to mention it in more places? Cluster topology maybe?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion is that you should explicitly mention that multi NIC nodes are required, either in network or cluster topology.
However, maybe you think that this mention is clear and enough, and in that case - I won't insist.


- [x] **Risk:** Seamless network connectivity during the NAD swap is not guaranteed by design and cannot be tested as a pass/fail condition.
- **Mitigation:** Tests will verify connectivity before and after the swap; transient disruption during swap is accepted.
- *Alternative validation approach:* Verify connectivity is restored on new network after swap completes.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please be clear here, on whether the connectivity is expected to be seamlessly restored (meaning - a connection that was established before the NAD swap will disrupted during the change, and will eventually be restored without the user having to actively restore it), or rather the user will have to actively renew this connection.
If I understand the feature and its VEP correctly, then it's the second option.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC according to the VEP, the connectivity restoration is not seamlessly maintained and if the connectivity was not preserved after the NAD update, the user needs to manually renew the connections.
I added it here, thanks

- Security: no new RBAC or authentication changes.
- Monitoring/Observability: no new metrics or alerts required.
- Scalability: no new scale requirements.
- UI: no UI changes introduced by this feature.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as of now. i will talk to the UI team to get their opinions.

- *Test Scenario:* [Tier 2] Verify that a VM can be successfully live-migrated after completing a NAD reference update, and maintains connectivity on the new network.
- *Priority:* P2

- **[VIRTSTRAT-560]** — As a VM owner, I want my VM to remain connected to its updated secondary network after a cluster upgrade.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation did not add a breaking change. So is the cluster upgrade test necessary still?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it sounds like a scenario we should test, but maybe I'm wrong
@EdDev WDYT?


---

### **III. Test Scenarios & Traceability**
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a Ipv6 scenario somewhere? And should we add a test for VLAN change explicitly?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IPv6 is covered as a subtest within the existing scenarios, not a separate user story.
I plan to follow the way we create new tests: setup the VM networks according to existing cluster network stack - if we run on dual-stack cluster we test both IP families and if we run on single-stack cluster, we test accordingly.
Regarding the VLAN change - I edited the first test scenario

@azhivovk
Copy link
Copy Markdown
Contributor Author

Change:

  • New lines
  • Removed commas
  • Rephrase connectivity verification
  • Explicitly test VLAN change

coderabbitai[bot]
coderabbitai bot previously approved these changes Apr 15, 2026
STP for covering updating network (NAD) reference on live VMs.

Signed-off-by: Asia Khromov <azhivovk@redhat.com>
Co-Authored-By: Yossi Segev <ysegev@redhat.com>
@azhivovk
Copy link
Copy Markdown
Contributor Author

Change: add localnet scenario since bridge binding is used

- *Details:* Not applicable; feature requires bare-metal cluster with live migration support.

#### **3. Test Environment**

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion is that you should explicitly mention that multi NIC nodes are required, either in network or cluster topology.
However, maybe you think that this mention is clear and enough, and in that case - I won't insist.


#### **3. Test Environment**

- **Cluster Topology:** 3-master/3-worker bare-metal (minimum 2 workers)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rnetser is this true?
And if so - why? We shouldn't be limited only to BM IMO.


- **OCP & OpenShift Virtualization Version(s):** OCP 4.22 with OpenShift Virtualization version 4.22

- **CPU Virtualization:** VT-x (Intel) or AMD-V enabled
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the template's default corresponds to the current RHEL nodes we have, which use AMD.
Now that multi-arch is about to be supported, I think that this section should also mention ARM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants