net, l2 bridge: Verify iface looping bug fix#4283
net, l2 bridge: Verify iface looping bug fix#4283nirdothan wants to merge 1 commit intoRedHatQE:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughSecondary VM networking now declares two bridged Multus interfaces ( Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
cdbba01 to
b53d4f5
Compare
|
Report bugs in Issues Welcome! 🎉This pull request will be automatically processed with the following features: 🔄 Automatic Actions
📋 Available CommandsPR Status Management
Review & Approval
Testing & Validation
Container Operations
Cherry-pick Operations
Label Management
✅ Merge RequirementsThis PR will be automatically approved when the following conditions are met:
📊 Review ProcessApprovers and ReviewersApprovers:
Reviewers:
Available Labels
AI Features
💡 Tips
For more information, please refer to the project documentation or contact the maintainers. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/network/l2_bridge/vmi_interfaces_stability/lib_helpers.py`:
- Around line 32-37: The misplaced "# type: ignore" currently sits on its own
line and thus has no effect; move that comment to the end of the assignment
statement for spec.template.spec.domain.devices.interfaces so it sits on the
same line as the assignment (after the closing bracket) to properly suppress
type-checking for this Interface list created from base_vmspec(); ensure
Interface entries (e.g., Interface(name=LINUX_BRIDGE_IFACE_NAME_1, ...),
Interface(name="default", masquerade={}),
Interface(name=LINUX_BRIDGE_IFACE_NAME_2, ...)) remain unchanged.
In
`@tests/network/l2_bridge/vmi_interfaces_stability/test_interfaces_stability.py`:
- Around line 15-18: Update the pytest.mark.xfail reason to include the
underlying bug Jira reference CNV-79062 so it follows repo convention;
specifically modify the xfail decorator (the pytest.mark.xfail block referencing
QUARANTINED) to append or include "CNV-79062" in the reason string so the
quarantine marker and underlying bug ticket are both clearly referenced.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 12d95a09-39aa-40fd-a13e-99bb36638a0c
📒 Files selected for processing (2)
tests/network/l2_bridge/vmi_interfaces_stability/lib_helpers.pytests/network/l2_bridge/vmi_interfaces_stability/test_interfaces_stability.py
tests/network/l2_bridge/vmi_interfaces_stability/lib_helpers.py
Outdated
Show resolved
Hide resolved
tests/network/l2_bridge/vmi_interfaces_stability/test_interfaces_stability.py
Outdated
Show resolved
Hide resolved
b53d4f5 to
ef42461
Compare
|
Tested on 4.22 before the the fixed code was built: the test failed and the bug symptom was visible: |
1b8d938 to
6ec19bd
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/network/l2_bridge/vmi_interfaces_stability/lib_helpers.py`:
- Around line 122-140: Replace the dynamic primary-network lookup with the
explicit primary interface name "default": remove the call to
lookup_primary_network(vm=vm) and any variable primary_network, and change the
loop condition that now reads if iface.name == primary_network.name to if
iface.name == "default"; keep the existing calls to lookup_iface_status(vm=vm,
iface_name=iface.name) and the iface_to_ips mapping (LINUX_BRIDGE_IFACE_NAME_1,
LINUX_BRIDGE_IFACE_NAME_2) unchanged. Ensure any import or usage of
lookup_primary_network in this module is deleted (including its earlier use at
the top of the file) so the test no longer depends on that helper.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b43a23e4-1d73-4f0e-9bb3-c73c98c41036
📒 Files selected for processing (2)
tests/network/l2_bridge/vmi_interfaces_stability/lib_helpers.pytests/network/l2_bridge/vmi_interfaces_stability/test_interfaces_stability.py
tests/network/l2_bridge/vmi_interfaces_stability/lib_helpers.py
Outdated
Show resolved
Hide resolved
tests/network/l2_bridge/vmi_interfaces_stability/lib_helpers.py
Outdated
Show resolved
Hide resolved
| ethernets["eth0"] = primary | ||
| ethernets["eth1"] = primary | ||
|
|
||
| ethernets["eth1"] = secondary_iface_cloud_init( | ||
| ethernets["eth0"] = secondary_iface_cloud_init( | ||
| ipv4_supported_cluster=ipv4_supported_cluster, | ||
| ipv6_supported_cluster=ipv6_supported_cluster, | ||
| host_address=1, | ||
| ) |
There was a problem hiding this comment.
We only need to add eth2 and other interfaces should stay as they are
There was a problem hiding this comment.
no. We also need to flip eth0/eth1
There was a problem hiding this comment.
But you already flipped them in lines 32-35
There was a problem hiding this comment.
No. We're recreating the failure scenario where eth0 was secondary.
EDIT: Wait maybe you're right. Trying.
EDIT 2: No. You've created the primary with eth0. it can't be eth0. It's the second interface therefore it's eth1, so the secondaries need to be eth0 and eth2.
There was a problem hiding this comment.
I'm not sure if this change is correct, what do you think? @EdDev
There was a problem hiding this comment.
Before I made that change the test failed because eth0 had no IP. Cloud init must explicitly ref eth0 and eth2 to configure the correct interfaces.
The test now passes so why is it incorrect?
There was a problem hiding this comment.
I confused primary interface with pod interface so it looked wrong to me
Now I understand it better
I suggest you elaborate about this setup order in the commit message so it would be clear
tests/network/l2_bridge/vmi_interfaces_stability/lib_helpers.py
Outdated
Show resolved
Hide resolved
tests/network/l2_bridge/vmi_interfaces_stability/lib_helpers.py
Outdated
Show resolved
Hide resolved
tests/network/l2_bridge/vmi_interfaces_stability/lib_helpers.py
Outdated
Show resolved
Hide resolved
tests/network/l2_bridge/vmi_interfaces_stability/lib_helpers.py
Outdated
Show resolved
Hide resolved
yossisegev
left a comment
There was a problem hiding this comment.
Please remove the "Temporarily quarantined" comment from the description, not that the fix reached D/S.
tests/network/l2_bridge/vmi_interfaces_stability/lib_helpers.py
Outdated
Show resolved
Hide resolved
tests/network/l2_bridge/vmi_interfaces_stability/lib_helpers.py
Outdated
Show resolved
Hide resolved
|
/verified =========== 2 passed, 1 deselected, 3 warnings in 1007.92s (0:16:47) =========== |
Test execution plan request detailsCRITICAL: You MUST post an inline review comment on the first changed line of the first file. As an expert software testing engineer, analyze all modified files in this PR and create a targeted test execution plan. Analysis Requirements:
Your deliverable: Test Execution Plan
Guidelines:
Hardware-Related Checks (SR-IOV, GPU, DPDK): When PR modifies fixtures for hardware-specific resources:
CRITICAL WORKFLOW COMPLETION RULES: When responding to this test execution plan request, you MUST follow these rules EXACTLY:
ABSOLUTE PROHIBITIONS (violating these creates empty/meaningless reviews):
Remember: The pull request review is visible to users. Additional comments are redundant noise. CRITICAL: Before setting "Run smoke tests: True", you MUST use shell scripts to: |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 3437 🏁 Script executed: Length of output: 886 🏁 Script executed: Length of output: 1786 🧠 Learnings used |
libs/net/ip.py
Outdated
|
|
||
|
|
||
| def has_expected_ips(iface_status: dict[str, list[str] | str], expected_ips: list[str]) -> bool: | ||
| return "guest-agent" in iface_status["infoSource"] and all( |
There was a problem hiding this comment.
I think you should either remove the guest-agent check, or rename the function - depending on what is the purpose of this function (checking that an interface generally reports a ready status? or checking that an interface specifically reports all the IP it is expected to have?).
There was a problem hiding this comment.
This is simply a refactor from the original code of @azhivovk that I was forced to do by the linter. I didn't mean to change anything about it, and I'm not sure why the linter didn't complain before. I'd rather keep the scope of this PR minimal if it's OK with you.
There was a problem hiding this comment.
This is simply a refactor from the original code of @azhivovk that I was forced to do by the linter.
How is this reasoning related to the comment?
I'd rather keep the scope of this PR minimal if it's OK with you.
You can keep it minimal and at the same time correct. The name is not reflecting what this is doing.
The signature is also odd, it should be like here.
What was the linter complain that you needed to do this?
There was a problem hiding this comment.
The code was as below and the linter complained that the lambda had no type or something like that.
lookup_iface_status(
vm=vm,
iface_name=iface.name,
predicate=lambda iface_status: (
"guest-agent" in iface_status["infoSource"]
and all(ip in iface_status.get("ipAddresses", []) for ip in secondary_ips)
),
)There was a problem hiding this comment.
@yossisegev I've renamed the function and removed the guest agent condition.
EdDev
left a comment
There was a problem hiding this comment.
Thanks.
and placing the interfaces in the order that
was known to produce the bug.
Please be specific, we can play trivia after this gets merged ;)
libs/net/ip.py
Outdated
|
|
||
|
|
||
| def has_expected_ips(iface_status: dict[str, list[str] | str], expected_ips: list[str]) -> bool: | ||
| return "guest-agent" in iface_status["infoSource"] and all( |
There was a problem hiding this comment.
This is simply a refactor from the original code of @azhivovk that I was forced to do by the linter.
How is this reasoning related to the comment?
I'd rather keep the scope of this PR minimal if it's OK with you.
You can keep it minimal and at the same time correct. The name is not reflecting what this is doing.
The signature is also odd, it should be like here.
What was the linter complain that you needed to do this?
libs/net/ip.py
Outdated
| return {ip.version for ip in actual_ips} == {ip.version for ip in expected_ips} | ||
|
|
||
|
|
||
| def has_all_ips_in_status(iface_status: ResourceField, expected_ips: list[str]) -> bool: |
There was a problem hiding this comment.
It does not fir here (libs/net/ip) because it adds knowledge about the VMI status entities and this lib is about IPs in general. Either pass only IPs or relocate to a more suitable lib.
Also, you have changed the logic from the original, dropping a condition.
Make sure to extract this "refactoring" into a separate PR or commit and reason for it well (with output of the error you had that required moving).
There was a problem hiding this comment.
Reverted the refactor. Back to the original lambda.
| predicate=lambda iface_status: ( | ||
| "guest-agent" in iface_status["infoSource"] | ||
| and all(ip in iface_status.get("ipAddresses", []) for ip in secondary_ips) | ||
| predicate=lambda iface_status: has_all_ips_in_status( |
There was a problem hiding this comment.
Removing the guest-agent check from the context of has_all_ips_in_status (the function formerly known as has_expected_ips) is correct IMO, but now it must be checked here (as before the PR), otherwise this will raise flakiness, and fail whenever has_all_ips_in_status is called too soon (i.e. before the guest-agent is updated in the interface's infoSource).
There was a problem hiding this comment.
I've reverted the refactor. However since we've decided to take it in a followup, I don't understand how this checking guest-agent makes a timing difference in validation.
There was a problem hiding this comment.
We saw many flaky tests in tier-2, where we eventually found that the problem was that we tried using VM's interfaces while they were not in a usable state yet, and further debugging revealed that the condition for an interface being ready is having the guest-agent entry in the infoSource.
Therefore, removing this verification introduces flakiness.
Validate fix of interface looping bug[1]. When primary interface was placed between secondaries, in the VMI template, it would disappear from status. The test was enhanced to have 2 secondary interfaces, and a primary in between. It validates that all 3 interfaces consistently show in the status without flickering. [1] https://issues.redhat.com/browse/CNV-79062 Signed-off-by: Nir Dothan <ndothan@redhat.com> Assisted-by: Claude Sonnet 4.5 <noreply@anthropic.com>
EdDev
left a comment
There was a problem hiding this comment.
Thanks.
I have only one comment left, see inline.
| ipv4_supported_cluster=ipv4_supported_cluster, | ||
| ipv6_supported_cluster=ipv6_supported_cluster, | ||
| ) | ||
| secondary_iface_to_ips = { |
Short description:
A bug was causing secondary interface to disappear from VMI status.
Validate that kubevirt/kubevirt#17041 had fixed the issue by enhancing the secondary interface stability test.
Summary by CodeRabbit