OCPBUGS-86311: fix: validate agent-config interface names match networkConfig #10567
openshift-install agent does not cross-validate that interface names in `hosts[].interfaces[]` match the names used in `hosts[].networkConfig`. When the names mismatch, the `pre-network-manager-config.sh` script silently fails to rename `.nmconnection` files at boot time, causing complete network failure for bond/VLAN/bridge topologies with no diagnostic.

Add `validateInterfaceNamesMatchNetworkConfig()` to `validateAgentHosts()` to ensure that every `interfaces[].name` exists in the networkConfig interfaces list. The error message lists the valid networkConfig names to guide users toward the correct configuration.

Only the agent-config.yaml path is affected; install-config.yaml derives interface names from networkConfig automatically, so the names always match.

Co-authored-by: Cursor <cursoragent@cursor.com>
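A minimal sketch of the hard-error check described above, assuming the NMState interface names have already been extracted from `networkConfig` (the parsing step is omitted). `validateInterfaceNames` is a hypothetical stand-in, not the PR's `validateInterfaceNamesMatchNetworkConfig()`; it only illustrates the set-membership test and an error that lists the valid names.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// validateInterfaceNames is a hypothetical, simplified version of the check:
// every host interface name must appear among the networkConfig names.
// It assumes ncNames was already extracted from the NMState document.
func validateInterfaceNames(hostIfaces, ncNames []string) error {
	valid := make(map[string]bool, len(ncNames))
	for _, n := range ncNames {
		valid[n] = true
	}
	for _, name := range hostIfaces {
		if !valid[name] {
			// List the valid networkConfig names to guide the user.
			sorted := append([]string(nil), ncNames...)
			sort.Strings(sorted)
			return fmt.Errorf("interface name %q not found in networkConfig interfaces [%s]",
				name, strings.Join(sorted, ", "))
		}
	}
	return nil
}

func main() {
	err := validateInterfaceNames([]string{"enp3s1"}, []string{"eth0", "bond0"})
	fmt.Println(err)
}
```

Sorting a copy of the valid names keeps the message identical regardless of the order in which interfaces appear in the NMState document.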
| Layer / File(s) | Summary |
|---|---|
| AgentHosts Generate wiring and warning (`pkg/asset/agent/agentconfig/agenthosts.go`) | After appending hosts from `agentConfig.Config.Hosts` (install) or `addNodesConfig.Config.Hosts` (add-nodes), call `warnInterfaceNamesNotInNetworkConfig` for each host. The new helper parses `host.NetworkConfig.Raw` as NMState `interfaces[].name` and logs a `logrus.Warnf` warning for any non-empty host interface `Name` not present in that set. It silently ignores YAML unmarshal errors and is a no-op on empty inputs. |
| Tests, fixtures and helpers (`pkg/asset/agent/agentconfig/agenthosts_test.go`) | Add the `agentNetworkConfigBond` fixture; switch generated host interface names to `eth0`; extend `TestAgentHosts_Generate` with match/mismatch and bonded cases plus install/add-nodes scenarios; add helpers `getAgentConfigBondMatching`, `getAgentConfigBondMismatched`, `getAgentConfigMismatchedInterfaceName`, `getAgentConfigMatchingInterfaceName`, and `HostBuilder.rawNetworkConfig`, plus edge-case generators for empty and malformed network config. |
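The helper summarized in the first row can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `mismatchedInterfaceNames` and `nmStateInterfaces` are hypothetical names, only the top-level `interfaces[].name` field is read, and `encoding/json` stands in for the YAML library the real code uses (JSON-formatted NMState is also valid YAML) so the example stays stdlib-only. It returns the names instead of logging them, which makes the skip-on-error and no-op-on-empty behavior easy to check.

```go
package main

import (
	"encoding/json"
	"log"
)

// nmStateInterfaces mirrors only the top-level interfaces[].name field of an
// NMState document; all other fields are ignored. (Hypothetical type.)
type nmStateInterfaces struct {
	Interfaces []struct {
		Name string `json:"name"`
	} `json:"interfaces"`
}

// mismatchedInterfaceNames returns the non-empty host interface names that do
// not appear in the networkConfig. It returns nil for empty inputs and for a
// networkConfig that cannot be parsed, instead of failing.
func mismatchedInterfaceNames(rawNetworkConfig []byte, hostIfaceNames []string) []string {
	if len(rawNetworkConfig) == 0 || len(hostIfaceNames) == 0 {
		return nil
	}
	var nc nmStateInterfaces
	if err := json.Unmarshal(rawNetworkConfig, &nc); err != nil {
		return nil // malformed config: skip the check rather than fail
	}
	known := make(map[string]bool, len(nc.Interfaces))
	for _, iface := range nc.Interfaces {
		known[iface.Name] = true
	}
	var missing []string
	for _, name := range hostIfaceNames {
		if name != "" && !known[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	raw := []byte(`{"interfaces": [{"name": "bond0"}, {"name": "eth0"}, {"name": "eth1"}]}`)
	for _, name := range mismatchedInterfaceNames(raw, []string{"eth0", "enp3s1"}) {
		log.Printf("interface name %q not found in networkConfig", name)
	}
}
```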
🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 14 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (14 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: validating that agent-config interface names match networkConfig names, addressing OCPBUGS-86311. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | All 29 test names in agenthosts_test.go are static, descriptive strings with no dynamic content like UUIDs, timestamps, IPs, or generated identifiers. Uses standard Go testing (t.Run), not Ginkgo. |
| Test Structure And Quality | ✅ Passed | The test file uses standard Go table-driven tests with testing.T, not Ginkgo. The custom check applies only to Ginkgo tests, which this PR doesn't contain. |
| Microshift Test Compatibility | ✅ Passed | PR modifies only standard Go unit tests (pkg/asset/agent/agentconfig/agenthosts_test.go), not Ginkgo e2e tests. Check is not applicable. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | PR adds only standard Go unit tests to pkg/asset/agent/agentconfig/agenthosts_test.go, not Ginkgo e2e tests. SNO compatibility check only applies to Ginkgo e2e tests. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR modifies installer provisioning config validation, not deployment manifests or scheduling constraints. No pod affinity, topology spread, or replica topology-dependence introduced. |
| Ote Binary Stdout Contract | ✅ Passed | PR modifies agent-config asset generation code, not an OTE test binary. Uses logrus (writes to stderr by default), no fmt.Print or klog to stdout. No OTE stdout contract violations. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | PR adds only standard Go unit tests (TestAgentHosts_Generate), not Ginkgo e2e tests. Check is not applicable. |
| No-Weak-Crypto | ✅ Passed | No weak crypto patterns found. Changes add configuration validation using YAML unmarshaling and basic string comparison only. |
| Container-Privileges | ✅ Passed | PR modifies only Go source files for network interface validation logic; no container/K8s manifests, privilege escalation settings, or privileged container configurations are present or modified. |
| No-Sensitive-Data-In-Logs | ✅ Passed | The new warning function logs only interface names (e.g., eth0, bond0) and interface lists from networkConfig - non-sensitive network identifiers. Sensitive data like BMC credentials are not logged. |
Warning
There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 golangci-lint (2.12.2)
Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
Hi @chdeshpa-hue. Thanks for your PR. I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the corresponding label.
@chdeshpa-hue: This pull request references Jira Issue OCPBUGS-86311, which is invalid. The bug has been updated to refer to the pull request using the external bug tracker.
[APPROVALNOTIFIER] This PR is NOT APPROVED. It needs approval from an approver in each of the changed files.
/jira refresh
@chdeshpa-hue: This pull request references Jira Issue OCPBUGS-86311, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
```go
if !ncNames[iface.Name] {
	errMsg := "interface name \"" + iface.Name + "\" not found in networkConfig interfaces [" + strings.Join(ncNameList, ", ") + "]; " +
		"the interfaces[].name values are logical names that must match the interface names used in networkConfig " +
		"so that the MAC-to-interface mapping works correctly at boot time"
```
I don't think this is true. Or rather, it is true that the names need to match for the MAC-to-interface mapping to work. But if the interface name correctly matches the one defined by the kernel (or if the nmstate uses identifier: mac-address), then you don't need the MAC-to-interface mapping in order for it to choose the right interface.
And in fact there is at least one important case where we rely on this: when using an unmodified baremetal IPI install-config to do an agent install without an agent-config.yaml. Baremetal IPI does not support a MAC-to-interface mapping, so the input must always match up to the true interface names. It does, however, require providing one MAC address to identify the host, and so when we internally generate the host list we just use a bogus name for that interface that doesn't match the ones in the nmstate config. This change will break that feature.
Thanks for the thorough review @zaneb — both points are well taken.
You're right that the baremetal IPI → agent path (where getInstallConfigDefaults generates the fallback "boot" interface name at L276-280) would be broken by this validation. I missed that topology entirely — the validation assumed all interfaces[] entries are user-provided, which isn't true for that flow.
And I appreciate the clarification on the nmstate-as-opaque-blob principle. I can see how relying on parsing its internal structure creates a fragile coupling.
Given these constraints, would you be open to a narrower alternative?
Option A: Move the diagnostic to the boot script itself
Enhance pre-network-manager-config.sh to emit a clear error when sed finds zero matches during the rename step — something like "WARNING: interface 'foo' from agent-config not found in generated .nmconnection files". This keeps the installer from parsing nmstate at all and catches the failure at the point where it actually matters.
Option B: Warn-only at build time, scoped to agent-config.yaml path
Only run the check when interfaces[] comes from a user-provided agent-config.yaml (not from getInstallConfigDefaults), and emit a warning instead of a hard error. This still gives users early feedback for the common bond/VLAN misconfiguration case without blocking the baremetal IPI path.
The underlying problem we're solving is that bond/VLAN/bridge topologies silently get zero connectivity when names mismatch, and users get no useful diagnostic. Either option would address that without violating the design principles. Happy to rework the PR if either direction seems reasonable to you.
Yes, I like the Option B proposal.
Warning instead of error, and keeping it on the agent-config path rather than after data from install-config and agent-config are combined, would address my main concerns.
It would be better if we continued to treat the NMState as opaque and got the info we want from the keyfiles, but given that we already fail to follow this principle to some extent, and that this will only be a warning rather than an error, I would not block on that.
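One way to follow the keyfile-based suggestion is to read `interface-name` from the `[connection]` section of each generated NetworkManager keyfile instead of parsing NMState. `keyfileInterfaceName` is a hypothetical helper and the parsing is deliberately minimal (no escape handling, one file at a time); it sketches the idea rather than proposing installer code.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// keyfileInterfaceName returns the interface-name value from the [connection]
// section of a NetworkManager keyfile, or "" if none is set.
func keyfileInterfaceName(keyfile string) string {
	section := ""
	scanner := bufio.NewScanner(strings.NewReader(keyfile))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		switch {
		case line == "" || strings.HasPrefix(line, "#"):
			continue // skip blank lines and comments
		case strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]"):
			section = line[1 : len(line)-1] // entering a new [section]
		case section == "connection":
			if key, value, ok := strings.Cut(line, "="); ok && strings.TrimSpace(key) == "interface-name" {
				return strings.TrimSpace(value)
			}
		}
	}
	return ""
}

func main() {
	kf := "[connection]\nid=bond0\ntype=bond\ninterface-name=bond0\n\n[bond]\nmode=active-backup\n"
	fmt.Println(keyfileInterfaceName(kf)) // bond0
}
```

Reading the keyfiles checks the artifact the boot script actually edits, so it would keep working even if the NMState schema changes.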
Thanks @zaneb, I've updated the PR to implement your suggested approach:

- Warning instead of error: uses `logrus.Warnf` so it never blocks legitimate configs (e.g. `identifier: mac-address` cases).
- Scoped to user-authored hosts only: the `hostsFromAgentConfig` guard ensures it only fires for:
  - `agent-config.yaml` hosts (Install workflow)
  - `nodes-config.yaml` hosts (`oc adm node-image create` / AddNodes workflow; ref OCPBUGS-86420)

  It never fires for the `getInstallConfigDefaults` path (where the synthetic `"boot"` interface name is generated from the baremetal IPI install-config).
- NMState treated as opaque: the check only extracts interface names (top-level `interfaces[].name`), consistent with the existing parsing already in the file. No deeper structural assumptions.

Test coverage includes the inert install-config path, AddNodes mismatch/match, empty interface names, and graceful handling of malformed networkConfig.
```go
}

var netInterfaces nmStateInterface
if err := yaml.Unmarshal(host.NetworkConfig.Raw, &netInterfaces); err != nil {
```
It was part of our design principles that we treat nmstate as an opaque blob and not rely on knowing the internal structure of it, which may change over time.
Addresses @zaneb's review: the interface name cross-check against networkConfig is now a warning (`logrus.Warnf`) instead of a hard error, and runs for both the agent-config.yaml and nodes-config.yaml (`oc adm node-image create`) paths, but never for install-config baremetal hosts, where `getInstallConfigDefaults` generates synthetic interface names.

This ensures that the baremetal IPI fallback path (which generates a bogus `"boot"` interface name) is never affected, while giving users early visibility into potential name mismatches that could cause connectivity failures at boot time.

Test coverage added for:
- install-config inert path (no warning fires)
- AddNodes workflow mismatch (warns) and match (no warning)
- empty interface name (skipped gracefully)
- malformed networkConfig YAML (no panic)
- bond interfaces matching and mismatching

Ref: OCPBUGS-86420
Co-authored-by: Cursor <cursoragent@cursor.com>
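The scenarios listed in this commit map naturally onto a table-driven check. The sketch below is self-contained and hypothetical: `warningsFor` folds the source-based scoping and the name check into one function, uses `encoding/json` in place of YAML, and prints one line per case instead of using the `testing` package, so none of its names correspond to the PR's actual tests.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// source identifies where a host entry came from (hypothetical type).
type source string

const (
	installConfig source = "install-config" // synthetic "boot" interface name
	agentConfig   source = "agent-config"
	addNodes      source = "nodes-config"
)

// warningsFor returns one message per host interface name missing from the
// networkConfig, but only for user-authored hosts.
func warningsFor(src source, raw string, ifaces []string) []string {
	if src == installConfig || raw == "" {
		return nil
	}
	var nc struct {
		Interfaces []struct {
			Name string `json:"name"`
		} `json:"interfaces"`
	}
	if json.Unmarshal([]byte(raw), &nc) != nil {
		return nil // malformed networkConfig: no panic, no warning
	}
	known := map[string]bool{}
	for _, i := range nc.Interfaces {
		known[i.Name] = true
	}
	var out []string
	for _, name := range ifaces {
		if name != "" && !known[name] {
			out = append(out, fmt.Sprintf("interface name %q not found in networkConfig", name))
		}
	}
	return out
}

func main() {
	bond := `{"interfaces":[{"name":"bond0"},{"name":"eth0"},{"name":"eth1"}]}`
	single := `{"interfaces":[{"name":"eth0"}]}`
	cases := []struct {
		name   string
		src    source
		raw    string
		ifaces []string
		want   int
	}{
		{"install-config path is inert", installConfig, single, []string{"boot"}, 0},
		{"add-nodes mismatch warns", addNodes, single, []string{"enp3s1"}, 1},
		{"add-nodes match is silent", addNodes, single, []string{"eth0"}, 0},
		{"empty interface name skipped", agentConfig, single, []string{""}, 0},
		{"malformed networkConfig", agentConfig, "interfaces: [", []string{"eth0"}, 0},
		{"bond with matching ports", agentConfig, bond, []string{"eth0", "eth1"}, 0},
		{"bond with mismatched ports", agentConfig, bond, []string{"enp1s0", "enp2s0"}, 2},
	}
	for _, tc := range cases {
		got := len(warningsFor(tc.src, tc.raw, tc.ifaces))
		fmt.Printf("%-32s got=%d want=%d ok=%v\n", tc.name, got, tc.want, got == tc.want)
	}
}
```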
@chdeshpa-hue: This pull request references Jira Issue OCPBUGS-86311, which is valid. 3 validation(s) were run on this bug
🧹 Nitpick comments (1)
pkg/asset/agent/agentconfig/agenthosts.go (1)
182-217: 💤 Low value

Consider deduplicating warnings per host.

The warning logic emits one `Warnf` per mismatched interface. If a host has multiple mismatched interfaces, this produces multiple log lines with the same networkConfig list. Consider collecting all mismatched names and emitting a single warning per host.

📋 Example refactor to deduplicate warnings

```diff
+	var mismatched []string
 	for _, iface := range host.Interfaces {
 		if iface.Name == "" {
 			continue
 		}
 		if !ncNames[iface.Name] {
-			logrus.Warnf("agent-config: interface name %q not found in networkConfig interfaces %v; "+
-				"connectivity may fail if interface names do not match at boot time",
-				iface.Name, ncNameList)
+			mismatched = append(mismatched, iface.Name)
 		}
 	}
+	if len(mismatched) > 0 {
+		logrus.Warnf("agent-config: interface names %v not found in networkConfig interfaces %v; "+
+			"connectivity may fail if interface names do not match at boot time",
+			mismatched, ncNameList)
+	}
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/asset/agent/agentconfig/agenthosts.go` around lines 182 - 217, In warnInterfaceNamesNotInNetworkConfig, instead of calling logrus.Warnf for each mismatched iface, collect mismatched interface names (e.g. into a slice like mismatchedNames) while iterating host.Interfaces (skip empty names and use ncNames to check membership), and after the loop emit a single logrus.Warnf that includes the host identifier, the deduplicated mismatchedNames and the ncNameList; ensure you only log when mismatchedNames is non-empty to preserve the existing early-return behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: b97e3420-d29f-4dd5-85c6-5cc36f9f770d
📒 Files selected for processing (2)
pkg/asset/agent/agentconfig/agenthosts.go
pkg/asset/agent/agentconfig/agenthosts_test.go
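The suggested refactor can be sketched as a standalone function, assuming the networkConfig names have already been extracted. `hostInterfaceWarning` is hypothetical; it returns the message (or an empty string) rather than calling `logrus.Warnf`, and it also drops repeated names so each mismatched interface is listed once.

```go
package main

import (
	"fmt"
	"log"
)

// hostInterfaceWarning builds at most one warning per host, listing every
// mismatched interface name together. It returns "" when nothing mismatches.
func hostInterfaceWarning(hostIfaces, ncNames []string) string {
	known := make(map[string]bool, len(ncNames))
	for _, n := range ncNames {
		known[n] = true
	}
	seen := map[string]bool{}
	var mismatched []string
	for _, name := range hostIfaces {
		if name == "" || known[name] || seen[name] {
			continue // skip empty, valid, and already-reported names
		}
		seen[name] = true
		mismatched = append(mismatched, name)
	}
	if len(mismatched) == 0 {
		return ""
	}
	return fmt.Sprintf("interface names %v not found in networkConfig interfaces %v; "+
		"connectivity may fail if interface names do not match at boot time", mismatched, ncNames)
}

func main() {
	msg := hostInterfaceWarning([]string{"enp1s0", "enp2s0", "enp1s0"}, []string{"bond0", "eth0", "eth1"})
	if msg != "" {
		log.Print(msg) // one line for the host instead of one per interface
	}
}
```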
/ok-to-test
```go
if len(a.Hosts) > 0 {
	// nodes-config.yaml hosts are user-authored like agent-config.yaml hosts,
	// unlike install-config hosts which have code-generated interface names.
	a.hostsFromAgentConfig = true
```
This seems a bit clunky. Could we not just do the validation here directly?
Done — good call.
Removed the hostsFromAgentConfig field entirely. The warning is now called directly in Generate() at the two branches where we already know the hosts are user-authored (agent-config install path + AddNodes path). The install-config fallback (baremetal IPI with synthetic "boot" name) never reaches it, so no guard is needed.
Net: −14 lines, +8 lines, one fewer struct field, same behavior.
Address reviewer feedback: instead of a `hostsFromAgentConfig` bool flag checked inside the generic validation loop, call the warning directly in `Generate()` at the two branches where user-authored hosts are already known (agent-config and AddNodes paths). The install-config fallback path (baremetal IPI with synthetic `"boot"` interface name) never reaches the warning call, so no guard is needed.

Co-authored-by: Cursor <cursoragent@cursor.com>
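The reworked wiring can be sketched like this. Everything here is hypothetical and heavily simplified (`Host`, `workflow`, and `generateHosts` are not the installer's types or functions); the point is only that the warning call sits inside the two user-authored branches, so the install-config fallback cannot reach it and no flag is needed.

```go
package main

import "fmt"

// Host is a hypothetical, stripped-down host entry.
type Host struct {
	Hostname   string
	Interfaces []string
}

// workflow distinguishes the two hypothetical Generate() paths.
type workflow int

const (
	install workflow = iota
	addNodes
)

// generateHosts selects the host list for the workflow and calls warn only
// for hosts that the user wrote by hand.
func generateHosts(wf workflow, agentCfgHosts, nodesCfgHosts, installCfgHosts []Host, warn func(Host)) []Host {
	var hosts []Host
	switch wf {
	case addNodes: // nodes-config.yaml hosts are user-authored
		hosts = append(hosts, nodesCfgHosts...)
		for _, h := range nodesCfgHosts {
			warn(h)
		}
	case install:
		if len(agentCfgHosts) > 0 { // agent-config.yaml hosts are user-authored
			hosts = append(hosts, agentCfgHosts...)
			for _, h := range agentCfgHosts {
				warn(h)
			}
		} else { // install-config fallback: synthetic "boot" name, never checked
			hosts = append(hosts, installCfgHosts...)
		}
	}
	return hosts
}

func main() {
	var warned []string
	warn := func(h Host) { warned = append(warned, h.Hostname) }

	fallback := []Host{{Hostname: "master-0", Interfaces: []string{"boot"}}}
	hosts := generateHosts(install, nil, nil, fallback, warn)
	fmt.Println(len(hosts), warned) // 1 []
}
```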
/retest-required
🧹 Nitpick comments (1)
pkg/asset/agent/agentconfig/agenthosts.go (1)
206-208: ⚡ Quick win

Include hostname and make warning prefix more generic.

The warning message could be more helpful:
- It doesn't include `host.Hostname`, making it hard to identify which host has the issue when multiple hosts are configured.
- The `"agent-config:"` prefix is misleading when the warning is triggered from the AddNodes workflow (line 97).

♻️ Suggested improvement

```diff
 		if !ncNames[iface.Name] {
-			logrus.Warnf("agent-config: interface name %q not found in networkConfig interfaces %v; "+
+			logrus.Warnf("host %q: interface name %q not found in networkConfig interfaces %v; "+
 				"connectivity may fail if interface names do not match at boot time",
-				iface.Name, ncNameList)
+				host.Hostname, iface.Name, ncNameList)
 		}
```

This makes it clear which host has the issue and works for both agent-config and add-nodes-config sources.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/asset/agent/agentconfig/agenthosts.go` around lines 206 - 208, Update the warning log in agenthosts.go to include the host's hostname and use a generic prefix instead of "agent-config"; when emitting the message that currently references iface.Name and ncNameList, also include host.Hostname and replace "agent-config:" with a neutral prefix like "host-config:" or "host:" so the warning applies correctly for both the agent-config and AddNodes workflows (refer to the log call that uses logrus.Warnf with iface.Name, ncNameList and the host variable).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 45ba9e69-308f-4ea0-94c2-c0539f6f498e
📒 Files selected for processing (1)
pkg/asset/agent/agentconfig/agenthosts.go
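The suggested message format can be sketched as a small formatting helper. `interfaceNameWarning` is hypothetical; the PR itself logs through `logrus.Warnf` rather than returning a string.

```go
package main

import "fmt"

// interfaceNameWarning formats the per-interface warning with the host name
// and a source-neutral prefix, so it reads correctly for both agent-config
// and add-nodes hosts.
func interfaceNameWarning(hostname, iface string, ncNames []string) string {
	return fmt.Sprintf("host %q: interface name %q not found in networkConfig interfaces %v; "+
		"connectivity may fail if interface names do not match at boot time",
		hostname, iface, ncNames)
}

func main() {
	fmt.Println(interfaceNameWarning("worker-0", "enp3s1", []string{"eth0"}))
}
```

Because `hosts[].hostname` may be left empty in agent-config.yaml, the `%q` verb would render it as `""`; a caller might prefer a different host identifier in that case.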
@chdeshpa-hue: Some tests failed; see the full PR test history and the PR dashboard for details.
Summary
- Add `validateInterfaceNamesMatchNetworkConfig()` to `validateAgentHosts()` in `pkg/asset/agent/agentconfig/agenthosts.go`, which cross-validates that `hosts[].interfaces[].name` values exist in the `hosts[].networkConfig` interfaces
- Fix test fixtures whose `interfaces[]` names (`enp3s1`) did not match the `networkConfig` names (`eth0`); these were silently inconsistent before the new validation

Problem
`openshift-install agent create image` accepts an agent-config.yaml where `hosts[].interfaces[]` names don't match the `hosts[].networkConfig` interface names. At boot, the `pre-network-manager-config.sh` script uses the `interfaces[]` names to find and rename `.nmconnection` files generated from `networkConfig`. When the names mismatch:
- `sed` replacements find zero matches (the script says "updated" but replaces nothing)

Only the `agent-config.yaml` path is affected. The `install-config.yaml` path derives interface names from `networkConfig` in `getInstallConfigDefaults()`, so the names always match by construction.

Test plan
- `interface-name-mismatch-with-networkconfig`: single ethernet, name mismatch rejected
- `interface-name-matches-networkconfig`: single ethernet, matching names pass
- `bond-networkconfig-with-matching-interfaces`: bond with 2 slaves, matching names pass
- `bond-networkconfig-with-mismatched-interfaces`: bond with 2 slaves, mismatch rejected

Made with Cursor