
fix: update values for gpu operator network operator #1025

Merged

anson627 merged 4 commits into main from fix-operator-values on Feb 5, 2026

anson627 (Collaborator) commented Jan 17, 2026

This pull request refactors the GPU module to standardize the use of the mpi-operator namespace for MPI jobs and related resources, reorganizes configuration files, and updates code references accordingly. It also improves test coverage and correctness for MPI operator installation and configuration.

Namespace and Resource Management:

  • All MPIJob resources and related Kubernetes objects are now consistently deployed in the mpi-operator namespace instead of default. This includes updating YAML manifests, code references, and resource management logic. [1] [2] [3] [4] [5] [6] [7] [8]
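A minimal sketch of how a single namespace constant can keep every apply and cleanup call pointed at mpi-operator instead of default. It assumes the code shells out to kubectl; MPI_NAMESPACE, kubectl_apply_cmd, kubectl_delete_cmd and the manifest file name are hypothetical and do not come from this PR.

```python
from typing import List

# Hypothetical constant: the one place the MPI namespace is defined.
MPI_NAMESPACE = "mpi-operator"


def kubectl_apply_cmd(manifest_path: str, namespace: str = MPI_NAMESPACE) -> List[str]:
    """Build a `kubectl apply` argument list scoped to the given namespace."""
    return ["kubectl", "apply", "-f", manifest_path, "--namespace", namespace]


def kubectl_delete_cmd(manifest_path: str, namespace: str = MPI_NAMESPACE) -> List[str]:
    """Build the matching `kubectl delete` so cleanup targets the same namespace."""
    # --ignore-not-found keeps cleanup idempotent if the objects are already gone.
    return ["kubectl", "delete", "-f", manifest_path, "--namespace", namespace,
            "--ignore-not-found"]


if __name__ == "__main__":
    # Hypothetical manifest name; the PR's actual file names are not listed here.
    print(" ".join(kubectl_apply_cmd("cfg/mpi-operator/mpijob.yaml")))
```

If a manifest also sets metadata.namespace, kubectl refuses to apply it when that value disagrees with --namespace, which is one reason the YAML manifests and the code references have to change together.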

Configuration File Organization:

  • MPI job configuration files have been moved from the cfg/mpi/ directory to cfg/mpi-operator/ to better reflect their usage and to align with the new namespace. [1] [2] [3] [4] [5]
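Because the directory name now matches the namespace, one helper can build every config path, and a second can rewrite references that still point at the old cfg/mpi/ location. Only the modules/python/gpu/cfg root comes from this PR's file list; cfg_path and migrate_legacy are illustrative, hypothetical helpers.

```python
from pathlib import PurePosixPath

# Root directory taken from this PR's file list; the helpers below are hypothetical.
CFG_ROOT = PurePosixPath("modules/python/gpu/cfg")


def cfg_path(component: str, filename: str) -> str:
    """Config path for a component; the directory name mirrors the Kubernetes namespace."""
    return (CFG_ROOT / component / filename).as_posix()


def migrate_legacy(path: str) -> str:
    """Rewrite a legacy .../cfg/mpi/... reference to .../cfg/mpi-operator/...; other paths pass through."""
    parts = list(PurePosixPath(path).parts)
    for i in range(len(parts) - 1):
        # Only the directory directly under cfg/ is renamed.
        if parts[i] == "cfg" and parts[i + 1] == "mpi":
            parts[i + 1] = "mpi-operator"
    return PurePosixPath(*parts).as_posix()
```

Paths that already use cfg/mpi-operator/ are returned unchanged, so the rewrite is safe to run more than once.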

Operator Installation and Resource Application:

  • The MPI operator installation process now uses a more specific label selector and clarifies the expected chart version format.
  • Network operator resource file references have been updated to use the network-operator directory, improving clarity and maintainability.
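One common use of a more specific label selector is to wait only for the operator's own pods to become Ready. In this sketch the selector value, the timeout and the helper name are all assumptions; the PR description does not state the actual selector.

```python
from typing import List

MPI_NAMESPACE = "mpi-operator"
# Hypothetical selector: the PR's actual label selector is not shown in the description.
MPI_OPERATOR_SELECTOR = "app.kubernetes.io/name=mpi-operator"


def wait_for_operator_cmd(selector: str = MPI_OPERATOR_SELECTOR,
                          namespace: str = MPI_NAMESPACE,
                          timeout_s: int = 300) -> List[str]:
    """Build a `kubectl wait` that blocks until the selected pods report Ready."""
    return [
        "kubectl", "wait", "pod",
        "--for=condition=Ready",
        "--selector", selector,
        "--namespace", namespace,
        f"--timeout={timeout_s}s",
    ]
```

Once MPIJob launcher and worker pods share the mpi-operator namespace, a broad selector could also match them, so narrowing it to the operator's pods plausibly avoids waiting on the wrong workloads.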

Testing Improvements:

  • Unit tests for MPI operator installation and configuration have been updated to mock the correct functions and verify that the new parameters and resource paths are used, ensuring correctness after refactoring. [1] [2] [3]
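A hedged sketch of the kind of test described, using unittest.mock to stand in for the function that applies manifests and asserting that the new directory and namespace are used. configure_mpi_operator, its apply_manifest parameter and the manifest names are hypothetical; the PR's real tests mock its own module functions, which are not shown here.

```python
import unittest
from unittest import mock

# Hypothetical names for illustration only.
MPI_NAMESPACE = "mpi-operator"
CFG_DIR = "modules/python/gpu/cfg/mpi-operator"


def configure_mpi_operator(apply_manifest, manifests=("example-mpijob.yaml",)):
    """Apply each MPI manifest from the new directory into the mpi-operator namespace."""
    for name in manifests:
        apply_manifest(f"{CFG_DIR}/{name}", namespace=MPI_NAMESPACE)


class ConfigureMpiOperatorTest(unittest.TestCase):
    def test_uses_new_paths_and_namespace(self):
        apply_manifest = mock.MagicMock()
        configure_mpi_operator(apply_manifest, manifests=("a.yaml", "b.yaml"))
        apply_manifest.assert_has_calls([
            mock.call(f"{CFG_DIR}/a.yaml", namespace="mpi-operator"),
            mock.call(f"{CFG_DIR}/b.yaml", namespace="mpi-operator"),
        ])
        # Guard against regressions to the old cfg/mpi/ directory or the default namespace.
        for recorded in apply_manifest.call_args_list:
            self.assertNotIn("/cfg/mpi/", recorded.args[0])
            self.assertNotEqual(recorded.kwargs["namespace"], "default")
```

Passing apply_manifest in as a parameter avoids patching module paths; with mock.patch the target string must name the module where the function is looked up, which is easy to get wrong after a refactor. Run the sketch with python -m unittest <file>.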

Documentation:

  • The README.md has been updated to remove unused configuration options for the Python script, reflecting the current set of supported arguments.

Copilot AI (Contributor) left a comment

Pull request overview

This PR corrects the file path locations for network operator manifest files, changing from the incorrect /net/ subdirectory to the correct /network-operator/ subdirectory. The PR also adds the corresponding configuration files that were missing.

Changes:

  • Updated file paths in the install_network_operator function to reference the correct network-operator subdirectory
  • Added network operator configuration files (values.yaml, nfd-network-rule.yaml, nic-cluster-policy.yaml)
  • Added GPU operator configuration file (values.yaml)
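The path fix can be pinned down with a small list of expected manifests that a test can compare against. The root directory and file names are taken from this PR's file list; network_operator_manifests is a hypothetical helper.

```python
from pathlib import PurePosixPath
from typing import List

# Root and directory taken from this PR's file list; the helper itself is hypothetical.
CFG_ROOT = PurePosixPath("modules/python/gpu/cfg")
NETWORK_OPERATOR_DIR = CFG_ROOT / "network-operator"

# Manifest file names as listed in this PR's changed files.
NETWORK_MANIFESTS = ("nfd-network-rule.yaml", "nic-cluster-policy.yaml")


def network_operator_manifests() -> List[str]:
    """Return the manifest paths install_network_operator is expected to apply."""
    return [(NETWORK_OPERATOR_DIR / name).as_posix() for name in NETWORK_MANIFESTS]
```

A test can then assert that no returned path still contains the old /net/ segment.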

Reviewed changes

Copilot reviewed 1 out of 5 changed files in this pull request and generated no comments.

Summary per file:

| File | Description |
| --- | --- |
| modules/python/gpu/pkg/net.py | Updated paths for the NFD network rule and NIC cluster policy manifests from /net/ to /network-operator/ |
| modules/python/gpu/cfg/network-operator/values.yaml | Added Helm values for the network operator with NFD node feature rules disabled |
| modules/python/gpu/cfg/network-operator/nic-cluster-policy.yaml | Added NIC cluster policy configuration for the OFED driver and the SR-IOV device plugin with InfiniBand support |
| modules/python/gpu/cfg/network-operator/nfd-network-rule.yaml | Added a node feature rule that labels nodes with Mellanox PCI devices |
| modules/python/gpu/cfg/gpu-operator/values.yaml | Added GPU operator Helm values with RDMA enabled and other features disabled |
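The two values.yaml files are YAML, but their intent can be sketched as the equivalent helm --set flags. The keys below (driver.rdma.enabled for the GPU operator, nfd.deployNodeFeatureRules for the network operator) are assumptions modeled on the upstream charts, not copied from the PR's files, and should be checked against the chart versions in use; to_set_flags is a hypothetical helper.

```python
from typing import Any, Dict, List

# Assumed keys, modeled on the upstream charts; the PR's actual values.yaml
# files may contain more (or different) settings.
GPU_OPERATOR_VALUES = {"driver": {"rdma": {"enabled": True}}}
NETWORK_OPERATOR_VALUES = {"nfd": {"deployNodeFeatureRules": False}}


def to_set_flags(values: Dict[str, Any], prefix: str = "") -> List[str]:
    """Flatten nested Helm values into `--set key=value` arguments (scalars only)."""
    flags: List[str] = []
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flags.extend(to_set_flags(value, path))
        else:
            # Render Python booleans as lowercase true/false, which Helm parses as booleans.
            rendered = str(value).lower() if isinstance(value, bool) else str(value)
            flags.extend(["--set", f"{path}={rendered}"])
    return flags
```

In practice the files would be passed with helm's -f/--values flag; flattening them to --set form is only a way to see which dotted keys the configuration depends on.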

anson627 force-pushed the fix-operator-values branch from 960f19b to 9f4a1a7 on February 3, 2026 19:51
anson627 force-pushed the fix-operator-values branch 2 times, most recently from 6acc721 to ef49966 on February 5, 2026 22:41
anson627 force-pushed the fix-operator-values branch from ef49966 to 699e45c on February 5, 2026 22:43
anson627 merged commit 6d2f93f into main on Feb 5, 2026
3 checks passed
anson627 deleted the fix-operator-values branch on February 5, 2026 23:32

Labels: None yet

Projects: None yet


2 participants