SriovNetworkNodePolicy API Reference

The SriovNetworkNodePolicy CRD is the key component of the SR-IOV network operator. This custom resource instructs the operator to:

  1. Render the spec of the SriovNetworkNodeState CR for selected nodes to configure SR-IOV interfaces
  2. Deploy the SR-IOV CNI plugin and device plugin on selected nodes
  3. Generate the configuration for the SR-IOV device plugin

NOTE: In virtual deployments, the VF interface is read-only and some fields (numVfs, deviceType, mtu) behave differently. See Virtual Deployment Considerations.

Basic SriovNetworkNodePolicy Example

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-1
  namespace: sriov-network-operator
spec:
  deviceType: vfio-pci
  mtu: 1500
  nicSelector:
    deviceID: "1583"
    rootDevices:
    - 0000:86:00.0
    vendor: "8086"
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 4
  priority: 90
  resourceName: intelnics

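This example selects the Intel XL710 PF at PCI address 0000:86:00.0 (vendor 8086, device 1583) on nodes labeled feature.node.kubernetes.io/network-sriov.capable=true. It creates 4 VFs on that PF with an MTU of 1500, binds them to the vfio-pci driver, and exposes them under the resource name intelnics.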

SriovNetworkNodePolicy Spec Fields

Required Fields

| Field | Type | Description |
| --- | --- | --- |
| nodeSelector | map[string]string | Kubernetes node selector to target specific nodes |
| resourceName | string | Name for the device plugin resource pool |

NIC Selection

| Field | Type | Description |
| --- | --- | --- |
| nicSelector.vendor | string | PCI vendor ID (e.g., "8086" for Intel) |
| nicSelector.deviceID | string | PCI device ID |
| nicSelector.pfNames | []string | Physical function names or alternative interface names (e.g., ["eno1", "sriov1", "eno1#0-3"]) |
| nicSelector.rootDevices | []string | PCI addresses (e.g., ["0000:86:00.0"]) |
| nicSelector.netFilter | string | Network interface name filter |
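
The vendor and deviceID values can be read from sysfs on the node. The following is a minimal sketch, not part of the operator: pf_pci_ids is a hypothetical helper name, eno1 is an illustrative PF name, and the SYSFS_NET variable is an assumption added only so the function can be pointed at a different directory.

```shell
# Print "<vendor> <deviceID>" for a PF in the form nicSelector expects
# (hex digits without the 0x prefix).
# SYSFS_NET is a hypothetical override; it defaults to /sys/class/net.
pf_pci_ids() {
    pf_name=$1
    dev_dir="${SYSFS_NET:-/sys/class/net}/${pf_name}/device"
    vendor=$(cat "${dev_dir}/vendor") || return 1
    device=$(cat "${dev_dir}/device") || return 1
    # sysfs reports values such as 0x8086; strip the prefix.
    echo "${vendor#0x} ${device#0x}"
}

# Example (run on the node): pf_pci_ids eno1
```

The two values map directly to nicSelector.vendor and nicSelector.deviceID; quote them in the policy so YAML treats them as strings.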

VF Configuration

| Field | Type | Description | Virtual Deployment Notes |
| --- | --- | --- | --- |
| numVfs | integer | Number of Virtual Functions to create | No effect (always 1 VF) |
| deviceType | string | Driver to bind VFs ("netdevice", "vfio-pci") | Depends on underlying device capabilities |
| mtu | integer | MTU size for VFs | Cannot be changed (set by platform) |

Advanced Configuration

| Field | Type | Description |
| --- | --- | --- |
| priority | integer | Policy priority (0 is highest) for conflict resolution |
| isRdma | boolean | Enable RDMA capabilities |
| needVhostNet | boolean | Enable vhost-net for virtualized workloads |
| eSwitchMode | string | Set eSwitch mode ("legacy", "switchdev") |
| externallyManaged | boolean | Skip VF creation (user manages VFs) |

Link Configuration

| Field | Type | Description |
| --- | --- | --- |
| linkType | string | Link type ("eth", "ETH", "ib", "IB") |
| spoofChk | string | Spoof checking ("on", "off") |
| trust | string | VF trust mode ("on", "off") |
| linkState | string | VF link state ("auto", "enable", "disable") |
| maxTxRate | integer | Maximum transmit rate (Mbps) |
| minTxRate | integer | Minimum transmit rate (Mbps) |

Alternative Interface Names

The operator discovers alternative interface names automatically and stores them in SriovNetworkNodeState.status.interfaces[].altNames.

Example SriovNetworkNodeState snippet:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodeState
status:
  interfaces:
    - name: "ens803f1"
      altNames:
        - "eth0"
        - "net1"
        - "sriov1"

You can select a PF by using either:

  • the primary interface name (for example, ens803f1)
  • an alternative interface name (for example, sriov1, eth0, net1)

For example, this policy selects the PF through its alternative name sriov1:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-using-altname
  namespace: sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    pfNames: ["sriov1"]
    vendor: "8086"
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 4
  priority: 99
  resourceName: intelnics

Alternative names also work with VF range notation:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-with-altname-vf-range
  namespace: sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    pfNames: ["eth0#0-3"]
    vendor: "8086"
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 8
  priority: 99
  resourceName: intelnics

To discover available alternative names on a node, see the SriovNetworkNodeState API Reference.
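
The status.interfaces[].altNames field described above can also be read directly with a jsonpath query. This is a minimal sketch: list_alt_names is a hypothetical helper name, and it assumes the sriov-network-operator namespace used in the examples and a SriovNetworkNodeState object named after the node.

```shell
# Print "<interface>: <altNames>" for every interface reported for one node.
# Assumes the SriovNetworkNodeState object carries the node's name.
list_alt_names() {
    node_name=$1
    kubectl get sriovnetworknodestate "$node_name" \
        -n sriov-network-operator \
        -o jsonpath='{range .status.interfaces[*]}{.name}{": "}{.altNames}{"\n"}{end}'
}

# Example: list_alt_names <node-name>
```

Any name printed in the altNames list can be used in nicSelector.pfNames in place of the primary interface name.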

Virtual Deployment Considerations

In virtual environments (VMs):

  • MTU: Set by the underlying virtualization platform, cannot be changed
  • numVfs: Has no effect as there is always 1 VF per policy
  • deviceType: Depends on whether the device supports native-bifurcating drivers:
    • Mellanox devices: Use netdevice (default) for native-bifurcating support
    • Intel devices: Use vfio-pci for non-bifurcating devices

# Example for virtual deployment with Intel NIC
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: vm-policy
spec:
  deviceType: vfio-pci  # Required for Intel in VMs
  nicSelector:
    rootDevices: ["0000:00:05.0"]  # VF PCI address
  nodeSelector:
    kubernetes.io/hostname: "vm-worker-1"
  numVfs: 1  # Ignored in VMs
  resourceName: intel-vf

Multiple Policies and Priority

When multiple SriovNetworkNodePolicy CRs target the same Physical Function, the priority field (0 is highest priority) resolves conflicts.

Policy Processing Order

  1. Priority (lowest number first)
  2. Name (alphabetical order)
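
To predict the order in which a set of policies is processed, the two sort keys above can be reproduced with sort. This sketch only illustrates the ordering rule; it is not the operator's implementation, and order_policies is a hypothetical helper that expects one "priority name" pair per line.

```shell
# Order "priority name" pairs as described above:
# ascending priority number first, then name alphabetically.
order_policies() {
    LC_ALL=C sort -k1,1n -k2,2
}

# Example:
# printf '%s\n' "99 policy-b" "10 policy-z" "99 policy-a" | order_policies
```

The pairs could be produced from the cluster with a kubectl jsonpath query over spec.priority and metadata.name.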

Policy Merging Rules

  • Same priority, no overlap: policies are merged
  • Non-overlapping VF groups (using # notation): policies are merged
  • Overlapping policies with different priorities: only the highest-priority policy (lowest number) applies
  • Overlapping policies with the same priority: the last processed policy wins

VF Group Notation

Use # notation to specify VF ranges:

spec:
  nicSelector:
    pfNames: ["eno1#0-3"]  # VFs 0, 1, 2, 3
  numVfs: 8
  resourceName: group1
---
spec:
  nicSelector:
    pfNames: ["eno1#4-7"]  # VFs 4, 5, 6, 7  
  numVfs: 8
  resourceName: group2
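
Before applying several policies to one PF, it can help to check that their VF ranges do not collide. The helpers below are a sketch under stated assumptions: parse_pf_entry and ranges_overlap are hypothetical names, and the parsing covers only the name#first-last form shown above.

```shell
# Split a pfNames entry such as "eno1#0-3" into "<pf> <first> <last>".
# Entries without a #range print only the PF name.
parse_pf_entry() {
    entry=$1
    case $entry in
        *"#"*-*)
            pf=${entry%%#*}        # text before the first '#'
            range=${entry#*#}      # text after the first '#'
            echo "$pf ${range%%-*} ${range#*-}"
            ;;
        *)
            echo "$entry"
            ;;
    esac
}

# Succeed when two inclusive VF ranges share at least one index.
# Arguments: a_first a_last b_first b_last
ranges_overlap() {
    [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}
```

For the two groups above, ranges_overlap 0 3 4 7 fails, which means the ranges are disjoint and the policies can be merged.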

Externally Managed Virtual Functions

Set externallyManaged: true when you want to create VFs outside the operator:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: external-vfs
spec:
  externallyManaged: true
  deviceType: vfio-pci
  nicSelector:
    pfNames: ["eno1"]
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 4
  resourceName: external-intelnics

Externally Managed Behavior

  • Operator skips: VF creation/deletion
  • Operator handles: Driver binding and device plugin configuration
  • User responsibility: Create VFs before applying policy
  • Policy removal: VFs are NOT removed

Use Cases

  • VFs needed for host networking (storage, management)
  • VFs must exist at boot time
  • Integration with other VF management tools

Creating VFs Externally

Example using a systemd service:

# /etc/systemd/system/create-sriov-vfs.service
[Unit]
Description=Create SR-IOV VFs
Before=kubelet.service

[Service]
Type=oneshot
ExecStart=/bin/bash -c 'echo 4 > /sys/class/net/eno1/device/sriov_numvfs'
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
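
Because the operator does not create the VFs for an externally managed policy, it is worth confirming that they exist before applying it. The following is a minimal sketch: vfs_ready is a hypothetical helper, the SYSFS_NET override is an assumption added for illustration, and the check simply compares sriov_numvfs with the numVfs value you intend to put in the policy.

```shell
# Succeed only if the PF currently has exactly the expected number of VFs.
# SYSFS_NET is a hypothetical override; it defaults to /sys/class/net.
vfs_ready() {
    pf_name=$1
    expected=$2
    numvfs_file="${SYSFS_NET:-/sys/class/net}/${pf_name}/device/sriov_numvfs"
    if [ ! -r "$numvfs_file" ]; then
        echo "no SR-IOV sysfs entry for ${pf_name}" >&2
        return 1
    fi
    actual=$(cat "$numvfs_file")
    [ "$actual" -eq "$expected" ]
}

# Example (run on the node): vfs_ready eno1 4 && echo "VFs present"
```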

RDMA Configuration

For RDMA workloads, set isRdma: true and make sure the RDMA mode on the node is configured accordingly. The example pairs isRdma with deviceType: netdevice:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: rdma-policy
spec:
  deviceType: netdevice
  isRdma: true
  nicSelector:
    pfNames: ["eno1"]
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 4
  priority: 90
  resourceName: rdma_exclusive_device

See RDMA Configuration Guide for complete setup.

Switchdev Mode

For OVS hardware offload, configure NICs in switchdev mode:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: switchdev-policy
spec:
  deviceType: netdevice
  eSwitchMode: switchdev
  nicSelector:
    pfNames: ["eno1"]
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 4
  resourceName: switchdev-nics
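
After the policy is applied, the eswitch mode can be confirmed on the node with the devlink tool from iproute2. This is a sketch: eswitch_mode is a hypothetical helper that parses the mode keyword from the command output, and the PCI address in the example is illustrative.

```shell
# Extract the eswitch mode from `devlink dev eswitch show` output on stdin.
# Matches the standalone "mode" keyword, not "inline-mode" or "encap-mode".
eswitch_mode() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "mode") { print $(i + 1); exit } }'
}

# Example (run on the node, with the PF's own PCI address):
# devlink dev eswitch show pci/0000:86:00.0 | eswitch_mode
```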

Troubleshooting

Check Policy Status

kubectl get sriovnetworknodepolicy -n sriov-network-operator
kubectl describe sriovnetworknodepolicy <policy-name> -n sriov-network-operator

Verify Node State

kubectl get sriovnetworknodestate -n sriov-network-operator
kubectl describe sriovnetworknodestate <node-name> -n sriov-network-operator
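
To see at a glance whether every node has finished applying its configuration, the sync status can be pulled out with jsonpath. This sketch assumes that SriovNetworkNodeState exposes a status.syncStatus field and that it reads Succeeded once configuration is complete; both the field usage and the helper names node_sync_status and all_nodes_synced are assumptions, not something this reference defines.

```shell
# Print "<node> <syncStatus>" for every SriovNetworkNodeState.
# Assumption: the CR exposes status.syncStatus.
node_sync_status() {
    kubectl get sriovnetworknodestate -n sriov-network-operator \
        -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.syncStatus}{"\n"}{end}'
}

# Succeed only when no node reports a status other than Succeeded.
all_nodes_synced() {
    node_sync_status | awk '$2 != "Succeeded" { bad = 1 } END { exit bad + 0 }'
}

# Example: all_nodes_synced || node_sync_status
```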

Common Issues

  1. Webhook Validation Failures

    • VF range exceeds maxVfs capability
    • Invalid PCI addresses or device IDs
    • Missing required fields
  2. Policy Conflicts

    • Multiple policies targeting same PF with different configs
    • Check priority values and VF group overlaps
  3. Virtual Deployment Issues

    • Wrong deviceType for VM environment
    • Attempting to change read-only properties (MTU, numVfs)
  4. External VF Management

    • VFs not created before policy application
    • Incorrect numVfs value vs actual VFs created

Policy Validation

The operator includes admission webhooks that validate policies:

# Check operator logs
kubectl logs deployment/sriov-network-operator -n sriov-network-operator

# Check webhook logs
kubectl logs deployment/sriov-network-operator-webhook -n sriov-network-operator

Node-Level Troubleshooting

For issues with specific nodes, check the config daemon and device plugin logs:

# Find the config daemon pod running on the node, then read its logs
kubectl get pods -n sriov-network-operator -l app=sriov-config-daemon --field-selector spec.nodeName=<node-name>
kubectl logs <config-daemon-pod-name> -n sriov-network-operator

# Find the device plugin pod running on the node, then read its logs
kubectl get pods -n sriov-network-operator -l app=sriov-device-plugin --field-selector spec.nodeName=<node-name>
kubectl logs <device-plugin-pod-name> -n sriov-network-operator