Add optional restart-only predicate to inplace upgrade flow by rajathagasthya · Pull Request #145 · NVIDIA/k8s-operator-libs

rajathagasthya · 2026-06-11T17:13:39Z

Consumers occasionally need a DaemonSet's pods rolled to a new revision without the disruptive pod-eviction and drain sequence — for example when a pod-template change does not affect the driver (a label-only change) yet still changes the controller revision hash, so the node is otherwise driven through the full upgrade flow.

Add an optional RestartOnlyPredicate hook, registered via WithRestartOnlyPredicate. In the inplace ProcessUpgradeRequiredNodes, when the predicate reports that a full upgrade is not needed, cordon the node and route it straight to pod-restart-required, skipping wait-for-jobs, pod-deletion, and drain. The pod is still restarted so the DaemonSet converges to the new revision. Cordoning keeps the node unschedulable if the restart fails, matching the full flow; the uncordon-required state uncordons it on success.
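A minimal sketch of how an optional hook registered through a functional option could look. Only the names `RestartOnlyPredicate` and `WithRestartOnlyPredicate` come from this description; the predicate signature, the `Node` and `Manager` types, `NewManager`, and the `RestartOnly` helper are hypothetical stand-ins, not the library's actual API.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for the Kubernetes Node object.
type Node struct {
	Name   string
	Labels map[string]string
}

// RestartOnlyPredicate reports whether a pod restart alone is enough for the
// node. The signature is assumed for illustration.
type RestartOnlyPredicate func(node Node) (bool, error)

// Manager is a hypothetical, stripped-down upgrade state manager.
type Manager struct {
	restartOnly RestartOnlyPredicate // nil by default: always take the full flow
}

// Option configures a Manager.
type Option func(*Manager)

// WithRestartOnlyPredicate registers the optional hook.
func WithRestartOnlyPredicate(p RestartOnlyPredicate) Option {
	return func(m *Manager) { m.restartOnly = p }
}

// NewManager applies the options in order.
func NewManager(opts ...Option) *Manager {
	m := &Manager{}
	for _, opt := range opts {
		opt(m)
	}
	return m
}

// RestartOnly consults the hook; with no hook registered it reports false,
// which selects the full upgrade flow.
func (m *Manager) RestartOnly(node Node) (bool, error) {
	if m.restartOnly == nil {
		return false, nil
	}
	return m.restartOnly(node)
}

func main() {
	// Hypothetical predicate: treat a node marked "label-only" as restart-only.
	labelOnly := func(n Node) (bool, error) {
		return n.Labels["change"] == "label-only", nil
	}
	node := Node{Name: "gpu-node-1", Labels: map[string]string{"change": "label-only"}}

	withHook := NewManager(WithRestartOnlyPredicate(labelOnly))
	withoutHook := NewManager()

	a, _ := withHook.RestartOnly(node)
	b, _ := withoutHook.RestartOnly(node)
	fmt.Println(a, b) // prints "true false"
}
```

Because the zero value of the field is nil, a consumer that never calls the option keeps the existing behavior without any code change.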

Upgrade state transitions

Today, an out-of-sync node always takes the full flow:

upgrade-required → cordon-required → wait-for-jobs-required → pod-deletion-required
                 → drain-required → pod-restart-required → validation-required
                 → uncordon-required → upgrade-done

With this change, when a registered predicate returns true for the node:

upgrade-required → pod-restart-required → validation-required
                 → uncordon-required → upgrade-done

The node never enters cordon-required, wait-for-jobs-required, pod-deletion-required, or drain-required. It is still cordoned — directly in the routing step rather than via the cordon-required state — so it stays unschedulable until uncordon-required, exactly as in the full flow. When no predicate is registered, or when the predicate returns false, the full flow above is unchanged.
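The two paths above can be sketched as a small transition function. The state names are taken from this description; the `NextState` and `Path` helpers are hypothetical and model only the happy path, not the library's real state machine.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	UpgradeRequired     = "upgrade-required"
	CordonRequired      = "cordon-required"
	WaitForJobsRequired = "wait-for-jobs-required"
	PodDeletionRequired = "pod-deletion-required"
	DrainRequired       = "drain-required"
	PodRestartRequired  = "pod-restart-required"
	ValidationRequired  = "validation-required"
	UncordonRequired    = "uncordon-required"
	UpgradeDone         = "upgrade-done"
)

// NextState returns the state that follows current on the happy path.
// restartOnly only matters at upgrade-required, where the routing happens.
func NextState(current string, restartOnly bool) string {
	switch current {
	case UpgradeRequired:
		if restartOnly {
			return PodRestartRequired // the node is cordoned in this routing step
		}
		return CordonRequired
	case CordonRequired:
		return WaitForJobsRequired
	case WaitForJobsRequired:
		return PodDeletionRequired
	case PodDeletionRequired:
		return DrainRequired
	case DrainRequired:
		return PodRestartRequired
	case PodRestartRequired:
		return ValidationRequired
	case ValidationRequired:
		return UncordonRequired
	case UncordonRequired:
		return UpgradeDone
	default:
		return current // upgrade-done and unknown states are terminal here
	}
}

// Path lists every state visited from upgrade-required to upgrade-done.
func Path(restartOnly bool) []string {
	state := UpgradeRequired
	path := []string{state}
	for state != UpgradeDone {
		state = NextState(state, restartOnly)
		path = append(path, state)
	}
	return path
}

func main() {
	fmt.Println("full:        ", strings.Join(Path(false), " -> "))
	fmt.Println("restart-only:", strings.Join(Path(true), " -> "))
}
```

In this model the full path visits nine states and the restart-only path visits five; both rejoin at pod-restart-required, so validation and uncordon are shared.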

The change is additive and backward compatible: a nil predicate (the default, and every existing consumer) preserves current behavior, and podInSyncWithDS is unchanged. If the predicate returns an error or the cordon fails, the node is kept in upgrade-required and retried on a later reconcile, with a Warning event recorded — an upgrade is never started on an unknown answer. The predicate is not consulted for orphaned pods, nodes with an explicit upgrade-requested annotation, or nodes waiting for safe driver load — the safe-load handshake relies on the full flow to evict workloads before the driver load is unblocked at pod-restart-required. The maxParallelUpgrades throttle applies to both paths.
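The guard conditions and failure handling described above could be sketched as a single routing decision. Everything here is illustrative: `NodeInfo`, `Decision`, `Route`, and the `cordon` callback are hypothetical names, and the `maxParallelUpgrades` throttle is left out for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	upgradeRequired    = "upgrade-required"
	cordonRequired     = "cordon-required"
	podRestartRequired = "pod-restart-required"
)

// errUnknown is a sample error for a predicate that cannot decide.
var errUnknown = errors.New("predicate could not decide")

// NodeInfo carries the facts the routing step needs (hypothetical type).
type NodeInfo struct {
	Name               string
	Orphaned           bool // pod is orphaned
	UpgradeRequested   bool // explicit upgrade-requested annotation was set
	WaitingForSafeLoad bool // node is waiting for safe driver load
}

// Predicate is an assumed shape for the restart-only hook.
type Predicate func(NodeInfo) (bool, error)

// Decision is the outcome of routing one upgrade-required node.
type Decision struct {
	NextState string
	Warning   string // non-empty when a Warning event would be recorded
}

// Route picks the next state for a node currently in upgrade-required.
func Route(n NodeInfo, pred Predicate, cordon func(NodeInfo) error) Decision {
	// The predicate is not consulted in these cases: take the full flow.
	if pred == nil || n.Orphaned || n.UpgradeRequested || n.WaitingForSafeLoad {
		return Decision{NextState: cordonRequired}
	}
	restartOnly, err := pred(n)
	if err != nil {
		// Unknown answer: stay put and retry on a later reconcile.
		return Decision{NextState: upgradeRequired, Warning: "restart-only predicate failed: " + err.Error()}
	}
	if !restartOnly {
		return Decision{NextState: cordonRequired}
	}
	// Cordon directly in the routing step, then skip to the pod restart.
	if err := cordon(n); err != nil {
		return Decision{NextState: upgradeRequired, Warning: "cordon failed: " + err.Error()}
	}
	return Decision{NextState: podRestartRequired}
}

func main() {
	okCordon := func(NodeInfo) error { return nil }
	yes := func(NodeInfo) (bool, error) { return true, nil }
	broken := func(NodeInfo) (bool, error) { return false, errUnknown }

	fmt.Println(Route(NodeInfo{Name: "a"}, yes, okCordon).NextState)
	fmt.Println(Route(NodeInfo{Name: "b", WaitingForSafeLoad: true}, yes, okCordon).NextState)
	fmt.Println(Route(NodeInfo{Name: "c"}, broken, okCordon).NextState)
}
```

Keeping the node in upgrade-required on any error means a retry re-evaluates the predicate from scratch, so a transient failure never commits the node to either path.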

Example: the GPU Operator uses this to restart driver pods in place when only cosmetic pod-template metadata changed (the helm.sh/chart label bumped by a patch chart release) while DRIVER_CONFIG_DIGEST is unchanged — see NVIDIA/gpu-operator#2527 (draft).

rajathagasthya · 2026-06-11T20:57:21Z

/ok-to-test dfc93f8

Consumers occasionally need a DaemonSet's pods rolled to a new revision without the disruptive pod-eviction and drain sequence -- for example when a pod-template change does not affect the managed software (a label-only change) yet still changes the controller revision hash, so the node is otherwise driven through the full upgrade flow.

Add an optional RestartOnlyPredicate hook, registered via WithRestartOnlyPredicate. In the inplace ProcessUpgradeRequiredNodes, when the predicate reports that a full upgrade is not needed, cordon the node and route it straight to pod-restart-required, skipping wait-for-jobs, pod-deletion, and drain. The pod is still restarted so the DaemonSet converges to the new revision. Cordoning keeps the node unschedulable if the restart fails, matching the full flow; the uncordon-required state uncordons it on success.

The change is additive and backward compatible: a nil predicate (the default, and every existing consumer) preserves current behavior, and podInSyncWithDS is unchanged. If the predicate returns an error or the cordon fails, the node is kept in upgrade-required and retried on a later reconcile, with a Warning event recorded -- an upgrade is never started on an unknown answer. The predicate is not consulted for orphaned pods, nodes with an explicit upgrade-requested annotation, or nodes waiting for safe driver load -- the safe-load handshake relies on the full flow to evict workloads before the driver load is unblocked at pod-restart-required. The maxParallelUpgrades throttle applies to both paths.

The upgrade-requested state is captured before the annotation is cleared, because the node provider re-fetches and overwrites the in-memory node object.

Signed-off-by: Rajath Agasthya <ragasthya@nvidia.com>

rajathagasthya mentioned this pull request Jun 11, 2026: Restart driver pods in place when driver config is unchanged, NVIDIA/gpu-operator#2527 (Draft)

rajathagasthya force-pushed the restart-only-predicate branch 2 times, most recently from 328f701 to dfc93f8, June 11, 2026 20:04

rajathagasthya marked this pull request as ready for review June 11, 2026 20:57

rajathagasthya force-pushed the restart-only-predicate branch from dfc93f8 to 80df349, June 16, 2026 20:01

rajathagasthya wants to merge 1 commit into NVIDIA:main from rajathagasthya:restart-only-predicate
