Add optional restart-only predicate to inplace upgrade flow #145
Open
rajathagasthya wants to merge 1 commit into
/ok-to-test dfc93f8
Consumers occasionally need a DaemonSet's pods rolled to a new revision without the disruptive pod-eviction and drain sequence -- for example when a pod-template change does not affect the managed software (a label-only change) yet still changes the controller revision hash, so the node is otherwise driven through the full upgrade flow.

Add an optional RestartOnlyPredicate hook, registered via WithRestartOnlyPredicate. In the inplace ProcessUpgradeRequiredNodes, when the predicate reports that a full upgrade is not needed, cordon the node and route it straight to pod-restart-required, skipping wait-for-jobs, pod-deletion, and drain. The pod is still restarted so the DaemonSet converges to the new revision. Cordoning keeps the node unschedulable if the restart fails, matching the full flow; the uncordon-required state uncordons it on success.

The change is additive and backward compatible: a nil predicate (the default, and every existing consumer) preserves current behavior, and podInSyncWithDS is unchanged.

If the predicate returns an error or the cordon fails, the node is kept in upgrade-required and retried on a later reconcile, with a Warning event recorded -- an upgrade is never started on an unknown answer. The predicate is not consulted for orphaned pods, nodes with an explicit upgrade-requested annotation, or nodes waiting for safe driver load -- the safe-load handshake relies on the full flow to evict workloads before the driver load is unblocked at pod-restart-required. The maxParallelUpgrades throttle applies to both paths.

The upgrade-requested state is captured before the annotation is cleared, because the node provider re-fetches and overwrites the in-memory node object.

Signed-off-by: Rajath Agasthya <ragasthya@nvidia.com>
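The last paragraph of the commit message describes an ordering constraint that is easy to get wrong. Below is a minimal sketch that assumes a provider which re-fetches the Node after patching it. `fakeProvider`, `clearAnnotation`, `processNode`, and the annotation key `example.com/upgrade-requested` are hypothetical stand-ins. They are not the library's node provider or its real annotation name.

```go
package main

import "fmt"

// Hypothetical annotation key, for illustration only.
const upgradeRequestedAnnotation = "example.com/upgrade-requested"

// node is a minimal, hypothetical stand-in for a Kubernetes Node.
type node struct {
	Name        string
	Annotations map[string]string
}

// fakeProvider mimics the behavior described in the commit message: an
// update re-fetches the node and overwrites the caller's in-memory copy.
type fakeProvider struct {
	server map[string]node // authoritative copies, keyed by node name
}

// clearAnnotation removes key from the stored node, then overwrites *n with
// the re-fetched object, mirroring the re-fetch the commit message describes.
func (p *fakeProvider) clearAnnotation(n *node, key string) {
	stored := p.server[n.Name]
	kept := make(map[string]string, len(stored.Annotations))
	for k, v := range stored.Annotations {
		if k != key {
			kept[k] = v
		}
	}
	stored.Annotations = kept
	p.server[n.Name] = stored
	*n = stored // the caller's copy no longer carries the annotation
}

func upgradeRequested(n *node) bool {
	return n.Annotations[upgradeRequestedAnnotation] == "true"
}

// processNode captures the upgrade-requested state BEFORE clearing the
// annotation; reading it afterwards would always yield false.
func processNode(p *fakeProvider, n *node) bool {
	requested := upgradeRequested(n)
	p.clearAnnotation(n, upgradeRequestedAnnotation)
	return requested
}

func main() {
	p := &fakeProvider{server: map[string]node{
		"node-a": {Name: "node-a", Annotations: map[string]string{upgradeRequestedAnnotation: "true"}},
	}}
	n := p.server["node-a"]
	fmt.Println(processNode(p, &n))   // true: captured before the clear
	fmt.Println(upgradeRequested(&n)) // false: the in-memory copy was overwritten
}
```

If you swap the first two statements in `processNode`, it returns false. A node whose upgrade was explicitly requested would then be treated as if it were not, and would be handed to the restart-only predicate.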
Consumers occasionally need a DaemonSet's pods rolled to a new revision without the disruptive pod-eviction and drain sequence. One example is a pod-template change that does not affect the driver (a label-only change) but still changes the controller revision hash, which would otherwise drive the node through the full upgrade flow.
Add an optional `RestartOnlyPredicate` hook, registered via `WithRestartOnlyPredicate`. In the inplace `ProcessUpgradeRequiredNodes`, when the predicate reports that a full upgrade is not needed, cordon the node and route it straight to `pod-restart-required`, skipping wait-for-jobs, pod-deletion, and drain. The pod is still restarted so the DaemonSet converges to the new revision. Cordoning keeps the node unschedulable if the restart fails, matching the full flow; the `uncordon-required` state uncordons it on success.

Upgrade state transitions
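Today, an out-of-sync node always takes the full flow:

`upgrade-required` → `cordon-required` → `wait-for-jobs-required` → `pod-deletion-required` → `drain-required` → `pod-restart-required` → … → `uncordon-required`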
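With this change, when a registered predicate returns true for the node:

`upgrade-required` (node cordoned during routing) → `pod-restart-required` → … → `uncordon-required`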
The node never enters `cordon-required`, `wait-for-jobs-required`, `pod-deletion-required`, or `drain-required`. It is still cordoned, but directly in the routing step rather than via the `cordon-required` state. It therefore stays unschedulable until `uncordon-required`, exactly as in the full flow. When no predicate is registered, or when the predicate returns false, the full flow above is unchanged.

The change is additive and backward compatible: a nil predicate (the default, and every existing consumer) preserves current behavior, and `podInSyncWithDS` is unchanged.

If the predicate returns an error or the cordon fails, the node is kept in `upgrade-required` and retried on a later reconcile, and a Warning event is recorded. An upgrade is never started on an unknown answer.

The predicate is not consulted for orphaned pods, nodes with an explicit upgrade-requested annotation, or nodes waiting for safe driver load. The safe-load handshake relies on the full flow to evict workloads before the driver load is unblocked at `pod-restart-required`. The `maxParallelUpgrades` throttle applies to both paths.

Example: the GPU Operator uses this to restart driver pods in place when only cosmetic pod-template metadata changed (the `helm.sh/chart` label bumped by a patch chart release) while `DRIVER_CONFIG_DIGEST` is unchanged; see NVIDIA/gpu-operator#2527 (draft).