Merge pull request #102322 from mburke5678/autoscale-vpa-in-place-updates

dfitzmau · web-flow · commit e5764b4cc00a · 2025-12-02T10:57:28.000Z
OSDOCS 17247 Document InPlaceOrRecreate feature in Vertical Pod Autoscaler
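
For context, the `InPlaceOrRecreate` text added in this change refers to the container resize policy (`NotRequired` or `RestartContainer`). The following is a minimal sketch of how such a policy might be declared in a pod spec. The pod name, container image, and resource values are hypothetical placeholders and are not part of this change.

[source,yaml]
----
# Illustrative pod spec fragment only; names and values are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: resize-demo                    # hypothetical name
spec:
  containers:
  - name: frontend
    image: registry.example.com/frontend:latest   # placeholder image
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired       # resize CPU without restarting the container
    - resourceName: memory
      restartPolicy: RestartContainer  # memory changes restart the container
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
      limits:
        cpu: 200m
        memory: 200Mi
----

With a policy like this, a VPA in `InPlaceOrRecreate` mode would be expected to resize CPU without a restart and to restart the container when it changes memory, as the modes section in the diff describes.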
diff --git a/modules/nodes-pods-vertical-autoscaler-about.adoc b/modules/nodes-pods-vertical-autoscaler-about.adoc
@@ -14,7 +14,7 @@ Recommender::
 The VPA recommender monitors the current and past resource consumption. Based on this data, the VPA recommender determines the optimal CPU and memory resources for the pods in the associated workload object.
 
 Updater::
-The VPA updater checks if the pods in the associated workload object have the correct resources. If the resources are correct, the updater takes no action. If the resources are not correct, the updater kills the pod so that pods' controllers can re-create them with the updated requests.
+The VPA updater checks if the pods in the associated workload object have the correct resources. If the resources are correct, the updater takes no action. If the resources are not correct, the updater ensures that the pods have the updated requests. The way in which the VPA updates the resources depends on the update mode, as described in _About the Vertical Pod Autoscaler Operator modes_ later in this section.
 
 Admission controller::
 The VPA admission controller sets the correct resource requests on each new pod in the associated workload object. This applies whether the pod is new or the controller re-created the pod due to the VPA updater actions.
@@ -23,14 +23,14 @@ You can use the default recommender or use your own alternative recommender to a
 
 The default recommender automatically computes historic and current CPU and memory usage for the containers in those pods. The default recommender uses this data to determine optimized resource limits and requests to ensure that these pods are operating efficiently at all times. For example, the default recommender suggests reduced resources for pods that are requesting more resources than they are using and increased resources for pods that are not requesting enough.
 
-The VPA then automatically deletes any pods that are out of alignment with these recommendations one at a time, so that your applications can continue to serve requests with no downtime. The workload objects then redeploy the pods with the original resource limits and requests. The VPA uses a mutating admission webhook to update the pods with optimized resource limits and requests before admitting the pods to a node. If you do not want the VPA to delete pods, you can view the VPA resource limits and requests and manually update the pods as needed.
+Depending upon the VPA mode, the VPA can automatically update any pods that are out of alignment with these recommendations one at a time, so that your applications can continue to serve requests with no downtime. The VPA updates the pods with optimized resource limits and requests before admitting the pods to a node. If you do not want the VPA to update pods, you can view the VPA resource limits and requests and manually update the pods as needed.
 
 [NOTE]
 ====
 By default, workload objects must specify a minimum of two replicas for the VPA to automatically delete their pods. Workload objects that specify fewer replicas than this minimum are not deleted. If you manually delete these pods, when the workload object redeploys the pods, the VPA updates the new pods with its recommendations. You can change this minimum by modifying the `VerticalPodAutoscalerController` object as shown in _Changing the VPA minimum value_.
 ====
 
-For example, if you have a pod that uses 50% of the CPU but only requests 10%, the VPA determines that the pod is consuming more CPU than requested and deletes the pod. The workload object, such as replica set, restarts the pods and the VPA updates the new pod with its recommended resources.
+For example, if you have a pod that uses 50% of the CPU but only requests 10%, the VPA determines that the pod is consuming more CPU than requested. Depending upon the VPA mode, the VPA updates the pod or you can manually update the pod at a convenient time.
 
 For developers, you can use the VPA to help ensure that your pods active during periods of high demand by scheduling pods onto nodes that have appropriate resources for each pod.
 
@@ -40,3 +40,23 @@ Administrators can use the VPA to better use cluster resources, such as preventi
 ====
 If you stop running the VPA or delete a specific VPA CR in your cluster, the resource requests for the pods already modified by the VPA do not change. However, any new pods get the resources defined in the workload object, not the previous recommendations made by the VPA.
 ====
+
+[id="nodes-pods-vertical-autoscaler-modes_{context}"]
+== About the Vertical Pod Autoscaler Operator modes
+
+You can set one of the following VPA modes by using a `VerticalPodAutoscaler` CR:
+
+* `InPlaceOrRecreate`. In this mode, the VPA automatically applies the recommended CPU and memory resources throughout the pod lifetime. When any pod in the project is out of alignment with the VPA recommendations, the VPA attempts to apply updates _in-place_, without restarting the pod. If the VPA is not able to update the containers in place, the VPA deletes the pod. When the workload object redeploys the pod, the VPA uses a mutating admission webhook to update the new pod with its recommendations. For specific information about in-place updates, see "About the in-place-or-recreate mode".
++
+The container's resize policy dictates the way in which `InPlaceOrRecreate` mode updates are applied. If the policy is set to `NotRequired`, the VPA attempts to update the container without restarting. If the policy is `RestartContainer`, the VPA always restarts the container upon the update.
+
+* `Recreate` and `Auto`. In either of these modes, the VPA automatically applies the recommended CPU and memory resources throughout the pod lifetime. When any pod in the project is out of alignment with the VPA recommendations, the VPA deletes the pod. When the workload object redeploys the pod, the VPA uses a mutating admission webhook to update the new pod with its recommendations. Use the `Recreate` mode rarely, only if you need to ensure that the pods restart whenever the resource request changes.
++
+--
+:FeatureName: The `Auto` VPA mode
+include::snippets/deprecated-feature.adoc[leveloffset=+1]
+--
+
+* `Initial`. In this mode, the VPA applies the recommended CPU and memory resources only at pod creation.
+
+* `Off`. In this mode, the VPA only provides the recommended resource limits and requests. You can then manually apply the recommendations. The `Off` mode does not update pods.
diff --git a/modules/nodes-pods-vertical-autoscaler-configuring.adoc b/modules/nodes-pods-vertical-autoscaler-configuring.adoc
@@ -36,7 +36,7 @@ spec:
     kind:       Deployment <1>
     name:       frontend <2>
   updatePolicy:
-    updateMode: "Auto" <3>
+    updateMode: "InPlaceOrRecreate" <3>
   resourcePolicy: <4>
     containerPolicies:
     - containerName: my-opt-sidecar
@@ -47,8 +47,8 @@ spec:
 <1> Specify the type of workload object you want this VPA to manage: `Deployment`, `StatefulSet`, `Job`, `DaemonSet`, `ReplicaSet`, or `ReplicationController`.
 <2> Specify the name of an existing workload object you want this VPA to manage.
 <3> Specify the VPA mode:
-* `Auto` to automatically apply the recommended resources on pods associated with the controller. The VPA terminates existing pods and creates new pods with the recommended resource limits and requests.
-* `Recreate` to automatically apply the recommended resources on pods associated with the workload object. The VPA terminates existing pods and creates new pods with the recommended resource limits and requests. Use the `Recreate` mode rarely, only if you need to ensure that the pods restart whenever the resource request changes.
+* `InPlaceOrRecreate` to automatically apply the recommended resources on pods associated with the workload object. The VPA attempts to update the pods with the new resources without re-creating them. If the VPA is unable to update a pod in place, the VPA re-creates the pod.
+* `Recreate` to automatically apply the recommended resources on pods associated with the workload object. The VPA terminates existing pods and creates new pods with the recommended resource limits and requests. Use the `Recreate` mode only if you need to ensure that the pods restart whenever the resource request changes.
 * `Initial` to automatically apply the recommended resources to newly-created pods associated with the workload object. The VPA does not update the pods as it learns new resource recommendations.
 * `Off` to only generate resource recommendations for the pods associated with the workload object. The VPA does not update the pods as it learns new resource recommendations and does not apply the recommendations to new pods.
 <4> Optional. Specify the containers you want to opt-out and set the mode to `Off`.
@@ -75,11 +75,13 @@ The output shows the recommendations for CPU and memory requests, similar to the
 .Example output
 [source,yaml]
 ----
-...
+apiVersion: autoscaling.k8s.io/v1
+kind: VerticalPodAutoscaler
+metadata:
+  name: vpa-recommender
+# ...
 status:
-
-...
-
+# ...
   recommendation:
     containerRecommendations:
     - containerName: frontend
diff --git a/modules/nodes-pods-vertical-autoscaler-custom-resource.adoc b/modules/nodes-pods-vertical-autoscaler-custom-resource.adoc
@@ -89,5 +89,5 @@ spec:
     kind: ScalablePod
     name: scalable-cr
   updatePolicy:
-    updateMode: "Auto"
+    updateMode: "InPlaceOrRecreate"
 ----
diff --git a/modules/nodes-pods-vertical-autoscaler-in-place.adoc b/modules/nodes-pods-vertical-autoscaler-in-place.adoc
@@ -0,0 +1,25 @@
+// Module included in the following assemblies:
+//
+// * nodes/nodes-vertical-autoscaler.adoc
+
+:_mod-docs-content-type: CONCEPT
+[id="nodes-pods-vertical-autoscaler-in-place_{context}"]
+= About the in-place-or-recreate mode
+
+The Vertical Pod Autoscaler Operator (VPA) contains an optional `InPlaceOrRecreate` mode. This mode instructs the VPA to attempt to perform an _in-place pod resize_ when updating CPU or memory resources, where the VPA first attempts to update the resources in a running pod without recreating the pod. If an in-place resize is not possible, the VPA falls back to the `Recreate` mode and restarts the pod in order to update the resources.
+
+When in `InPlaceOrRecreate` mode, the VPA falls back to always recreating pods in the following scenarios:
+
+* The in-place update is considered _infeasible_ for reasons including the following:
+** The requested resources exceed the node's total capacity.
+** The pod is a static pod.
+** The container has swap enabled.
+** The pod is managed by a static `cpuManagerPolicy` or `memoryManagerPolicy` parameter.
+* The update is _deferred_ for more than 5 minutes. A deferred update is one that is currently not possible, but might become possible at a later time. For example, if another pod is removed from the node, the requested resources might become available. The kubelet retries the resize when conditions on the node change.
+* The update is in progress for more than 1 hour.
+* The pod QoS class would change due to the update.
+* The update would downscale the memory limit.
+
+If a container in a pod has a `RestartContainer` container resize policy, which requires a restart upon a resource update, a VPA in `InPlaceOrRecreate` mode honors the resize policy and restarts the container.
+
+For more information on in-place updates, see "Adjust pod resource levels without pod disruption".
diff --git a/modules/nodes-pods-vertical-autoscaler-using-about.adoc b/modules/nodes-pods-vertical-autoscaler-using-about.adoc
@@ -8,13 +8,7 @@
 
 To use the Vertical Pod Autoscaler Operator (VPA), you create a VPA custom resource (CR) for a workload object in your cluster. The VPA learns and applies the optimal CPU and memory resources for the pods associated with that workload object. You can use a VPA with a deployment, stateful set, job, daemon set, replica set, or replication controller workload object. The VPA CR must be in the same project as the pods that you want to check.
 
-You use the VPA CR to associate a workload object and specify the mode that the VPA operates in:
-
-* The `Auto` and `Recreate` modes automatically apply the VPA CPU and memory recommendations throughout the pod lifetime. The VPA deletes any pods in the project that are out of alignment with its recommendations. When redeployed by the workload object, the VPA updates the new pods with its recommendations.
-* The `Initial` mode automatically applies VPA recommendations only at pod creation.
-* The `Off` mode only provides recommended resource limits and requests. You can then manually apply the recommendations. The `Off` mode does not update pods.
-
-You can also use the CR to opt-out certain containers from VPA evaluation and updates.
+You use the VPA CR to associate a workload object and specify the mode that the VPA operates in. You can also use the CR to opt-out certain containers from VPA evaluation and updates.
 
 For example, a pod has the following limits and requests:
 
@@ -29,7 +23,7 @@ resources:
     memory: 100Mi
 ----
 
-After creating a VPA that is set to `Auto`, the VPA learns the resource usage and deletes the pod. When redeployed, the pod uses the new resource limits and requests:
+After creating a VPA that is set to `Recreate`, the VPA learns the resource usage and deletes the pod. When redeployed, the pod uses the new resource limits and requests:
 
 [source,yaml]
 ----
@@ -92,6 +86,11 @@ The output shows the recommended resources, `target`, the minimum recommended re
 
 The VPA uses the `lowerBound` and `upperBound` values to determine if a pod needs updating. If a pod has resource requests less than the `lowerBound` values or more than the `upperBound` values, the VPA terminates and recreates the pod with the `target` values.
 
+[role="_additional-resources"]
+.Additional resources
+
+* xref:../../nodes/pods/nodes-pods-adjust-resources-in-place.adoc#nodes-pods-adjust-resources-in-place-about_nodes-pods-adjust-resources-in-place[Adjust pod resource levels without pod disruption]
+
 [id="nodes-pods-vertical-autoscaler-using-one-pod_{context}"]
 == Changing the VPA minimum value
 
@@ -128,16 +127,16 @@ spec:
 
 [id="nodes-pods-vertical-autoscaler-using-auto_{context}"]
 == Automatically applying VPA recommendations
-To use the VPA to automatically update pods, create a VPA CR for a specific workload object with `updateMode` set to `Auto` or `Recreate`.
+To use the VPA to automatically update pods, create a VPA CR for a specific workload object with `updateMode` set to `InPlaceOrRecreate` or `Recreate`.
 
-When the pods are created for the workload object, the VPA constantly monitors the containers to analyze their CPU and memory needs. The VPA deletes any pods that do not meet the VPA recommendations for CPU and memory. When redeployed, the pods use the new resource limits and requests based on the VPA recommendations, honoring any pod disruption budget set for your applications. The recommendations are added to the `status` field of the VPA CR for reference.
+When the pods are created for the workload object, the VPA constantly monitors the containers to analyze their CPU and memory needs. The VPA updates any pods that do not meet the VPA recommendations for CPU and memory to use the new resource limits and requests based on the VPA recommendations, honoring any pod disruption budget set for your applications. The recommendations are added to the `status` field of the VPA CR for reference.
 
 [NOTE]
 ====
 By default, workload objects must specify a minimum of two replicas in order for the VPA to automatically delete their pods. Workload objects that specify fewer replicas than this minimum are not deleted. If you manually delete these pods, when the workload object redeploys the pods, the VPA does update the new pods with its recommendations. You can change this minimum by modifying the `VerticalPodAutoscalerController` object as shown in _Changing the VPA minimum value_.
 ====
 
-.Example VPA CR for the `Auto` mode
+.Example VPA CR for the `InPlaceOrRecreate` or `Recreate` mode
 [source,yaml]
 ----
 apiVersion: autoscaling.k8s.io/v1
@@ -150,13 +149,14 @@ spec:
     kind:       Deployment <1>
     name:       frontend <2>
   updatePolicy:
-    updateMode: "Auto" <3>
+    updateMode: "InPlaceOrRecreate" <3>
 ----
 <1> The type of workload object you want this VPA CR to manage.
 <2> The name of the workload object you want this VPA CR to manage.
-<3> Set the mode to `Auto` or `Recreate`:
-* `Auto`. The VPA assigns resource requests on pod creation and updates the existing pods by terminating them when the requested resources differ significantly from the new recommendation.
-* `Recreate`. The VPA assigns resource requests on pod creation and updates the existing pods by terminating them when the requested resources differ significantly from the new recommendation. Use this mode rarely, only if you need to ensure that when the resource request changes the pods restart.
+<3> Set the mode to `InPlaceOrRecreate` or `Recreate`: 
++
+* `InPlaceOrRecreate`. The VPA attempts to update the pods with the new resource requests without re-creating them. If the VPA is unable to update a pod in place, the VPA re-creates the pod.
+* `Recreate`. The VPA assigns resource requests on pod creation and updates the existing pods by terminating them. Use this mode rarely, only if you need to ensure that the pods restart whenever the resource request changes.
 
 [NOTE]
 ====
@@ -258,15 +258,15 @@ spec:
     kind:       Deployment <1>
     name:       frontend <2>
   updatePolicy:
-    updateMode: "Auto" <3>
+    updateMode: "InPlaceOrRecreate" <3>
   resourcePolicy: <4>
     containerPolicies:
     - containerName: my-opt-sidecar
       mode: "Off"
 ----
 <1> The type of workload object you want this VPA CR to manage.
 <2> The name of the workload object you want this VPA CR to manage.
-<3> Set the mode to `Auto`, `Recreate`, `Initial`, or `Off`. Use the `Recreate` mode rarely, only if you need to ensure that when the resource request changes the pods restart.
+<3> Set the mode to `InPlaceOrRecreate`, `Recreate`, `Initial`, or `Off`. Use the `Recreate` mode rarely. For example, use this mode to ensure that the pods restart when the resources are updated.
 <4> Specify the containers that you do not want updated by the VPA and set the `mode` to `Off`.
 
 For example, a pod has two containers, the same resource requests and limits:
diff --git a/nodes/pods/nodes-pods-vertical-autoscaler.adoc b/nodes/pods/nodes-pods-vertical-autoscaler.adoc
@@ -30,6 +30,13 @@ The VPA helps you to understand the optimal CPU and memory usage for your pods a
 
 include::modules/nodes-pods-vertical-autoscaler-about.adoc[leveloffset=+1]
 
+include::modules/nodes-pods-vertical-autoscaler-in-place.adoc[leveloffset=+2]
+
+[role="_additional-resources"]
+.Additional resources
+* xref:../../nodes/pods/nodes-pods-vertical-autoscaler.adoc#nodes-pods-vertical-autoscaler-using-about_nodes-pods-vertical-autoscaler[About using the Vertical Pod Autoscaler Operator]
+* xref:../../nodes/pods/nodes-pods-adjust-resources-in-place.adoc#nodes-pods-adjust-resources-in-place[Adjust pod resource levels without pod disruption]
+
 include::modules/nodes-pods-vertical-autoscaler-install.adoc[leveloffset=+1]
 
 include::modules/nodes-pods-vertical-autoscaler-moving-vpa.adoc[leveloffset=+1]