Merge pull request #103581 from mburke5678/autoscale-cordon-terminate

mburke5678 · web-flow · commit ff35be68f3d1 · 2025-12-17T13:28:18.000-05:00
OSDOCS 15817 Support cordon-node-before-terminating in cluster-autoscaler
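
For reviewers, a minimal sketch of the setting that this change documents. It is not taken from the diff: the `apiVersion` and the `default` resource name are assumptions based on the usual `ClusterAutoscaler` CR layout, and only the `scaleDown` stanza mirrors the documented example.

```yaml
# Illustrative fragment only; fields outside scaleDown are assumed, not part of this diff.
apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default
spec:
  scaleDown:
    enabled: true
    # Cordon the node before draining its pods and removing it.
    # Omitting this field keeps the documented default, Disabled.
    cordonNodeBeforeTerminating: Enabled
```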
diff --git a/modules/cluster-autoscaler-about.adoc b/modules/cluster-autoscaler-about.adoc
@@ -39,7 +39,7 @@ endif::openshift-rosa-hcp[]
 ifndef::openshift-rosa-hcp[]
 [IMPORTANT]
 ====
-Ensure that the `maxNodesTotal` value in the `ClusterAutoscaler` resource definition that you create is large enough to account for the total possible number of machines in your cluster. This value must encompass the number of control plane machines and the possible number of compute machines that you might scale to.
+Ensure that the `maxNodesTotal` value in the `ClusterAutoscaler` custom resource (CR) that you create is large enough to account for the total possible number of machines in your cluster. This value must encompass the number of control plane machines and the possible number of compute machines that you might scale to.
 ====
 endif::openshift-rosa-hcp[]
 
@@ -65,6 +65,10 @@ If the following types of pods are present on a node, the cluster autoscaler wil
 
 For example, you set the maximum CPU limit to 64 cores and configure the cluster autoscaler to only create machines that have 8 cores each. If your cluster starts with 30 cores, the cluster autoscaler can add up to 4 more nodes with 32 cores, for a total of 62.
 
+[NOTE]
+====
+By default, the cluster autoscaler does not cordon a node before draining its pods when it removes that node. To have the cluster autoscaler cordon the node before draining and moving the pods, set the `spec.scaleDown.cordonNodeBeforeTerminating` parameter to `Enabled` in the `ClusterAutoscaler` CR. Enable this parameter in production clusters, because removing a node without cordoning it first risks data loss, application errors, pods stuck in the terminating state, and other issues. Leaving the parameter disabled can result in faster node removal and might be appropriate in clusters that run only stateless workloads.
+====
 
 [id="cluster-autoscaler-limitations_{context}"]
 == Limitations
diff --git a/modules/cluster-autoscaler-cr.adoc b/modules/cluster-autoscaler-cr.adoc
@@ -37,6 +37,7 @@ spec:
       max: 16
   logVerbosity: 4
   scaleDown:
+    cordonNodeBeforeTerminating: Enabled
     enabled: true
     delayAfterAdd: 10m
     delayAfterDelete: 5m
@@ -95,6 +96,12 @@ If you do not specify a value, the default value of `1` is used.
 |`scaleDown`
 |In this section, you can specify the period to wait for each action by using any valid link:https://golang.org/pkg/time/#ParseDuration[ParseDuration] interval, including `ns`, `us`, `ms`, `s`, `m`, and `h`.
 
+|`scaleDown.cordonNodeBeforeTerminating`
+a|Optional: Specify whether the cluster autoscaler cordons a node before removing it. Use one of the following values:
+
+* `Enabled`: The cluster autoscaler cordons the node before draining any pods and removing that node.
+* `Disabled`: The cluster autoscaler does not cordon the node before draining any pods and removing that node. This is the default.
+
 |`scaleDown.enabled`
 |Specify whether the cluster autoscaler can remove unnecessary nodes.