Commit 455d290

Address more comments in AEP
1 parent e347836 commit 455d290

1 file changed

+62
-33
lines changed
  • vertical-pod-autoscaler/enhancements/7862-cpu-startup-boost

vertical-pod-autoscaler/enhancements/7862-cpu-startup-boost/README.md

Lines changed: 62 additions & 33 deletions
Original file line number | Diff line number | Diff line change
@@ -31,11 +31,12 @@ running in containerized applications, especially Java workloads. This delay can
3131
negatively impact the user experience and overall application performance. One
3232
potential solution is to provide additional CPU resources to pods during their
3333
startup phase, but this can lead to waste if the extra CPU resources are not
34-
set back to their original values after the pods are ready.
34+
set back to their original values after the pods have started up.
3535

3636
This proposal allows VPA to boost the CPU request and limit of containers during
37-
the pod startup and to scale the CPU resources back down when the pod is `Ready`,
38-
leveraging the [in-place pod resize Kubernetes feature](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources).
37+
the pod startup and to scale the CPU resources back down when the pod is
38+
`Ready` or after a certain time has elapsed, leveraging the
39+
[in-place pod resize Kubernetes feature](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources).
3940

4041
> [!NOTE]
4142
> This feature depends on the new `InPlaceOrRecreate` VPA mode:
@@ -44,17 +45,16 @@ leveraging the [in-place pod resize Kubernetes feature](https://github.com/kuber
4445
### Goals
4546

4647
* Allow VPA to boost the CPU request and limit of a pod's containers during the
47-
pod startup (from creation time until it becomes `Ready`).
48+
pod (re-)creation time.
4849
* Allow VPA to scale pods down [in-place](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources)
4950
to the existing VPA recommendation for that container, if any, or to the CPU
5051
resources configured in the pod spec, as soon as their [`Ready`](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions)
51-
condition is true.
52+
condition is true and `StartupBoost.CPU.Duration` has elapsed.
5253

5354
### Non-Goals
5455

55-
* Allow VPA to boost CPU resources of pods that are already [`Ready`](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions).
56-
* Allow VPA to boost CPU resources during startup of workloads that have not
57-
configured a [Readiness or a Startup probe](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/).
56+
* Allow VPA to boost CPU resources of pods outside of the pod (re-)creation
57+
time.
5858
* Allow VPA to boost memory resources.
5959
* This is out of scope for now because the in-place pod resize feature
6060
[does not support memory limit decrease yet.](https://github.com/kubernetes/enhancements/tree/758ea034908515a934af09d03a927b24186af04c/keps/sig-node/1287-in-place-update-pod-resources#memory-limit-decreases)
@@ -69,6 +69,10 @@ boost.
6969
with a new `StartupBoostOnly` mode to allow users to only enable the startup
7070
boost feature and not vanilla VPA altogether.
7171

72+
* To allow CPU startup boost if a `StartupBoost` config is specified in `Auto`
73+
[`ContainerScalingMode`](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.3.0/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L231-L236)
74+
container policies.
75+
7276
## Design Details
7377

7478
### Workflow
@@ -83,9 +87,10 @@ limits to align with its `StartupBoost` policy, if specified, during the pod
8387
creation.
8488

8589
1. The VPA Updater monitors pods targeted by the VPA object and when the pod
86-
condition is `Ready`, it scales down the CPU resources to the appropriate
87-
non-boosted value: `existing VPA recommendation for that container` (if any) OR
88-
the `CPU resources configured in the pod spec`.
90+
condition is `Ready` and `StartupBoost.CPU.Duration` has elapsed, it scales
91+
down the CPU resources to the appropriate non-boosted value:
92+
`existing VPA recommendation for that container` (if any) OR the
93+
`CPU resources configured in the pod spec`.
8994
* The scale down is applied [in-place](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources).
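
The unboost step described above can be sketched as follows. This is a minimal illustration, not the VPA Updater's code: the names `ContainerState`, `unboosted_cpu_millis`, and `should_unboost` are hypothetical, CPU is modeled as integer millicores, and the duration is assumed to be counted from the moment the pod became `Ready`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContainerState:
    # CPU configured in the pod spec, in millicores (hypothetical model).
    spec_cpu_millis: int
    # Existing VPA recommendation for the container, if any.
    recommended_cpu_millis: Optional[int] = None


def unboosted_cpu_millis(container: ContainerState) -> int:
    """Pick the non-boosted value: the VPA recommendation if one exists,
    otherwise the CPU configured in the pod spec."""
    if container.recommended_cpu_millis is not None:
        return container.recommended_cpu_millis
    return container.spec_cpu_millis


def should_unboost(pod_ready: bool, seconds_since_ready: float,
                   duration_s: float = 0.0) -> bool:
    """Unboost only when the pod is Ready and the optional duration
    (assumed here to start at Ready) has elapsed."""
    return pod_ready and seconds_since_ready >= duration_s
```

For example, a container with a 500m spec value and no recommendation would be scaled back to 500m, and a pod that has been `Ready` for 10s with a 60s duration would stay boosted.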
9095

9196
### API Changes
@@ -96,6 +101,8 @@ and contain the following fields:
96101
resource request and limit of the containers targeted by the VPA object.
97102
* `StartupBoost.CPU.Value`: the target value of the CPU request or limit
98103
during the startup boost phase.
104+
* [Optional] `StartupBoost.CPU.Duration`: if specified, it indicates for how
105+
long to keep the pod boosted **after** it becomes `Ready`.
99106

100107
> [!IMPORTANT]
101108
> The boosted CPU value will be capped by
@@ -105,6 +112,15 @@ and contain the following fields:
105112
> [!IMPORTANT]
106113
> Only one of `Factor` or `Value` may be specified per container policy.
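
As a rough sketch of how the two fields could interact, the hypothetical helper below computes a boosted CPU value in millicores. It assumes `Factor` is a multiplier applied to the current value and `Value` is an absolute target; the function name and the millicore model are illustrative only, and the capping mentioned in the note above is deliberately left out.

```python
from typing import Optional


def boosted_cpu_millis(current_millis: int,
                       factor: Optional[float] = None,
                       value_millis: Optional[int] = None) -> int:
    """Return the CPU value to use during the boost phase.

    Assumption: `factor` multiplies the current value, `value_millis`
    replaces it. Capping of the result is not modeled here.
    """
    if factor is not None and value_millis is not None:
        # Mirrors the rule that only one of Factor or Value may be set.
        raise ValueError("only one of factor or value may be specified")
    if factor is not None:
        return int(round(current_millis * factor))
    if value_millis is not None:
        return value_millis
    # No boost configured: leave the value unchanged.
    return current_millis
```

Under these assumptions, a 500m container with a factor of 2 would be boosted to 1000m, while the same container with a value of 2000m would be boosted to 2000m.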
107114
115+
116+
> [!NOTE]
117+
> To ensure that containers are unboosted only after their applications are
118+
> started and ready, it is recommended to configure a
119+
> [Readiness or a Startup probe](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/)
120+
> for the containers that will be CPU boosted. Check the [Test Plan](#test-plan)
121+
> section for more details on this feature's behavior for different combinations
122+
> of probes + `StartupBoost.CPU.Duration`.
123+
108124
We will also add a new mode to the [`ContainerScalingMode`](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.3.0/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L231-L236):
109125
* **NEW**: `StartupBoostOnly`: new mode that will allow users to only enable
110126
the startup boost feature for a container and not vanilla VPA altogether.
@@ -146,21 +162,17 @@ are created/updated:
146162
* `StartupBoost.CPU.Value` must be greater than the CPU request or limit of the
147163
container during the boost phase, otherwise we risk downscaling the container.
148164

149-
* Workloads must be configured with a Readiness or a Startup probe to be able to
150-
utilize this feature. Therefore, VPA will not boost CPU resources of workloads
151-
that do not configure a Readiness or a Startup probe.
152-
153165
### Mitigating Failed In-Place Downsizes
154166

155-
The VPA Updater **will not** evict a pod to actuate a startup CPU boost
156-
recommendation if it attempted to apply the recommendation in place and it
157-
failed (see the [scenarios](https://github.com/kubernetes/autoscaler/blob/0a34bf5d3a71b486bdaa440f1af7f8d50dc8e391/vertical-pod-autoscaler/enhancements/4016-in-place-updates-support/README.md?plain=1#L164-L169 ) where the VPA
167+
The VPA Updater **will not** evict a pod if it attempted to scale the pod down
168+
in place (to unboost its CPU resources) and the update failed (see the
169+
[scenarios](https://github.com/kubernetes/autoscaler/blob/0a34bf5d3a71b486bdaa440f1af7f8d50dc8e391/vertical-pod-autoscaler/enhancements/4016-in-place-updates-support/README.md?plain=1#L164-L169) where the VPA
158170
updater will consider that the update failed). This is to avoid an eviction
159171
loop:
160172

161173
1. A pod is created and has its CPU resources boosted
162-
1. The pod is ready. VPA Updater tries to downscale the pod in-place and it
163-
fails.
174+
1. The pod meets the conditions to be unboosted. VPA Updater tries to downscale
175+
the pod in-place and it fails.
164176
1. VPA Updater evicts the pod. Logic flow goes back to (1).
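
The rule that breaks this loop can be illustrated with a small decision function. This is a sketch under assumptions: the function name and the action strings are hypothetical, and whether the updater later retries the in-place resize is not specified here.

```python
def next_unboost_action(unboost_due: bool, in_place_resize_failed: bool) -> str:
    """Decide what to do with a boosted pod.

    The key property is that "evict" is never returned: a failed in-place
    downsize leaves the pod boosted instead of recreating it, which would
    trigger a new boost and restart the loop.
    """
    if not unboost_due:
        return "wait"
    if in_place_resize_failed:
        return "leave_boosted"
    return "resize_in_place"
```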
165177

166178
### Feature Enablement and Rollback
@@ -189,33 +201,41 @@ the following to happen:
189201
* admission-controller **to not** boost CPU resources, should it encounter a
190202
VPA configured with a `StartupBoost` config and `StartupBoostOnly` or `Auto`
191203
`ContainerScalingMode`.
192-
* updater **to not** unboost CPU resources when pods become `Ready`, should it
193-
encounter a VPA configured with a `StartupBoost` config and `StartupBoostOnly`
194-
or `Auto` `ContainerScalingMode`.
204+
* updater **to not** unboost CPU resources when pods meet the scale down
205+
requirements, should it encounter a VPA configured with a `StartupBoost`
206+
config and `StartupBoostOnly` or `Auto` `ContainerScalingMode`.
195207

196208
### Kubernetes Version Compatibility
197209

198210
Similarly to [AEP-4016](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler/enhancements/4016-in-place-updates-support#kubernetes-version-compatibility),
199211
`StartupBoost` configuration and `StartupBoostOnly` mode are built assuming that
200212
VPA will be running on Kubernetes 1.33+ with the beta version of
201213
[KEP-1287: In-Place Update of Pod Resources](https://github.com/kubernetes/enhancements/issues/1287)
202-
enabled. If this is not the case, VPA will behave as if the `CPUStartupBoost`
203-
feature gate was disabled (see [Feature Enablement and Rollback](#feature-enablement-and-rollback)
204-
section for details).
214+
enabled. If this is not the case, VPA's attempt to unboost pods may fail and the
215+
pods may remain boosted for their whole lifecycle.
205216

206217
## Test Plan
207218

208219
Other than comprehensive unit tests, we will also add the following scenarios to
209220
our e2e tests:
210221

211222
* CPU Startup Boost recommendation is applied to pod controlled by VPA until it
212-
becomes `Ready`. Then, the pod is scaled back down in-place.
223+
becomes `Ready` and `StartupBoost.CPU.Duration` has elapsed. Then, the pod is
224+
scaled back down in-place. We'll also test the following sub-cases:
213225
* Boost is applied to all containers of a pod.
214-
* Boost is applied to a subset of containers.
215-
* CPU Startup Boost will not be applied if a pod is not configured with a
216-
Readiness or a Startup probe.
217-
* Pod is evicted the first time that an in-place update fails when scaling the
218-
pod back down. And a new CPU boost is not attempted when the pod is recreated.
226+
* Boost is applied only to a subset of containers in a pod.
227+
* Combinations of probes + `StartupBoost.CPU.Duration`:
228+
* No probes and no `StartupBoost.CPU.Duration` specified: unboost will
229+
likely happen immediately.
230+
* No probes and a 60s `StartupBoost.CPU.Duration`: unboost will likely
231+
happen after 60s.
232+
* A readiness/startup probe and no `StartupBoost.CPU.Duration` specified:
233+
unboost will likely happen as soon as the pod becomes `Ready`.
234+
* A readiness/startup probe and a 60s `StartupBoost.CPU.Duration`
235+
specified: unboost will likely happen 60s **after** the pod becomes `Ready`.
236+
237+
* Pod is not evicted if the in-place update fails when scaling the pod back
238+
down.
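
The four probe and duration combinations above can be summarized as a simple timing model. This is only a sketch: the helper name is hypothetical, the 30s readiness time for a probed pod is an invented example value, and a pod without probes is assumed to report `Ready` at roughly 0s.

```python
def expected_unboost_after_s(ready_after_s: float, duration_s: float = 0.0) -> float:
    """Seconds after pod start at which the unboost becomes due, assuming
    the duration is counted from the moment the pod becomes Ready."""
    return ready_after_s + duration_s


# Hypothetical timings: no probes -> Ready at ~0s; with a probe -> Ready at 30s.
SCENARIOS = {
    "no probes, no duration": expected_unboost_after_s(0),
    "no probes, 60s duration": expected_unboost_after_s(0, 60),
    "probe, no duration": expected_unboost_after_s(30),
    "probe, 60s duration": expected_unboost_after_s(30, 60),
}
```

An e2e test would still need tolerances around these values, since the updater acts on its own loop interval rather than at the exact instant the condition is met.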
219239

220240
## Examples
221241

@@ -224,6 +244,10 @@ scenarios.
224244

225245
### CPU Boost Only
226246

247+
All containers under the `example` deployment will receive "regular" VPA updates,
248+
**except for** `boosted-container-name`. `boosted-container-name` will only be
249+
CPU boosted/unboosted, because it has a `StartupBoostOnly` container policy.
250+
227251
```yaml
228252
apiVersion: "autoscaling.k8s.io/v1"
229253
kind: VerticalPodAutoscaler
@@ -248,6 +272,11 @@ spec:
248272
249273
### CPU Boost and Vanilla VPA
250274
275+
All containers under the `example` deployment will receive "regular" VPA updates,
276+
**including** `boosted-container-name`. Additionally, `boosted-container-name`
277+
will be CPU boosted/unboosted, because it has a `StartupBoost` config in its
278+
container policy and `Auto` container policy mode.
279+
251280
```yaml
252281
apiVersion: "autoscaling.k8s.io/v1"
253282
kind: VerticalPodAutoscaler
@@ -279,5 +308,5 @@ spec:
279308

280309
## Implementation History
281310

282-
* 2025-03-18: Initial version.
311+
* 2025-03-20: Initial version.
283312
