Commit c187e7f

AEP-8026: per-vpa-component-configuration (#8026)
* AEP: per-vpa-component-configuration
* Set proper AEP number
* update AEP
* remove duplicated title
* fixed camel-case
* Add e2e testing
* add parameter descriptions
* Add feature flag
* Updated on/off feature flag
* evictAfterOomThreshold->evictAfterOOMThreshold
* Add more info
* oomBumpUpRatio to int
* fixed typo
* Move oomBumpUpRatio to string
* Add memoryAggregationIntervalCount
* Used quantity instead of string

Signed-off-by: Omer Aplatony <omerap12@gmail.com>
1 parent 9bc4220 commit c187e7f

1 file changed: 229 additions & 0 deletions
  • vertical-pod-autoscaler/enhancements/8026-per-vpa-component-configuration
@@ -0,0 +1,229 @@
1+
# AEP-8026: Allow per-VPA component configuration parameters
2+
3+
<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [Parameter Descriptions](#parameter-descriptions)
    - [Container Policy Parameters](#container-policy-parameters)
    - [Update Policy Parameters](#update-policy-parameters)
- [Design Details](#design-details)
  - [API Changes](#api-changes)
    - [Phase 1 (Current Proposal)](#phase-1-current-proposal)
    - [Future Extensions](#future-extensions)
  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
    - [How can this feature be enabled / disabled in a live cluster?](#how-can-this-feature-be-enabled--disabled-in-a-live-cluster)
  - [Kubernetes version compatibility](#kubernetes-version-compatibility)
  - [Validation via CEL and Testing](#validation-via-cel-and-testing)
  - [Test Plan](#test-plan)
- [Implementation History](#implementation-history)
- [Future Work](#future-work)
- [Alternatives](#alternatives)
  - [Multiple VPA Deployments](#multiple-vpa-deployments)
  - [Environment-Specific Configuration](#environment-specific-configuration)
<!-- /toc -->
27+
28+
## Summary
29+
30+
Currently, VPA components (recommender, updater, admission controller) are configured through global flags. This makes it challenging to support different workloads with varying resource optimization needs within the same cluster. This proposal introduces the ability to specify configuration parameters at the individual VPA object level, allowing for workload-specific optimization strategies.
31+
32+
## Motivation
33+
34+
Different types of workloads in a Kubernetes cluster often have different resource optimization requirements. For example:
35+
- Batch processing jobs might benefit from aggressive OOM handling and frequent adjustments
36+
- User-facing services might need more conservative growth patterns for stability
37+
- Development environments might need different settings than production
38+
39+
Currently, supporting these different needs requires running multiple VPA component instances with different configurations, which increases operational complexity and resource usage.
40+
41+
### Goals
42+
43+
- Allow specification of component-specific parameters in individual VPA objects
44+
- Support different optimization strategies for different workloads in the same cluster
45+
- Maintain backward compatibility with existing global configuration
46+
- Initially support the following parameters:
  - oomBumpUpRatio
  - oomMinBumpUp
  - memoryAggregationInterval
  - memoryAggregationIntervalCount
  - evictAfterOOMThreshold
52+
53+
### Non-Goals
54+
55+
- Converting all existing VPA flags to per-object configuration
56+
- Changing the core VPA algorithm or its decision-making process
57+
- Adding new optimization strategies
58+
59+
## Proposal
60+
61+
The configuration will be split into two sections: container-specific recommendations under `containerPolicies` and updater configuration under `updatePolicy`. This structure is designed to be extensible, allowing for additional parameters to be added in future iterations of this enhancement.
62+
63+
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: oom-test-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: oom-test
  updatePolicy:
    updateMode: Auto
    evictAfterOOMThreshold: "5m"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      oomBumpUpRatio: "1.5"
      oomMinBumpUp: 104857600
      memoryAggregationInterval: "12h"
      memoryAggregationIntervalCount: 5
```
84+
85+
### Parameter Descriptions
86+
87+
#### Container Policy Parameters
89+
* `oomBumpUpRatio` (Quantity):
90+
- Multiplier applied to memory recommendations after OOM events
91+
- Represented as a Quantity (e.g., "1.5")
92+
- Must be greater than 1
93+
- Controls how aggressively memory is increased after container crashes
94+
95+
* `oomMinBumpUp` (bytes):
96+
- Minimum absolute memory increase after OOM events
97+
- Ensures meaningful increases even for small containers
98+
99+
* `memoryAggregationInterval` (duration):
100+
- Time window for aggregating memory usage data
101+
- Affects how quickly VPA responds to memory usage changes
102+
103+
* `memoryAggregationIntervalCount` (integer):
104+
- Number of consecutive memory aggregation intervals
105+
- Used to calculate the total memory aggregation window length
106+
- Total window length = memoryAggregationInterval * memoryAggregationIntervalCount
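
As a rough illustration of how these parameters interact, the sketch below computes a post-OOM memory target and the total aggregation window. It assumes the ratio-based and absolute increases are combined by taking the larger of the two, which is how the two descriptions read together but is not spelled out in this proposal. `parse_duration`, `bumped_memory`, and `total_window` are hypothetical helper names, and the duration parser handles only a subset of duration strings (`h`, `m`, `s`).

```python
import re
from datetime import timedelta
from fractions import Fraction

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> timedelta:
    """Parse a subset of duration strings such as "12h", "5m", or "1h30m"."""
    parts = re.findall(r"(\d+)([hms])", text)
    # Reject empty input and anything the pattern did not fully consume.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(seconds=sum(int(n) * _UNIT_SECONDS[u] for n, u in parts))


def bumped_memory(used_bytes: int, ratio: str, min_bump_up_bytes: int) -> int:
    """Assumed rule: take the larger of the ratio-based and absolute increases."""
    scaled = int(used_bytes * Fraction(ratio))
    return max(scaled, used_bytes + min_bump_up_bytes)


def total_window(interval: str, count: int) -> timedelta:
    """Total window = memoryAggregationInterval * memoryAggregationIntervalCount."""
    return parse_duration(interval) * count
```

With the values from the example manifest, a container that hit an OOM at 100 MiB would get a 200 MiB target under this assumption, because the 100 MiB minimum exceeds the 1.5x increase. At 1 GiB the ratio wins and the target is 1.5 GiB. The total window is 12h * 5 = 60 hours.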
107+
108+
#### Update Policy Parameters
109+
* `evictAfterOOMThreshold` (duration):
110+
- Time to wait after OOM before considering pod eviction
111+
- Helps prevent rapid eviction cycles while maintaining stability
112+
113+
Each parameter can be configured independently, falling back to global defaults if not specified. Values should be chosen based on workload characteristics and stability requirements.
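
The per-parameter fallback can be sketched as follows. This is a minimal illustration rather than the recommender's actual code: `GLOBAL_DEFAULTS` and `effective_container_config` are hypothetical names, and the default values are placeholders standing in for whatever the component flags are set to.

```python
from typing import Any, Dict, Mapping

# Placeholder global values standing in for component flags; the numbers are
# illustrative and not necessarily the real defaults.
GLOBAL_DEFAULTS: Dict[str, Any] = {
    "oomBumpUpRatio": "1.2",
    "oomMinBumpUp": 104857600,
    "memoryAggregationInterval": "24h",
    "memoryAggregationIntervalCount": 8,
}


def effective_container_config(
    container_policy: Mapping[str, Any],
    global_defaults: Mapping[str, Any] = GLOBAL_DEFAULTS,
) -> Dict[str, Any]:
    """Resolve each parameter independently: per-VPA value if set, else global."""
    resolved: Dict[str, Any] = {}
    for name, default in global_defaults.items():
        value = container_policy.get(name)
        resolved[name] = default if value is None else value
    return resolved
```

A container policy that sets only `oomBumpUpRatio` would resolve to that one override plus the global values for the other three parameters.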
114+
115+
## Design Details
116+
117+
### API Changes
118+
119+
#### Phase 1 (Current Proposal)
120+
121+
Extend `ContainerResourcePolicy` with:
122+
* `oomBumpUpRatio`
123+
* `oomMinBumpUp`
124+
* `memoryAggregationInterval`
125+
* `memoryAggregationIntervalCount`
126+
127+
Extend `PodUpdatePolicy` with:
128+
* `evictAfterOOMThreshold`
129+
130+
#### Future Extensions
131+
132+
This AEP will be updated as additional parameters are identified for per-object configuration. Potential candidates include:
133+
* `confidenceIntervalCPU`
134+
* `confidenceIntervalMemory`
135+
* `recommendationMarginFraction`
136+
* Other parameters that benefit from workload-specific tuning
137+
138+
### Feature Enablement and Rollback
139+
140+
#### How can this feature be enabled / disabled in a live cluster?
141+
142+
- Feature gate name: `PerVPAConfig`
- Default: Off (Alpha)
- Components depending on the feature gate:
  - admission-controller
  - recommender
  - updater
148+
149+
The feature gate will remain in alpha (default off) until:
150+
- All planned configuration parameters have been implemented and tested
151+
- Performance impact has been thoroughly evaluated
152+
- Documentation is complete for all parameters
153+
154+
Disabling the `PerVPAConfig` feature gate has the following effects:
155+
156+
- Any per-VPA configuration parameters specified in VPA objects will be ignored
157+
- Components will fall back to using their global configuration values
158+
159+
Enabling the `PerVPAConfig` feature gate has the following effects:
160+
161+
- VPA components will honor the per-VPA configuration parameters specified in VPA objects
162+
- Validation will be performed on the configuration parameters
163+
- Configuration parameters will override global defaults for the specific VPA object
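
The gate semantics described above might look like this for a single field. The function name and signature are hypothetical, and the sketch assumes each component simply checks the gate before reading the per-VPA value.

```python
from typing import Optional


def effective_evict_after_oom_threshold(
    gate_enabled: bool,
    per_vpa_value: Optional[str],
    global_value: str,
) -> str:
    """Pick the threshold an updater would use under the described gate semantics."""
    if not gate_enabled:
        # Gate off: the per-VPA field is ignored even if the object sets it.
        return global_value
    # Gate on: the per-VPA field overrides the global value when present.
    return per_vpa_value if per_vpa_value is not None else global_value
```

Under this reading, turning the gate off does not require editing existing VPA objects: their extra fields stay in place and are simply not consulted.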
164+
165+
### Kubernetes version compatibility
166+
167+
The `PerVPAConfig` feature requires VPA version 1.5.0 or higher. The feature is being introduced as alpha and will follow the standard Kubernetes feature gate graduation process:
168+
- Alpha: v1.5.0 (default off)
169+
- Beta: TBD (default on)
170+
- GA: TBD (default on)
171+
172+
### Validation via CEL and Testing
173+
174+
Initial validation rules (CEL):
175+
* `oomMinBumpUp` > 0
176+
* `memoryAggregationInterval` > 0
177+
* `evictAfterOOMThreshold` > 0
178+
* `memoryAggregationIntervalCount` > 0
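
The four rules above would be expressed as CEL in the CRD schema. Purely as an illustration of the checks themselves, the sketch below mirrors them in Python. `POSITIVE_FIELDS` and `positive_value_errors` are hypothetical names, and durations are assumed to have been parsed into seconds beforehand.

```python
from typing import Dict, List, Union

Number = Union[int, float]

# Fields covered by the initial "> 0" rules listed above.
POSITIVE_FIELDS = (
    "oomMinBumpUp",
    "memoryAggregationInterval",
    "evictAfterOOMThreshold",
    "memoryAggregationIntervalCount",
)


def positive_value_errors(values: Dict[str, Number]) -> List[str]:
    """Return one message per listed field that is present but not > 0.

    Durations are assumed to be pre-parsed into seconds; absent fields are
    skipped because they fall back to the global configuration.
    """
    return [
        f"{name} must be greater than 0, got {values[name]}"
        for name in POSITIVE_FIELDS
        if name in values and not values[name] > 0
    ]
```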
179+
180+
Validation via Admission Controller:
181+
Some parameters cannot be validated using Common Expression Language (CEL). They are validated within the admission controller instead.
182+
183+
* `oomBumpUpRatio`: parsed as a Kubernetes Quantity. The value must be greater than 1.
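
A simplified sketch of this check follows. It is not the admission controller's code: `parse_ratio_quantity` and `validate_oom_bump_up_ratio` are hypothetical names, and the parser accepts only plain decimals (`"1.5"`) and the milli suffix (`"1500m"`), whereas the real Quantity type supports more suffixes.

```python
from decimal import Decimal, InvalidOperation


def parse_ratio_quantity(text: str) -> Decimal:
    """Parse a plain decimal ("1.5") or milli-suffixed ("1500m") quantity.

    Simplified on purpose: real Quantity parsing accepts more suffixes.
    """
    raw = text
    scale = Decimal(1)
    if text.endswith("m"):
        text, scale = text[:-1], Decimal("0.001")
    try:
        value = Decimal(text)
    except InvalidOperation as exc:
        raise ValueError(f"not a supported quantity: {raw!r}") from exc
    if not value.is_finite():
        # Decimal also parses "NaN" and "Infinity"; neither is a valid ratio.
        raise ValueError(f"not a supported quantity: {raw!r}")
    return value * scale


def validate_oom_bump_up_ratio(text: str) -> None:
    """Raise ValueError unless the ratio parses and is greater than 1."""
    ratio = parse_ratio_quantity(text)
    if ratio <= 1:
        raise ValueError(f"oomBumpUpRatio must be greater than 1, got {text}")
```

Parsing into a decimal type rather than a float keeps the comparison exact, which matters for values close to the boundary at 1.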
184+
185+
Additional validation rules will be added as new parameters are introduced.
186+
E2E tests will be included to verify:
187+
* Different configurations are properly applied and respected by VPA components
188+
* VPA behavior matches expected outcomes for different parameter combinations
189+
* Proper fallback to global configuration when parameters are not specified
190+
191+
### Test Plan
192+
193+
- Unit tests for new API fields and validation
194+
- Integration tests verifying different configurations are properly applied
195+
- E2E tests comparing behavior with different configurations
196+
- Upgrade tests ensuring backward compatibility
197+
198+
## Implementation History
199+
200+
- 2025-04-12: Initial proposal
201+
- Future: Additional parameters will be added based on user feedback and requirements
202+
203+
## Future Work
204+
205+
This enhancement is designed to be extensible. As the VPA evolves and users provide feedback, additional parameters may be added to the per-object configuration. Each new parameter will:
206+
1. Be documented in this AEP
207+
2. Include appropriate validation rules
208+
3. Maintain backward compatibility
209+
4. Follow the same pattern of falling back to global configuration when not specified
210+
211+
The decision to add new parameters will be based on:
212+
- User feedback and requirements
213+
- Performance impact analysis
214+
- Implementation complexity
215+
- Maintenance considerations
216+
217+
## Alternatives
218+
219+
### Multiple VPA Deployments
220+
221+
Continue with the current approach of running multiple VPA deployments with different configurations:
222+
- Pros: No API changes needed
223+
- Cons: Higher resource usage, operational complexity
224+
225+
### Environment-Specific Configuration
226+
227+
Use different VPA deployments per environment (dev/staging/prod):
228+
- Pros: Simpler than per-workload configuration
229+
- Cons: Less flexible, doesn't address varying needs within the same environment
