[WIP]OSDOCS-19990: Resource fair sharing #113457
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,29 @@ | ||
| :_mod-docs-content-type: ASSEMBLY | ||
| include::_attributes/common-attributes.adoc[] | ||
| [id="admission-fair-sharing"] | ||
| = Admission fair sharing | ||
| :context: admission-fair-sharing | ||
|
|
||
| toc::[] | ||
|
|
||
| [role="_abstract"] | ||
| Use admission fair sharing to distribute workload admission fairly across local queues that share a single `ClusterQueue`. | ||
| This feature prioritizes workloads from local queues that have historically consumed fewer resources. It tracks usage over time with a configurable decay function and applies admission penalties when workloads are admitted, so that a single local queue cannot monopolize the `ClusterQueue`. | ||
|
|
||
| When multiple tenants share a single `ClusterQueue`, some tenants risk resource starvation. Admission fair sharing addresses this issue by meeting the following requirements: | ||
|
|
||
| Enforce multi-tenant fairness (business critical):: Ensure fair distribution of cluster resources across all tenants based on their usage history. | ||
|
|
||
| Improve service predictability:: Guarantee each tenant gets a consistent share of resources, reducing latency spikes and preventing starvation. | ||
|
|
||
| Enable scalable governance:: Complement static quotas with dynamic, usage-based admission ordering that adapts as tenant demand changes. | ||
|
|
||
| include::modules/kueue-configuring-kueue-instance-for-admission-fair-sharing.adoc[leveloffset=+1] | ||
|
|
||
| include::modules/kueue-configuring-clusterqueue-for-admission-fair-sharing.adoc[leveloffset=+1] | ||
|
|
||
| include::modules/kueue-configuring-localqueue-for-admission-fair-sharing.adoc[leveloffset=+1] | ||
|
|
||
| include::modules/kueue-setting-resource-weights.adoc[leveloffset=+1] | ||
|
|
||
| include::modules/kueue-verifying-the-admission-fair-sharing-status.adoc[leveloffset=+1] |
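The admission ordering that this assembly describes can be illustrated with a small Python model. This is a simplified sketch, not the Kueue implementation: the `LocalQueueState` class and the `weighted_usage` and `admit_next` functions are hypothetical, resources without an explicit weight are assumed to count with weight `1.0`, and the admission penalty is modeled as adding the admitted workload's request to the local queue's usage immediately.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LocalQueueState:
    """Hypothetical, simplified fair sharing state for one local queue."""
    name: str
    consumed: dict[str, float]  # decayed historical usage per resource
    weight: float = 1.0  # corresponds to the LocalQueue fairSharing.weight
    pending: list[dict[str, float]] = field(default_factory=list)  # queued requests


def weighted_usage(consumed: dict[str, float], resource_weights: dict[str, float]) -> float:
    # Sketch assumption: resources without an explicit weight count with weight 1.0.
    return sum(amount * resource_weights.get(res, 1.0) for res, amount in consumed.items())


def admit_next(queues: list[LocalQueueState], resource_weights: dict[str, float]) -> str | None:
    """Admit one pending workload from the local queue with the lowest effective usage."""
    candidates = [q for q in queues if q.pending]
    if not candidates:
        return None
    chosen = min(candidates, key=lambda q: weighted_usage(q.consumed, resource_weights) / q.weight)
    request = chosen.pending.pop(0)
    # Admission penalty (simplified): charge the admitted request to the queue
    # right away, so the same queue does not win every following round.
    for res, amount in request.items():
        chosen.consumed[res] = chosen.consumed.get(res, 0.0) + amount
    return chosen.name


if __name__ == "__main__":
    team_a = LocalQueueState("team-a", {"cpu": 12.0}, pending=[{"cpu": 4.0}] * 2)
    team_b = LocalQueueState("team-b", {"cpu": 10.0}, pending=[{"cpu": 4.0}] * 2)
    print([admit_next([team_a, team_b], {"cpu": 1.0}) for _ in range(4)])
    # ['team-b', 'team-a', 'team-b', 'team-a']
```

Without the immediate penalty, `team-b` would keep the lower usage until the next sampling update and could be admitted repeatedly; charging each admission right away makes the two queues alternate.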
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| // Module included in the following assemblies: | ||
| // | ||
| // * ai_workloads/kueue/admission-fair-sharing.adoc | ||
|
|
||
| :_mod-docs-content-type: PROCEDURE | ||
| [id="configuring-clusterqueue-for-admission-fair-sharing_{context}"] | ||
| = Configuring a cluster queue for admission fair sharing | ||
|
|
||
| [role="_abstract"] | ||
| Set the `admissionMode` field in the `admissionScope` section of your `ClusterQueue` object to `UsageBasedAdmissionFairSharing`. | ||
|
|
||
| .Procedure | ||
|
|
||
| * In the `spec` section of your `ClusterQueue` object, set `admissionScope.admissionMode` to `UsageBasedAdmissionFairSharing`, as shown in the following example: | ||
| + | ||
| [source,yaml] | ||
| ---- | ||
| spec: | ||
|   admissionScope: | ||
|     admissionMode: UsageBasedAdmissionFairSharing | ||
| ---- | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,66 @@ | ||
| // Module included in the following assemblies: | ||
| // | ||
| // * ai_workloads/kueue/admission-fair-sharing.adoc | ||
|
|
||
| :_mod-docs-content-type: PROCEDURE | ||
| [id="configuring-kueue-instance-for-admission-fair-sharing_{context}"] | ||
| = Configuring the {kueue-name} instance for admission fair sharing | ||
|
|
||
| [role="_abstract"] | ||
| Configure {kueue-name} admission fair sharing by using either the `Default` or the `Custom` configuration. The `Default` configuration uses predefined {kueue-name} values, and the `Custom` configuration uses values that you specify. | ||
|
|
||
| .Procedure | ||
|
|
||
| . Choose the `configuration` type you want to use: | ||
| + | ||
| * `Default`: Uses {kueue-name} predefined values. | ||
| * `Custom`: Uses {kueue-name} values that you specify. | ||
|
|
||
| . Apply your chosen configuration: | ||
| + | ||
| * To apply the `Default` configuration, run the following command: | ||
| + | ||
| [source,terminal] | ||
| ---- | ||
| $ oc patch kueue.kueue.openshift.io/cluster --type=merge -p \ | ||
| '{"spec":{"config":{"admissionFairSharing":{"configuration":"Default"}}}}' | ||
| ---- | ||
| + | ||
| .Example resulting configuration | ||
| [source,yaml] | ||
| ---- | ||
| config: | ||
|   admissionFairSharing: | ||
|     configuration: Default | ||
| ---- | ||
| + | ||
| * To apply a `Custom` configuration with values that you specify, run a command similar to the following: | ||
| + | ||
| [source,terminal] | ||
| ---- | ||
| $ oc patch kueue.kueue.openshift.io/cluster --type=merge -p \ | ||
| '{"spec":{"config":{"admissionFairSharing":{"configuration":"Custom","custom":{"usageHalfLifeTimeSeconds":10,"usageSamplingIntervalSeconds":10,"resourceWeights":[{"name":"cpu","weight":"2.0"}]}}}}}' | ||
| ---- | ||
| + | ||
| .Example resulting configuration | ||
| [source,yaml] | ||
| ---- | ||
| config: | ||
|   admissionFairSharing: | ||
|     configuration: Custom | ||
|     custom: | ||
|       resourceWeights: | ||
|       - name: cpu | ||
|         weight: "2.0" | ||
|       usageHalfLifeTimeSeconds: 10 | ||
|       usageSamplingIntervalSeconds: 10 | ||
| ---- | ||
| + | ||
| `resourceWeights`:: Assigns a weight to each resource when {kueue-name} calculates consumed usage. The higher the weight, the higher the penalty for consuming that resource. | ||
| `usageHalfLifeTimeSeconds`:: The time, in seconds, after which the recorded usage decreases by half. This value controls how long past consumption affects future admission decisions. | ||
|
|
||
| `usageSamplingIntervalSeconds`:: The interval, in seconds, at which {kueue-name} updates the `consumedResources` field in the fair sharing status of each local queue. | ||
|
|
||
|
|
||
|
|
||
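The documented meaning of `usageHalfLifeTimeSeconds`, usage that halves after each half-life, can be checked with a short calculation. The following Python sketch assumes pure exponential decay with no new consumption and only reports the value at each sampling point; the function names are hypothetical, Kueue's actual update formula may differ, and treating a half-life of `0` as forgetting usage immediately is an assumption of the sketch.

```python
from __future__ import annotations


def decayed_usage(usage: float, elapsed_seconds: float, half_life_seconds: float) -> float:
    """Usage left after elapsed_seconds if nothing new is consumed.

    Models the documented meaning of usageHalfLifeTimeSeconds: the value
    halves once every half_life_seconds.
    """
    if half_life_seconds <= 0:
        # Sketch assumption: a zero half-life forgets past usage immediately.
        return 0.0
    return usage * 0.5 ** (elapsed_seconds / half_life_seconds)


def usage_at_samples(usage: float, half_life_seconds: float,
                     sampling_interval_seconds: float, samples: int) -> list[float]:
    """Decayed usage as it would appear at successive sampling points."""
    return [decayed_usage(usage, i * sampling_interval_seconds, half_life_seconds)
            for i in range(samples)]


if __name__ == "__main__":
    # Values from the Custom example: half-life 10 s, sampling interval 10 s.
    print(usage_at_samples(32.0, 10, 10, 4))  # [32.0, 16.0, 8.0, 4.0]
    # A longer half-life keeps past consumption relevant for longer.
    print(round(decayed_usage(32.0, 600, 3600), 2))
```

With the 10-second values from the `Custom` example, 32 cores of past usage fall to 4 cores within 30 seconds, so history matters very little; a half-life of an hour keeps most of that usage after 10 minutes.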
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| // Module included in the following assemblies: | ||
| // | ||
| // * ai_workloads/kueue/admission-fair-sharing.adoc | ||
|
|
||
| :_mod-docs-content-type: PROCEDURE | ||
| [id="configuring-localqueue-for-admission-fair-sharing_{context}"] | ||
| = Configuring a local queue for admission fair sharing (optional) | ||
|
|
||
| [role="_abstract"] | ||
| Optionally, you can configure the `fairSharing` section in your `LocalQueue` object to adjust the weight of the queue in the fair sharing calculation. The higher the weight, the lower the penalty. For example, a weight of `2` treats the queue as if it had consumed half as many resources as it actually did. | ||
|
|
||
| .Procedure | ||
|
|
||
| * Specify a `weight` value as shown in the following example: | ||
| + | ||
| [source,yaml] | ||
| ---- | ||
| spec: | ||
|   fairSharing: | ||
|     weight: "2" | ||
| ---- |
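The effect of the local queue `weight` can be shown with a small calculation. This Python sketch assumes, as the module text states, that a local queue's usage is divided by its weight before local queues are compared; the function names are hypothetical, and rejecting non-positive weights is a choice of the sketch, not documented Kueue behavior.

```python
from __future__ import annotations


def effective_usage(usage: float, weight: float) -> float:
    """Usage a local queue is compared with after its fairSharing weight is applied."""
    if weight <= 0:
        raise ValueError("this sketch only supports positive weights")
    return usage / weight


def admission_order(queues: dict[str, tuple[float, float]]) -> list[str]:
    """Order local queues by effective usage; values are (usage, weight) pairs."""
    return sorted(queues, key=lambda name: effective_usage(*queues[name]))


if __name__ == "__main__":
    queues = {
        "team-a": (20.0, 1.0),  # default weight
        "team-b": (30.0, 2.0),  # weight "2": 30 cores count as 15
    }
    print(admission_order(queues))  # ['team-b', 'team-a']
```

Although `team-b` consumed more raw resources, its weight of `2` halves its effective usage, so its workloads are considered first.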
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| // Module included in the following assemblies: | ||
| // | ||
| // * ai_workloads/kueue/release-notes.adoc | ||
|
|
||
| :_mod-docs-content-type: REFERENCE | ||
| [id="release-notes-1.4_{context}"] | ||
| = Release notes for {kueue-name} version 1.4 | ||
|
|
||
| [role="_abstract"] | ||
| {kueue-name} version 1.4 is a generally available release that is supported on {product-title} versions 4.18 and later. {kueue-name} version 1.4 uses link:https://kueue.sigs.k8s.io/docs/overview/[Kueue] version 0.16. | ||
|
|
||
| [id="release-notes-1.4-new-features_{context}"] | ||
| == New features and enhancements | ||
|
|
||
| Admission fair sharing:: | ||
| This release introduces admission fair sharing, which balances workload admission across multiple local queues that feed into a shared `ClusterQueue`. Admission fair sharing: | ||
|
|
||
| - Prioritizes workloads based on historical resource consumption | ||
| - Tracks usage over time with a configurable decay function | ||
| - Applies immediate admission penalties to prevent resource monopolization | ||
|
|
||
| For more information, see xref:../../ai_workloads/kueue/admission-fair-sharing.adoc#admission-fair-sharing[Admission fair sharing]. | ||
|
Collaborator
🤖 [error] OpenShiftAsciiDoc.NoXrefInModules: Do not include xrefs in modules, only assemblies (exception: release notes modules). |
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| // Module included in the following assemblies: | ||
| // | ||
| // * ai_workloads/kueue/admission-fair-sharing.adoc | ||
|
|
||
| :_mod-docs-content-type: CONCEPT | ||
| [id="setting-resource-weights_{context}"] | ||
| = Setting resource weights | ||
|
|
||
| [role="_abstract"] | ||
|
Contributor
Will get back to this. |
||
| Resources measured in bytes, such as memory, require scaled-down `resourceWeights` values. Kubernetes | ||
| represents memory in bytes, so memory usage values are typically billions of times larger than CPU core | ||
| counts. Without scaling, the raw byte values numerically dominate human-scale resources, such as CPU cores, | ||
| by several orders of magnitude, which makes the weights of those resources effectively meaningless. | ||
|
|
||
| For example, to achieve an effective memory weight of `1.0` per GiB, specify `9.31e-10` instead, which is approximately `1.0 / 1,073,741,824`, where 1,073,741,824 is the number of bytes in 1 GiB. | ||
|
|
||
|
|
||
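The scaling in this example can be reproduced with a few lines of Python. This sketch assumes that usage is combined as a weighted sum of raw quantities, with CPU counted in cores and memory in bytes, which is the situation the module describes; the helper names are hypothetical.

```python
from __future__ import annotations

GIB = 1024 ** 3  # bytes in 1 GiB: 1,073,741,824


def per_byte_weight(weight_per_gib: float) -> float:
    """Convert a desired weight per GiB into a weight per byte."""
    return weight_per_gib / GIB


def weighted_usage(consumed: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of raw resource quantities (sketch assumption)."""
    return sum(amount * weights[res] for res, amount in consumed.items())


if __name__ == "__main__":
    print(f"{per_byte_weight(1.0):.2e}")  # 9.31e-10
    usage = {"cpu": 8.0, "memory": 32 * GIB}  # 8 cores and 32 GiB
    # Unscaled: the memory term (about 3.4e10) swamps the CPU term (8).
    print(weighted_usage(usage, {"cpu": 1.0, "memory": 1.0}))
    # Scaled: 1 GiB counts the same as 1 core, so the total is 8 + 32.
    print(weighted_usage(usage, {"cpu": 1.0, "memory": per_byte_weight(1.0)}))  # 40.0
```

Using the rounded value `9.31e-10` gives about 31.99 instead of 32 for 32 GiB, a difference of well under 0.1 percent.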
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,27 @@ | ||
| // Module included in the following assemblies: | ||
| // | ||
| // * ai_workloads/kueue/admission-fair-sharing.adoc | ||
|
|
||
| :_mod-docs-content-type: PROCEDURE | ||
| [id="verifying-the-admission-fair-sharing-status_{context}"] | ||
| = Verifying the admission fair sharing status | ||
|
|
||
| [role="_abstract"] | ||
| Check the `admissionFairSharingStatus` field in the `fairSharing` status of the local queue to confirm that {kueue-name} is tracking its resource consumption. | ||
|
|
||
| .Procedure | ||
|
|
||
| * Use the following command to verify the status of admission fair sharing: | ||
| + | ||
| [source,terminal] | ||
| ---- | ||
| $ oc get lq <local-queue-name> -n <local-queue-namespace> -o jsonpath='{.status.fairSharing}' | ||
| ---- | ||
| + | ||
| .Example output | ||
| [source,terminal] | ||
| ---- | ||
| {"admissionFairSharingStatus":{"consumedResources":{"cpu":"31999m"},"lastUpdate":"2025-06-03T14:25:15Z"},"weightedShare":0} | ||
| ---- | ||
|
|
||
|
|
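To read this status programmatically, for example in a monitoring script, the JSON output can be parsed directly. The following Python sketch uses the example output from this procedure; it only understands plain and millicore (`m`) CPU quantities rather than the full Kubernetes quantity syntax, and the function names are hypothetical.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone
from decimal import Decimal

# Example output of the `oc get lq ... -o jsonpath` command in this procedure.
EXAMPLE = ('{"admissionFairSharingStatus":{"consumedResources":{"cpu":"31999m"},'
           '"lastUpdate":"2025-06-03T14:25:15Z"},"weightedShare":0}')


def parse_cpu_quantity(quantity: str) -> Decimal:
    """Convert '31999m' or '4' to CPU cores (only plain and milli forms)."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def summarize(raw: str) -> dict:
    """Extract consumed CPU, last update time, and weighted share from the status JSON."""
    status = json.loads(raw)
    afs = status["admissionFairSharingStatus"]
    last_update = datetime.strptime(afs["lastUpdate"], "%Y-%m-%dT%H:%M:%SZ")
    return {
        "cpu_cores": parse_cpu_quantity(afs["consumedResources"]["cpu"]),
        "last_update": last_update.replace(tzinfo=timezone.utc),  # 'Z' means UTC
        "weighted_share": status["weightedShare"],
    }


if __name__ == "__main__":
    print(summarize(EXAMPLE))
```

In the example output, `31999m` corresponds to 31.999 CPU cores of decayed consumption recorded at the `lastUpdate` time.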
Instead we can recommend the user to use the following command to apply the default configuration:
done.
There's a small correction Stephen. The command would be: