docs(kubernetes-operator): document HPA + KEDA autoscaling #1196
Merged
149 changes: 149 additions & 0 deletions
docs/kubernetes-operator/operator-manual/autoscaling.mdx
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,149 @@ | ||
| --- | ||
| title: 'Autoscaling WorkloadDeployments with HPA and KEDA' | ||
| sidebar_label: 'Autoscaling' | ||
| sidebar_position: 3 | ||
| description: 'Scale wasmCloud WorkloadDeployments using the Kubernetes Horizontal Pod Autoscaler or KEDA via the standard /scale subresource.' | ||
| proficiency: Intermediate | ||
| # `about` (single primary topic) and `mentions` (secondary topics) reference | ||
| # entity slugs in src/data/entities.json. At build time these expand into | ||
| # schema.org JSON-LD nodes — emitted as @id refs on the page's Article-family | ||
| # node plus inlined Thing entries in the same @graph payload. Slugs are | ||
| # case-sensitive. Transcript pages inherit these refs from their parent | ||
| # landing page via src/data/transcript-inheritance.json (regenerated by | ||
| # scripts/generate-transcript-inheritance.mjs / the prebuild hook), so | ||
| # don't repeat the block in transcript frontmatter. | ||
| about: wasmCloud | ||
| mentions: | ||
| - Kubernetes | ||
| - HorizontalPodAutoscaler | ||
| - KEDA | ||
| - OpenTelemetry | ||
| - Prometheus | ||
| platforms: [wasmCloud, Kubernetes] | ||
| --- | ||
|
|
||
| # Autoscaling | ||
|
|
||
| The [`WorkloadDeployment`](/docs/kubernetes-operator/api-reference.md#workloaddeployment) resource implements the standard Kubernetes [`/scale` subresource](https://kubernetes.io/docs/reference/using-api/api-concepts/#scale-subresource). | ||
|
|
||
| That means a wasmCloud `WorkloadDeployment` can be scaled like any other resource using [`kubectl scale`](#imperative-scaling), the [Horizontal Pod Autoscaler (HPA)](#horizontal-pod-autoscaler-hpa), or [KEDA](#kubernetes-event-driven-autoscaling-keda). | ||
|
|
||
| ## What scales | ||
|
|
||
| Autoscaling changes the **number of component instances** running across the host group. Hosts are a separate pool managed by the operator's host group `Deployment`; scaling a `WorkloadDeployment` schedules more instances of the component onto existing hosts, up to each host's pool size. | ||
|
|
||
| This has two practical consequences: | ||
|
|
||
| - **The host group is a precondition for autoscaling.** If you set `maxReplicas: 100` but the host group only has capacity for 30 component instances, HPA will hold at 30 and surface the cap in its status. Scale the host group (`kubectl scale deployment hostgroup-default -n wasmcloud --replicas=N`, or via the `runtime.hostGroups[].replicas` Helm value) ahead of expected demand, or autoscale the host group separately. | ||
| - **Pod- and resource-based HPA metrics do not apply.** wasmCloud components are not pods, so HPA's built-in `Resource` (CPU/memory) and `Pods` metric types have nothing to read. Use `External` or `Object` metrics (typically delivered via the Prometheus Adapter or a KEDA scaler). | ||
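|
|
||
| Because hosts run as ordinary pods in a `Deployment`, the host group can be autoscaled with a conventional resource-based HPA. The manifest below is a minimal sketch, not a tested configuration: the `hostgroup-default` name and `wasmcloud` namespace are taken from the `kubectl scale` command above, while the replica bounds and CPU target are illustrative assumptions. It also assumes the host pods declare CPU requests and that a resource metrics provider such as metrics-server is installed. | ||
|
|
||
| ```yaml | ||
| # Sketch: autoscale the host group Deployment on CPU utilization. | ||
| # Replica bounds and the 70% target are assumptions; tune them for your cluster. | ||
| apiVersion: autoscaling/v2 | ||
| kind: HorizontalPodAutoscaler | ||
| metadata: | ||
|   name: hostgroup-default | ||
|   namespace: wasmcloud | ||
| spec: | ||
|   scaleTargetRef: | ||
|     apiVersion: apps/v1 | ||
|     kind: Deployment | ||
|     name: hostgroup-default | ||
|   minReplicas: 3 | ||
|   maxReplicas: 10 | ||
|   metrics: | ||
|     - type: Resource # valid here because hosts are pods | ||
|       resource: | ||
|         name: cpu | ||
|         target: | ||
|           type: Utilization | ||
|           averageUtilization: 70 | ||
| ``` | ||
|
|
||
| If the Helm chart also sets `runtime.hostGroups[].replicas`, a later Helm upgrade may reset the replica count that this HPA manages, so it is probably safest to let only one of the two own that field. | ||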
|
|
||
| ## Imperative scaling | ||
|
|
||
| You can set replicas directly using standard tooling such as `kubectl`: | ||
|
|
||
| ```shell | ||
| kubectl scale workloaddeployment hello --replicas=5 | ||
| ``` | ||
|
|
||
| `WorkloadDeployment.spec.replicas` defaults to `1` (matching native `Deployment` semantics), so the field can be omitted from manifests where one replica is the desired baseline. | ||
|
|
||
| ## Horizontal Pod Autoscaler (HPA) | ||
|
|
||
| The following complete [Horizontal Pod Autoscaler (HPA)](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) manifest targets a `WorkloadDeployment`: | ||
|
|
||
| ```yaml | ||
| apiVersion: autoscaling/v2 | ||
| kind: HorizontalPodAutoscaler | ||
| metadata: | ||
|   name: hello | ||
| spec: | ||
|   scaleTargetRef: | ||
|     apiVersion: runtime.wasmcloud.dev/v1alpha1 | ||
|     kind: WorkloadDeployment | ||
|     name: hello | ||
|   minReplicas: 1 | ||
|   maxReplicas: 100 | ||
|   metrics: | ||
|     - type: External | ||
|       external: | ||
|         metric: | ||
|           name: http_requests_per_second | ||
|         target: | ||
|           type: AverageValue | ||
|           averageValue: '10' | ||
|   behavior: | ||
|     scaleDown: | ||
|       stabilizationWindowSeconds: 60 | ||
| ``` | ||
|
|
||
| The example above scales on an external `http_requests_per_second` metric exposed through the [Prometheus Adapter](https://github.com/kubernetes-sigs/prometheus-adapter) or an equivalent External Metrics API provider. Any source the External Metrics API can read works; the metric simply needs to live outside the workload pods, since wasmCloud workloads aren't pods. | ||
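|
|
||
| For the Prometheus Adapter route, the external metric has to be declared in the adapter's configuration. The fragment below is a hedged sketch of one such rule. It assumes the underlying Prometheus series is `http_server_requests_total` with a `workload` label (the series used in the KEDA example on this page, not something the HPA manifest itself confirms), and it sets `namespaced: false` on the assumption that the series carries no Kubernetes namespace label. | ||
|
|
||
| ```yaml | ||
| # Sketch of a Prometheus Adapter external-metrics rule (raw config file format). | ||
| # The series name and the `workload` label are assumptions. | ||
| externalRules: | ||
|   - seriesQuery: 'http_server_requests_total{workload!=""}' | ||
|     resources: | ||
|       namespaced: false # do not add a namespace matcher to the query | ||
|     name: | ||
|       matches: '^http_server_requests_total$' | ||
|       as: 'http_requests_per_second' # name the HPA asks for | ||
|     metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m]))' | ||
| ``` | ||
|
|
||
| With a rule like this, an HPA that specifies no `metric.selector` would receive the request rate summed across all workloads; adding a selector such as `matchLabels: {workload: hello}` under `external.metric` would narrow it to one deployment. | ||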
|
|
||
| ## Kubernetes Event-Driven Autoscaling (KEDA) | ||
|
|
||
| [KEDA](https://keda.sh/) is a CNCF Graduated project that serves as an event-driven autoscaler. KEDA's `ScaledObject` wraps the same `/scale` subresource and adds a large catalog of [scalers](https://keda.sh/docs/latest/scalers/) (Prometheus, Kafka, NATS, OTel, cloud queue services, and so on). Behind the scenes, KEDA creates and manages an HPA on your behalf. | ||
|
|
||
| A path that works particularly well with wasmCloud is **scaling on a metric emitted by the workload itself**: a Wasm component exporting OpenTelemetry from its handlers becomes its own scale signal. | ||
|
|
||
| ```yaml | ||
| apiVersion: keda.sh/v1alpha1 | ||
| kind: ScaledObject | ||
| metadata: | ||
|   name: hello | ||
| spec: | ||
|   scaleTargetRef: | ||
|     apiVersion: runtime.wasmcloud.dev/v1alpha1 | ||
|     kind: WorkloadDeployment | ||
|     name: hello | ||
|   minReplicaCount: 1 | ||
|   maxReplicaCount: 100 | ||
|   pollingInterval: 15 | ||
|   cooldownPeriod: 60 | ||
|   triggers: | ||
|     - type: prometheus | ||
|       metadata: | ||
|         serverAddress: http://prometheus.monitoring.svc.cluster.local:9090 | ||
|         query: sum(rate(http_server_requests_total{workload="hello"}[1m])) | ||
|         threshold: '10' | ||
| ``` | ||
|
|
||
| The component itself doesn't need to be aware of KEDA: it emits standard OpenTelemetry metrics, Prometheus scrapes them (directly or through an OpenTelemetry Collector), and KEDA closes the loop. | ||
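|
|
||
| One way to bridge OpenTelemetry metrics into Prometheus is an OpenTelemetry Collector that receives OTLP and exposes a Prometheus scrape endpoint. The configuration below is a minimal sketch under stated assumptions: the workload's metrics reach the collector over OTLP/gRPC, the collector distribution includes the `prometheus` exporter (the contrib distribution does), and the ports are conventional choices rather than values from this project. | ||
|
|
||
| ```yaml | ||
| # Sketch: OpenTelemetry Collector pipeline from OTLP to a Prometheus scrape endpoint. | ||
| # Endpoints and ports are assumptions. | ||
| receivers: | ||
|   otlp: | ||
|     protocols: | ||
|       grpc: | ||
|         endpoint: 0.0.0.0:4317 # where workload metrics arrive | ||
| exporters: | ||
|   prometheus: | ||
|     endpoint: 0.0.0.0:8889 # Prometheus scrapes this address | ||
| service: | ||
|   pipelines: | ||
|     metrics: | ||
|       receivers: [otlp] | ||
|       exporters: [prometheus] | ||
| ``` | ||
|
|
||
| For the `ScaledObject` query to match, the emitted metric would need to surface in Prometheus as `http_server_requests_total` with a `workload` label; how those names map from OpenTelemetry attributes depends on your instrumentation. | ||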
|
|
||
| ## Selector label | ||
|
|
||
| The operator stamps a managed label on every `WorkloadReplicaSet` it creates: | ||
|
|
||
| ```text | ||
| runtime.wasmcloud.dev/workload-deployment=<deployment-name> | ||
| ``` | ||
|
|
||
| This label backs the `/scale` subresource's selector field. You can use it from `kubectl` to list a deployment's replica sets: | ||
|
|
||
| ```shell | ||
| kubectl get workloadreplicaset -l runtime.wasmcloud.dev/workload-deployment=hello | ||
| ``` | ||
|
|
||
| Do not edit or remove the label by hand; the operator owns it and rewrites it on every reconcile. | ||
|
|
||
| ## Status fields | ||
|
|
||
| For visibility into what HPA or KEDA observes, `WorkloadDeployment.status` exposes two scale-subresource fields: | ||
|
|
||
| | Field | Type | Description | | ||
| |---|---|---| | ||
| | `currentReplicas` | `int32` | Flat replica count read by HPA via `statusReplicasPath`. Mirrors `.status.replicas.current` as a scalar. | | ||
| | `selector` | `string` | Serialized label selector read by HPA via `labelSelectorPath`. Populated even during a fresh deploy so HPA never sees a missing selector mid-rollout. | | ||
|
|
||
| Both fields are operator-managed; they are not part of the user-authored spec. | ||
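|
|
||
| These fields correspond to the JSON paths that a CustomResourceDefinition declares when it enables the `/scale` subresource. The stanza below is a sketch of how such a declaration typically looks, inferred from the field names in the table and from `WorkloadDeployment.spec.replicas`; it is not copied from the operator's published CRD. | ||
|
|
||
| ```yaml | ||
| # Sketch of a CRD version's scale subresource declaration. | ||
| # Paths are inferred from this page, not taken from the operator's CRD. | ||
| subresources: | ||
|   scale: | ||
|     specReplicasPath: .spec.replicas # written by kubectl scale, HPA, and KEDA | ||
|     statusReplicasPath: .status.currentReplicas # flat replica count | ||
|     labelSelectorPath: .status.selector # serialized label selector | ||
| ``` | ||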
|
|
||
| ## When autoscaling isn't the right answer | ||
|
|
||
| A few cases where `WorkloadDeployment` autoscaling is a worse fit than alternatives: | ||
|
|
||
| - **Bursty, short-lived traffic on a static host group.** Each new component instance starts in milliseconds, but the value of autoscaling drops if the host group is already large enough to handle peaks. A higher per-component `poolSize` can absorb burst without any controller in the loop. | ||
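| - **Scale-to-zero.** Setting `minReplicas: 0` is supported, but only the workload scales to zero; the host group keeps running. With a plain HPA, `minReplicas: 0` also requires the Kubernetes `HPAScaleToZero` feature gate, whereas KEDA handles scale-to-zero itself through `minReplicaCount: 0`. If you need true zero-cost idle, scale the host group separately or shut down the deployment entirely. | ||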
| - **Tight latency targets where cold-start of a new instance matters.** Even at sub-millisecond instantiation, a scale-up step that lands on a p99-sensitive request is visible; pre-warming via a higher `minReplicas` is usually cheaper than chasing a metric. | ||
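|
|
||
| For the scale-to-zero case, a KEDA variant of the `ScaledObject` shown earlier might look like the sketch below. The only changes are `minReplicaCount: 0` and a longer `cooldownPeriod`, which in KEDA is the wait after the last active trigger before scaling back to zero; the 300-second value is an illustrative assumption. | ||
|
|
||
| ```yaml | ||
| # Sketch: let KEDA scale the workload (not the host group) down to zero when idle. | ||
| apiVersion: keda.sh/v1alpha1 | ||
| kind: ScaledObject | ||
| metadata: | ||
|   name: hello | ||
| spec: | ||
|   scaleTargetRef: | ||
|     apiVersion: runtime.wasmcloud.dev/v1alpha1 | ||
|     kind: WorkloadDeployment | ||
|     name: hello | ||
|   minReplicaCount: 0 # workload may drop to zero instances | ||
|   maxReplicaCount: 100 | ||
|   pollingInterval: 15 | ||
|   cooldownPeriod: 300 # assumed value: idle time before scaling to zero | ||
|   triggers: | ||
|     - type: prometheus | ||
|       metadata: | ||
|         serverAddress: http://prometheus.monitoring.svc.cluster.local:9090 | ||
|         query: sum(rate(http_server_requests_total{workload="hello"}[1m])) | ||
|         threshold: '10' | ||
| ``` | ||
|
|
||
| For the latency-sensitive case, the opposite adjustment applies: raising `minReplicaCount` (or `minReplicas` on an HPA) keeps a warm floor of instances. | ||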
|
|
||
| ## Related | ||
|
|
||
| - [CRDs reference — WorkloadDeployment](../crds.mdx#workloaddeployment) | ||
| - [Operator Overview](./overview.mdx) | ||