2 changes: 2 additions & 0 deletions docs/kubernetes-operator/crds.mdx
@@ -123,6 +123,8 @@ spec:

A [WorkloadDeployment](./api-reference.md#workloaddeployment) defines the deployment and scaling of Workloads across hosts. It creates and manages WorkloadReplicaSets to ensure the desired number of workload replicas are running.

`WorkloadDeployment` implements the Kubernetes `/scale` subresource, so `kubectl scale`, the Horizontal Pod Autoscaler, and KEDA all work against it without wasmCloud-specific glue. `spec.replicas` defaults to `1` when omitted. See [Autoscaling](./operator-manual/autoscaling.mdx) for HPA and KEDA examples.

Example manifest:

```yaml
2 changes: 1 addition & 1 deletion docs/kubernetes-operator/index.mdx
@@ -224,7 +224,7 @@ This manifest deploys a simple "Hello world!" component that uses the `wasi:http
For a kind-optimized NodePort variant that answers `curl localhost` on port 80, see the [wasmCloud-hosted hello-world manifest](https://raw.githubusercontent.com/wasmCloud/wasmCloud/refs/heads/main/templates/http-hello-world/manifests/workloaddeployment.yaml) used by the [Installation](../installation.mdx#deploy-a-wasm-workload) guide.

:::note[]
Learn more about `WorkloadDeployments` and other wasmCloud resources in the [Custom Resource Definitions (CRDs) section](./crds.mdx). For the full Service-routing walk-through, see [Expose a Workload via Kubernetes Service](../recipes/expose-workload-via-kubernetes-service.mdx).
Learn more about `WorkloadDeployments` and other wasmCloud resources in the [Custom Resource Definitions (CRDs) section](./crds.mdx). For the full Service-routing walk-through, see [Expose a Workload via Kubernetes Service](../recipes/expose-workload-via-kubernetes-service.mdx). To scale a `WorkloadDeployment` with HPA or KEDA, see [Autoscaling](./operator-manual/autoscaling.mdx).
:::

Verify the component is reachable from inside the cluster:
149 changes: 149 additions & 0 deletions docs/kubernetes-operator/operator-manual/autoscaling.mdx
@@ -0,0 +1,149 @@
---
title: 'Autoscaling WorkloadDeployments with HPA and KEDA'
sidebar_label: 'Autoscaling'
sidebar_position: 3
description: 'Scale wasmCloud WorkloadDeployments using the Kubernetes Horizontal Pod Autoscaler or KEDA via the standard /scale subresource.'
proficiency: Intermediate
# `about` (single primary topic) and `mentions` (secondary topics) reference
# entity slugs in src/data/entities.json. At build time these expand into
# schema.org JSON-LD nodes — emitted as @id refs on the page's Article-family
# node plus inlined Thing entries in the same @graph payload. Slugs are
# case-sensitive. Transcript pages inherit these refs from their parent
# landing page via src/data/transcript-inheritance.json (regenerated by
# scripts/generate-transcript-inheritance.mjs / the prebuild hook), so
# don't repeat the block in transcript frontmatter.
about: wasmCloud
mentions:
- Kubernetes
- HorizontalPodAutoscaler
- KEDA
- OpenTelemetry
- Prometheus
platforms: [wasmCloud, Kubernetes]
---

# Autoscaling

The [`WorkloadDeployment`](/docs/kubernetes-operator/api-reference.md#workloaddeployment) resource implements the standard Kubernetes [`/scale` subresource](https://kubernetes.io/docs/reference/using-api/api-concepts/#scale-subresource).

That means a wasmCloud `WorkloadDeployment` can be scaled like any other resource using [`kubectl scale`](#imperative-scaling), the [Horizontal Pod Autoscaler (HPA)](#horizontal-pod-autoscaler-hpa), or [KEDA](#kubernetes-event-driven-autoscaling-keda).

## What scales

Autoscaling changes the **number of component instances** running across the host group. Hosts are a separate pool managed by the operator's host group `Deployment`; scaling a `WorkloadDeployment` schedules more instances of the component onto existing hosts, up to each host's pool size.

This has two practical consequences:

- **The host group is a precondition for autoscaling.** If you set `maxReplicas: 100` but the host group only has capacity for 30 component instances, HPA will hold at 30 and surface the cap in its status. Scale the host group (`kubectl scale deployment hostgroup-default -n wasmcloud --replicas=N`, or via the `runtime.hostGroups[].replicas` Helm value) ahead of expected demand, or autoscale the host group separately.
- **Pod- and resource-based HPA metrics do not apply.** wasmCloud components are not pods, so HPA's built-in `Resource` (CPU/memory) and `Pods` metric types have nothing to read. Use `External` or `Object` metrics (typically delivered via the Prometheus Adapter or a KEDA scaler).
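
The capacity cap described in the first point can be estimated before choosing `maxReplicas`. The sketch below is illustrative only: `effective_ceiling` is a hypothetical helper, and it assumes capacity is simply the number of hosts multiplied by a uniform per-host pool size, which may not match how your host group is actually configured.

```shell
# Hypothetical helper (not part of wasmCloud or kubectl).
# ASSUMPTION: host group capacity = hosts * pool_size, with every host
# offering the same pool size. Real capacity may differ.
effective_ceiling() {
  ec_max_replicas=$1
  ec_hosts=$2
  ec_pool_size=$3
  ec_capacity=$(( ec_hosts * ec_pool_size ))
  # The autoscaler can never usefully exceed the smaller of the two limits.
  if [ "$ec_max_replicas" -lt "$ec_capacity" ]; then
    echo "$ec_max_replicas"
  else
    echo "$ec_capacity"
  fi
}

effective_ceiling 100 3 10   # maxReplicas 100, 3 hosts x 10 slots: prints 30
effective_ceiling 20 3 10    # maxReplicas 20 is the tighter limit: prints 20
```

If the first number printed is lower than your `maxReplicas`, the host group rather than the autoscaler is the binding limit.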

## Imperative scaling

You can set replicas directly using standard tooling such as `kubectl`:

```shell
kubectl scale workloaddeployment hello --replicas=5
```

`WorkloadDeployment.spec.replicas` defaults to `1` (matching native `Deployment` semantics), so the field can be omitted from manifests where one replica is the desired baseline.
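
For scripted use, the same change can be expressed as a merge patch against the `/scale` subresource. This is a sketch, not an official wasmCloud workflow: `scale_patch` is a hypothetical helper that only builds the patch body, and the commented `kubectl patch` line assumes a kubectl release that supports the `--subresource` flag.

```shell
# Hypothetical helper: print a JSON merge patch for the scale subresource.
# It only validates the input and formats the body; it does not talk to a cluster.
scale_patch() {
  case "$1" in
    ''|*[!0-9]*)
      echo "replicas must be a non-negative integer" >&2
      return 1
      ;;
  esac
  printf '{"spec":{"replicas":%s}}\n' "$1"
}

scale_patch 5   # prints {"spec":{"replicas":5}}

# Usage against a cluster (assumes kubectl supports --subresource=scale):
# kubectl patch workloaddeployment hello --subresource=scale --type=merge -p "$(scale_patch 5)"
```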

## Horizontal Pod Autoscaler (HPA)

The following [Horizontal Pod Autoscaler (HPA)](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) manifest is a complete example that targets a `WorkloadDeployment`:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hello
spec:
  scaleTargetRef:
    apiVersion: runtime.wasmcloud.dev/v1alpha1
    kind: WorkloadDeployment
    name: hello
  minReplicas: 1
  maxReplicas: 100
  metrics:
    - type: External
      external:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: '10'
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60
```

The example above scales on an external `http_requests_per_second` metric exposed through the [Prometheus Adapter](https://github.com/kubernetes-sigs/prometheus-adapter) or an equivalent External Metrics API provider. Any source the External Metrics API can read works; the metric simply needs to live outside the workload pods, since wasmCloud workloads aren't pods.
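
To get a feel for how the `AverageValue` target in the manifest translates into a replica count, the sketch below reproduces the core arithmetic: the total metric value divided by the per-replica target, rounded up, then clamped to the configured bounds. `desired_replicas` is a hypothetical helper for illustration. It works on whole numbers only and deliberately ignores the HPA's tolerance band and the `behavior` stabilization settings, so treat its output as an approximation of what the controller would choose.

```shell
# Hypothetical helper: approximate replica count for an External metric
# with an AverageValue target.
#   desired = ceil(total_metric / target), clamped to [min, max]
# ASSUMPTION: integer inputs; tolerance and stabilization are ignored.
desired_replicas() {
  dr_total=$1
  dr_target=$2
  dr_min=$3
  dr_max=$4
  dr_want=$(( (dr_total + dr_target - 1) / dr_target ))   # integer ceiling division
  if [ "$dr_want" -lt "$dr_min" ]; then dr_want=$dr_min; fi
  if [ "$dr_want" -gt "$dr_max" ]; then dr_want=$dr_max; fi
  echo "$dr_want"
}

desired_replicas 250 10 1 100    # 250 req/s at 10 per replica: prints 25
desired_replicas 2500 10 1 100   # would need 250, capped by maxReplicas: prints 100
desired_replicas 0 10 1 100      # no traffic, held at minReplicas: prints 1
```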

## Kubernetes Event-Driven Autoscaling (KEDA)

[KEDA](https://keda.sh/) is a CNCF Graduated project that serves as an event-driven autoscaler. KEDA's `ScaledObject` wraps the same `/scale` subresource and adds a large catalog of [scalers](https://keda.sh/docs/latest/scalers/) (Prometheus, Kafka, NATS, OTel, cloud queue services, and so on). Behind the scenes, KEDA creates and manages an HPA on your behalf.

A path that works particularly well with wasmCloud is **scaling on a metric emitted by the workload itself**: a Wasm component exporting OpenTelemetry from its handlers becomes its own scale signal.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: hello
spec:
  scaleTargetRef:
    apiVersion: runtime.wasmcloud.dev/v1alpha1
    kind: WorkloadDeployment
    name: hello
  minReplicaCount: 1
  maxReplicaCount: 100
  pollingInterval: 15
  cooldownPeriod: 60
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        query: sum(rate(http_server_requests_total{workload="hello"}[1m]))
        threshold: '10'
```

The component itself doesn't need to be aware of KEDA: it emits standard OpenTelemetry metrics, Prometheus ingests them (scraped directly or forwarded by an OpenTelemetry Collector), and KEDA closes the loop by polling the query.
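
Because KEDA manages an HPA on your behalf, it can help to inspect that HPA when debugging scaling decisions. The sketch below assumes KEDA's default naming convention of `keda-hpa-<scaledobject-name>`; if the `ScaledObject` sets a custom name under `advanced.horizontalPodAutoscalerConfig`, the helper will not match. `keda_hpa_name` is a hypothetical helper, not a KEDA command.

```shell
# Hypothetical helper: derive the name of the HPA that KEDA generates.
# ASSUMPTION: the ScaledObject uses KEDA's default HPA naming.
keda_hpa_name() {
  printf 'keda-hpa-%s\n' "$1"
}

keda_hpa_name hello   # prints keda-hpa-hello

# Usage against a cluster:
# kubectl get hpa "$(keda_hpa_name hello)" -o yaml
```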

## Selector label

The operator stamps a managed label on every `WorkloadReplicaSet` it creates:

```text
runtime.wasmcloud.dev/workload-deployment=<deployment-name>
```

This label backs the `/scale` subresource's selector field. You can use it from `kubectl` to list a deployment's replica sets:

```shell
kubectl get workloadreplicaset -l runtime.wasmcloud.dev/workload-deployment=hello
```

Do not edit or remove the label by hand; the operator owns it and rewrites it on every reconcile.

## Status fields

For visibility into what HPA or KEDA observes, `WorkloadDeployment.status` exposes two scale-subresource fields:

| Field | Type | Description |
|---|---|---|
| `currentReplicas` | `int32` | Flat replica count read by HPA via `statusReplicasPath`. Mirrors `.status.replicas.current` as a scalar. |
| `selector` | `string` | Serialized label selector read by HPA via `labelSelectorPath`. Populated even during a fresh deploy so HPA never sees a missing selector mid-rollout. |

Both fields are operator-managed; they are not part of the user-authored spec.
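
To see the same view an autoscaler reads, you can request the `/scale` subresource directly from the API server. The sketch below only builds the request path. `scale_path` is a hypothetical helper, the group and version are taken from the manifests on this page, and the plural resource name `workloaddeployments` is an assumption that should be checked with `kubectl api-resources`.

```shell
# Hypothetical helper: build the API path of a WorkloadDeployment's scale subresource.
# ASSUMPTION: the CRD's plural resource name is "workloaddeployments".
scale_path() {
  printf '/apis/runtime.wasmcloud.dev/v1alpha1/namespaces/%s/workloaddeployments/%s/scale\n' "$1" "$2"
}

scale_path default hello

# Usage against a cluster:
# kubectl get --raw "$(scale_path default hello)"
```

The response is a standard `Scale` object, so `status.replicas` and `status.selector` in it should correspond to the two fields in the table above.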

## When autoscaling isn't the right answer

A few cases where `WorkloadDeployment` autoscaling is a worse fit than alternatives:

- **Bursty, short-lived traffic on a static host group.** Each new component instance starts in milliseconds, but the value of autoscaling drops if the host group is already large enough to handle peaks. A higher per-component `poolSize` can absorb bursts without any controller in the loop.
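- **Scale-to-zero.** Setting `minReplicas: 0` is supported (on a plain HPA this also depends on the cluster's `HPAScaleToZero` feature gate; the KEDA equivalent is `minReplicaCount: 0`), but only the workload scales to zero; the host group keeps running. If you need true zero-cost idle, scale the host group separately or shut down the deployment entirely.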
- **Tight latency targets where cold-start of a new instance matters.** Even at sub-millisecond instantiation, a scale-up step that lands on a p99-sensitive request is visible; pre-warming via a higher `minReplicas` is usually cheaper than chasing a metric.

## Related

- [CRDs reference — WorkloadDeployment](../crds.mdx#workloaddeployment)
- [Operator Overview](./overview.mdx)
1 change: 1 addition & 0 deletions docs/kubernetes-operator/operator-manual/overview.mdx
@@ -44,6 +44,7 @@ The wasmCloud operator (`runtime-operator`) is the control-plane entity that wat

- **Watching CRDs**: Monitors `WorkloadDeployment`, `WorkloadReplicaSet`, `Workload`, `Host`, and `Artifact` resources across all (or configured) namespaces.
- **Scheduling workloads**: Reads the `WorkloadDeployment` spec and selects a `Host` that matches the `hostSelector` criteria, then creates the appropriate child resources to run the Wasm component.
- **Scale-subresource support**: `WorkloadDeployment` implements the standard Kubernetes `/scale` subresource, so `kubectl scale`, HPA, and KEDA scale workloads natively. See [Autoscaling](./autoscaling.mdx) for HPA and KEDA examples.
- **EndpointSlice management**: For workloads that reference a Kubernetes Service via `spec.kubernetes.service.name`, the operator creates and maintains an EndpointSlice pointing to the pod IPs of the hosts running the workload. It also registers Service DNS aliases with the host's HTTP router so requests arriving via cluster DNS reach the correct component.
- **Host communication**: Sends workload start/stop requests and polls host health over NATS, using the `runtime.host.<hostID>.<operation>` subject pattern (e.g. `runtime.host.<hostID>.workload.start`).
- **Status reporting**: Updates the `status` subresource of each CRD to reflect whether scheduling succeeded or failed, and surfaces Kubernetes events for observability.
1 change: 1 addition & 0 deletions sidebars.js
@@ -85,6 +85,7 @@ const sidebars = {
'kubernetes-operator/operator-manual/overview',
'kubernetes-operator/operator-manual/helm-values',
'kubernetes-operator/operator-manual/cicd',
'kubernetes-operator/operator-manual/autoscaling',
'kubernetes-operator/operator-manual/roles-and-rolebindings',
'kubernetes-operator/operator-manual/secrets-and-configuration',
'kubernetes-operator/operator-manual/private-registries',
35 changes: 35 additions & 0 deletions src/data/entities.json
@@ -311,6 +311,41 @@
},
"isAccessibleForFree": true
},
"KEDA": {
"@id": "https://wasmcloud.com/#entity-keda",
"@type": "SoftwareApplication",
"name": "KEDA",
"alternateName": [
"Kubernetes Event-driven Autoscaling"
],
"description": "CNCF event-driven autoscaler for Kubernetes. Extends the Horizontal Pod Autoscaler with external metric scalers, enabling workloads to scale on signals such as queue depth, request rate, or custom Prometheus metrics. wasmCloud WorkloadDeployments expose the standard /scale subresource, so KEDA scales them via ScaledObject without wasmCloud-specific glue.",
"sameAs": [
"https://keda.sh/",
"https://en.wikipedia.org/wiki/KEDA"
],
"applicationCategory": "UtilitiesApplication",
"operatingSystem": "Cross-platform",
"offers": {
"@type": "Offer",
"price": "0",
"priceCurrency": "USD",
"availability": "https://schema.org/InStock"
},
"isAccessibleForFree": true
},
"HorizontalPodAutoscaler": {
"@id": "https://wasmcloud.com/#entity-horizontal-pod-autoscaler",
"@type": "Thing",
"name": "Horizontal Pod Autoscaler",
"alternateName": [
"HPA",
"Kubernetes HPA"
],
"description": "The native Kubernetes controller that scales workloads by reading the /scale subresource of a target resource. wasmCloud WorkloadDeployments implement the /scale subresource and are scaled directly by HPA (and, by extension, KEDA, which builds on HPA).",
"sameAs": [
"https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/"
]
},
"NATS": {
"@id": "https://wasmcloud.com/#entity-nats",
"@type": "SoftwareApplication",