Describe the bug
When a Canary uses autoscalerRef.kind: ScaledObject, Flagger creates the <name>-primary ScaledObject by deep-copying labels and annotations from the source ScaledObject (pkg/canary/scaled_object_reconciler.go:84). At the moment of creation, the source has autoscaling.keda.sh/paused-replicas: "0" set by PauseTargetScaler, because Flagger pauses the source as part of canary initialization. That annotation is copied verbatim onto the new -primary ScaledObject.
ResumeTargetScaler only operates on cd.Spec.AutoscalerRef.Name (the source), so the annotation is never cleared from the -primary copy. KEDA refuses to create an underlying HPA while a ScaledObject is paused, so the primary deployment is stuck at the paused replica count (0) and never autoscales off its triggers. From Flagger's perspective everything looks healthy: Phase: Succeeded, Promoted: True.
The bug recurs every time the -primary ScaledObject is recreated while the source is paused: namespace teardown and redeploy, manual delete of the SO, Canary CR delete/recreate, etc.
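To make the ordering problem concrete, here is a toy model of the sequence described above. It is a simplified sketch, not Flagger's actual code: the `scaledObject` type and the `pause`, `resume`, and `copyAsPrimary` functions are hypothetical stand-ins for the behavior this report attributes to PauseTargetScaler, ResumeTargetScaler, and the primary creation path.

```go
package main

import "fmt"

// Annotation named in this report.
const pausedReplicasAnnotation = "autoscaling.keda.sh/paused-replicas"

// scaledObject is a toy stand-in for a KEDA ScaledObject; only the
// fields needed for this illustration are modeled.
type scaledObject struct {
	name        string
	annotations map[string]string
}

// pause models pausing the source by setting the annotation to "0".
func pause(so *scaledObject) {
	so.annotations[pausedReplicasAnnotation] = "0"
}

// resume models clearing the annotation. In the reported flow it is
// only ever called on the source object.
func resume(so *scaledObject) {
	delete(so.annotations, pausedReplicasAnnotation)
}

// copyAsPrimary models the verbatim copy of annotations onto the
// <name>-primary object.
func copyAsPrimary(src *scaledObject) *scaledObject {
	ann := make(map[string]string, len(src.annotations))
	for k, v := range src.annotations {
		ann[k] = v
	}
	return &scaledObject{name: src.name + "-primary", annotations: ann}
}

func main() {
	source := &scaledObject{name: "podinfo", annotations: map[string]string{}}

	pause(source)                    // canary initialization pauses the source
	primary := copyAsPrimary(source) // primary is created while the source is paused
	resume(source)                   // only the source is resumed

	_, sourcePaused := source.annotations[pausedReplicasAnnotation]
	_, primaryPaused := primary.annotations[pausedReplicasAnnotation]
	fmt.Println("source paused:", sourcePaused)
	fmt.Println("primary paused:", primaryPaused)
}
```

In this model the source ends up unpaused while the copy keeps the annotation, which matches the observed state of podinfo-primary.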
To Reproduce
- Deployment podinfo, plus a ScaledObject named podinfo targeting it.
- Canary podinfo with provider: kubernetes, targetRef pointing to the deployment, and autoscalerRef.kind: ScaledObject pointing to podinfo.
- Wait for the initial promotion to complete.
- kubectl get scaledobject podinfo-primary -o yaml.
Observed:
metadata:
  annotations:
    autoscaling.keda.sh/paused-replicas: "0"
status:
  conditions:
  - type: Paused
    status: "True"
    reason: ScaledObjectPaused
No keda-hpa-podinfo-primary HPA exists, and podinfo-primary does not scale off its triggers. kubectl get scaledobject podinfo-primary --show-managed-fields -o yaml confirms flagger is the manager that wrote the annotation, set once at creation time and never updated since:
managedFields:
- manager: flagger
  operation: Update
  time: "<canary init time>"
  fieldsV1:
    f:metadata:
      f:annotations:
        f:autoscaling.keda.sh/paused-replicas: {}
Expected behavior
The -primary ScaledObject should not carry autoscaling.keda.sh/paused-replicas. A few possible fixes:
- Strip the annotation from the annotations map passed into makeObjectMetaSo when creating the primary.
- Or explicitly remove it from the primary in reconcilePrimaryScaler after creation.
- Or apply the same includeLabelsByPrefix filter on init that already runs on the update path (scaled_object_reconciler.go:110-113), which drops the annotation when no --include-label-prefix matches it.
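As a rough illustration of the first option, the copy could filter out the pause annotation before the map reaches the primary's metadata. This is only a sketch under assumptions: `stripPausedAnnotation` is a hypothetical helper, not an existing Flagger function, and the `example.com/team` annotation is made up for the demo.

```go
package main

import "fmt"

// Annotation named in this report.
const pausedReplicasAnnotation = "autoscaling.keda.sh/paused-replicas"

// stripPausedAnnotation is a hypothetical helper (not part of Flagger).
// It returns a copy of the given annotations without the KEDA pause
// annotation and leaves the input map untouched.
func stripPausedAnnotation(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		if k == pausedReplicasAnnotation {
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	// Annotations as they would look on the paused source ScaledObject.
	source := map[string]string{
		pausedReplicasAnnotation: "0",
		"example.com/team":       "payments", // made-up annotation for the demo
	}

	primary := stripPausedAnnotation(source)

	_, stillPaused := primary[pausedReplicasAnnotation]
	fmt.Println("primary carries pause annotation:", stillPaused)
	fmt.Println("other annotations kept:", primary["example.com/team"])
}
```

Returning a new map rather than deleting in place keeps the source's annotations intact, so the source stays paused until it is resumed through the normal path.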
Additional context
- Flagger version: 1.43.0 (the same code path exists on main)
- KEDA version: 2.x
- Kubernetes version: 1.34
- Service Mesh provider: none (kubernetes provider, blue/green)
- Ingress provider: n/a for this canary
Code reference: https://github.com/fluxcd/flagger/blob/v1.43.0/pkg/canary/scaled_object_reconciler.go#L84
A CronJob that strips autoscaling.keda.sh/paused-replicas from any *-primary ScaledObject owned by a Flagger Canary is a viable workaround until this is fixed upstream.
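One way such a workaround could clear the annotation is a JSON merge patch, where setting a key to null deletes it. The sketch below only builds the patch body; `removeAnnotationPatch` is a hypothetical helper, and how the patch is sent (client library, kubectl, schedule) is left open.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Annotation named in this report.
const pausedReplicasAnnotation = "autoscaling.keda.sh/paused-replicas"

// removeAnnotationPatch is a hypothetical helper that builds a JSON
// merge patch body. In a merge patch, a null value removes the key.
func removeAnnotationPatch(key string) ([]byte, error) {
	patch := map[string]any{
		"metadata": map[string]any{
			"annotations": map[string]any{key: nil},
		},
	}
	return json.Marshal(patch)
}

func main() {
	body, err := removeAnnotationPatch(pausedReplicasAnnotation)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The resulting body could, for example, be applied by hand with kubectl patch scaledobject podinfo-primary --type=merge -p '<body>'.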