Primary ScaledObject stuck paused after init when autoscalerRef is a ScaledObject #1934

Description

@TruepicDustin

Describe the bug

When a Canary uses autoscalerRef.kind: ScaledObject, Flagger creates the <name>-primary ScaledObject by deep-copying labels and annotations from the source ScaledObject (pkg/canary/scaled_object_reconciler.go:84). At the moment of creation, the source has autoscaling.keda.sh/paused-replicas: "0" set by PauseTargetScaler, because Flagger pauses the source as part of canary initialization. That annotation is copied verbatim onto the new -primary ScaledObject.

ResumeTargetScaler only operates on cd.Spec.AutoscalerRef.Name (the source), so the annotation is never cleared from the -primary copy. KEDA refuses to create an underlying HPA while a ScaledObject is paused, so the primary deployment is stuck at the paused replica count (0) and never autoscales off its triggers. From Flagger's perspective everything looks healthy: Phase: Succeeded, Promoted: True.

The bug recurs every time the -primary ScaledObject is recreated while the source is paused: for example after a namespace teardown and redeploy, a manual deletion of the ScaledObject, or a delete and recreate of the Canary CR.
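
The ordering problem described above can be modelled without any Kubernetes dependencies. The following is a minimal sketch, not Flagger's code: the scaledObject type, copyAnnotations, and simulateInit are hypothetical names, and it assumes that resuming the source removes the annotation from the source only, as the description states.

```go
package main

import "fmt"

// pausedAnnotation is the KEDA annotation named in this report.
const pausedAnnotation = "autoscaling.keda.sh/paused-replicas"

// scaledObject is a hypothetical stand-in that models only name and annotations.
type scaledObject struct {
	Name        string
	Annotations map[string]string
}

// copyAnnotations returns an independent copy of the map, mimicking a deep copy.
func copyAnnotations(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

// simulateInit replays the order of operations from the report:
// pause the source, create the primary from the source, resume the source.
func simulateInit() (source, primary scaledObject) {
	source = scaledObject{Name: "podinfo", Annotations: map[string]string{}}

	// 1. The source is paused as part of canary initialization.
	source.Annotations[pausedAnnotation] = "0"

	// 2. The primary is created by copying the source's annotations verbatim.
	primary = scaledObject{
		Name:        source.Name + "-primary",
		Annotations: copyAnnotations(source.Annotations),
	}

	// 3. Only the source is resumed; the primary copy is never touched again.
	delete(source.Annotations, pausedAnnotation)
	return source, primary
}

func main() {
	source, primary := simulateInit()
	_, sourcePaused := source.Annotations[pausedAnnotation]
	_, primaryPaused := primary.Annotations[pausedAnnotation]
	fmt.Printf("%s paused=%v, %s paused=%v\n",
		source.Name, sourcePaused, primary.Name, primaryPaused)
	// prints: podinfo paused=false, podinfo-primary paused=true
}
```

The copy is independent of the source, so clearing the annotation on the source afterwards has no effect on the primary.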

To Reproduce

  1. Create a Deployment podinfo, plus a ScaledObject named podinfo that targets it.
  2. Create a Canary podinfo with provider: kubernetes, targetRef pointing to the Deployment, and autoscalerRef.kind: ScaledObject pointing to the podinfo ScaledObject.
  3. Wait for the initial promotion to complete.
  4. Run kubectl get scaledobject podinfo-primary -o yaml.

Observed:

metadata:
  annotations:
    autoscaling.keda.sh/paused-replicas: "0"
status:
  conditions:
    - type: Paused
      status: "True"
      reason: ScaledObjectPaused

No keda-hpa-podinfo-primary HPA exists, and podinfo-primary does not scale off its triggers. kubectl get scaledobject podinfo-primary --show-managed-fields -o yaml confirms flagger is the manager that wrote the annotation, set once at creation time and never updated since:

managedFields:
  - manager: flagger
    operation: Update
    time: "<canary init time>"
    fieldsV1:
      f:metadata:
        f:annotations:
          f:autoscaling.keda.sh/paused-replicas: {}

Expected behavior

The -primary ScaledObject should not carry autoscaling.keda.sh/paused-replicas. A few possible fixes:

  • Strip the annotation from the annotations map passed into makeObjectMetaSo when creating the primary.
  • Or explicitly remove it from the primary in reconcilePrimaryScaler after creation.
  • Or apply the same includeLabelsByPrefix filter on init that already runs on the update path (scaled_object_reconciler.go:110-113), which drops the annotation when no --include-label-prefix matches it.
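
As an illustration of the first option, a filter along these lines could be applied to the annotations before they reach the primary's object metadata. This is a sketch under assumptions, not a patch against Flagger: withoutPausedReplicas is a hypothetical helper, and how it would be wired into makeObjectMetaSo is not shown.

```go
package main

import "fmt"

// pausedReplicasAnnotation is the KEDA annotation that must not reach the primary.
const pausedReplicasAnnotation = "autoscaling.keda.sh/paused-replicas"

// withoutPausedReplicas returns a copy of annotations with the pause key
// removed. The input map is left unmodified, so the source ScaledObject's
// own metadata is not affected by the filtering.
func withoutPausedReplicas(annotations map[string]string) map[string]string {
	out := make(map[string]string, len(annotations))
	for k, v := range annotations {
		if k == pausedReplicasAnnotation {
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	// Example source annotations; "example.com/team" is made up for the demo.
	source := map[string]string{
		pausedReplicasAnnotation: "0",
		"example.com/team":       "payments",
	}
	primary := withoutPausedReplicas(source)
	fmt.Println(len(primary), primary["example.com/team"])
	// prints: 1 payments
}
```

Returning a new map rather than deleting in place keeps the source paused for the remainder of initialization, which is the behaviour PauseTargetScaler relies on.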

Additional context

  • Flagger version: 1.43.0 (the same code path exists on main)
  • KEDA version: 2.x
  • Kubernetes version: 1.34
  • Service Mesh provider: none (kubernetes provider, blue/green)
  • Ingress provider: n/a for this canary

Code reference: https://github.com/fluxcd/flagger/blob/v1.43.0/pkg/canary/scaled_object_reconciler.go#L84

A CronJob that strips autoscaling.keda.sh/paused-replicas from any *-primary ScaledObject owned by a Flagger Canary is a viable workaround until this is fixed upstream.
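
One detail such a workaround needs to get right is the patch itself: the annotation key contains a "/", which has to be escaped as "~1" in a JSON Patch path (RFC 6901). The sketch below only builds the patch body using the standard library; removeAnnotationPatch and escapePointerToken are hypothetical helpers, and applying the patch (via kubectl or a client library) is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// patchOp is a single JSON Patch (RFC 6902) operation without a value.
type patchOp struct {
	Op   string `json:"op"`
	Path string `json:"path"`
}

// escapePointerToken escapes one JSON Pointer reference token per RFC 6901:
// "~" becomes "~0" first, then "/" becomes "~1".
func escapePointerToken(s string) string {
	s = strings.ReplaceAll(s, "~", "~0")
	return strings.ReplaceAll(s, "/", "~1")
}

// removeAnnotationPatch builds a JSON Patch that removes one annotation.
func removeAnnotationPatch(key string) (string, error) {
	ops := []patchOp{{
		Op:   "remove",
		Path: "/metadata/annotations/" + escapePointerToken(key),
	}}
	b, err := json.Marshal(ops)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	patch, err := removeAnnotationPatch("autoscaling.keda.sh/paused-replicas")
	if err != nil {
		panic(err)
	}
	fmt.Println(patch)
	// prints: [{"op":"remove","path":"/metadata/annotations/autoscaling.keda.sh~1paused-replicas"}]
}
```

The resulting string can be passed to kubectl patch scaledobject podinfo-primary --type=json -p '<patch>'. A JSON Patch remove operation fails when the target path does not exist, so a periodic job should check that the annotation is present before patching.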
