HealthCheckFailed events from kustomize-controller not forwarded to generic-hmac Provider #1303

Description

@agent-true-noise

On a k3s cluster running Flux v2.8.x, HealthCheckFailed events emitted by kustomize-controller are not forwarded to a generic-hmac Provider. The notification-controller logs show the event was received, but the HMAC endpoint never receives the HTTP POST.

Environment

  • Flux: v2.8.x (kustomize-controller v1.8.x, notification-controller v1.8.x)
  • Kubernetes: k3s v1.31.x (HA, 3-node etcd cluster)
  • Provider type: generic-hmac
  • Alert eventSeverity: error
  • Alert eventSources: Kustomization objects by name (wildcard * also tested — same result)

Reproduction steps

  1. Deploy a Kustomization with .spec.healthChecks configured targeting a Deployment.
  2. Configure a generic-hmac Provider pointing to an HMAC endpoint you control (confirmed reachable from the cluster).
  3. Create an Alert with eventSeverity: error and eventSources matching the Kustomization.
  4. Trigger a health-check failure: deploy an image that never becomes Ready within .spec.timeout.
  5. Observe: no HTTP POST arrives at the Provider endpoint. The notification-controller pod logs show the event was received but nothing is forwarded.
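For anyone reproducing step 2, a minimal test endpoint along these lines can help separate "no POST arrived" from "a POST arrived but failed signature verification". This is a sketch only: it assumes the generic-hmac Provider sends an `X-Signature: sha256=<hex digest>` header computed over the request body with the Provider's secret token, and the `SECRET` value, port, and handler name are hypothetical placeholders.

```python
import hashlib
import hmac
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical shared secret; it would need to match the token configured
# for the Provider.
SECRET = b"replace-me"


def verify_signature(header: str, body: bytes, key: bytes) -> bool:
    """Check an assumed `sha256=<hex digest>` header against the raw body."""
    algo, sep, digest_hex = header.partition("=")
    if not sep or algo != "sha256":
        return False
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(
        expected.encode(), digest_hex.strip().lower().encode()
    )


class EventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        ok = verify_signature(self.headers.get("X-Signature", ""), body, SECRET)
        # Log every request, valid or not, so a missing POST is
        # distinguishable from one that fails verification.
        print(f"POST {self.path} signature_ok={ok} body={body[:200]!r}")
        self.send_response(200 if ok else 401)
        self.end_headers()


if __name__ == "__main__":
    # Port 8080 is an arbitrary choice for this sketch.
    HTTPServer(("0.0.0.0", 8080), EventHandler).serve_forever()
```

If this endpoint logs nothing at all during a health-check failure, the event was dropped before dispatch rather than rejected by the receiver.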

Expected behaviour

The Provider endpoint receives an HTTP POST containing the HealthCheckFailed event payload when the Kustomization health check fails.

Actual behaviour

No POST is delivered. kustomize-controller logs show the health-check failure. notification-controller logs show the event was received from the informer but it is silently dropped before the Provider dispatch.

Root cause hypothesis

Based on #165, kustomize-controller emits health-check events with Progressing severity in some code paths. The notification-controller dispatch path silently filters Progressing-severity events before forwarding to any Provider, regardless of the Alert's eventSeverity setting. The error-severity filter on the Alert cannot rescue events that were already filtered by severity before the Alert match.

The 5-minute deduplication window may additionally suppress repeated HealthCheckFailed events where the message is identical (repeated health-check poll cycles), compounding the issue.
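To make the deduplication concern concrete, here is a toy model of a 5-minute window keyed on object, reason, and message. It is not the notification-controller's implementation; the class name, the key fields, and the choice not to refresh the timestamp on a suppressed event are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

WINDOW_SECONDS = 5 * 60  # the 5-minute window mentioned above


@dataclass
class DedupWindow:
    """Toy model: suppress an event if an identical one was forwarded recently."""

    window: float = WINDOW_SECONDS
    _last_seen: dict = field(default_factory=dict)

    def should_forward(self, obj: str, reason: str, message: str, now: float) -> bool:
        # Assumed key: identical object, reason, and message count as duplicates.
        key = (obj, reason, message)
        last = self._last_seen.get(key)
        if last is not None and now - last < self.window:
            return False  # identical event inside the window: suppressed
        self._last_seen[key] = now
        return True
```

Under this model, a health check that fails on every poll cycle with the same message would produce at most one forwarded event per window, so a single early drop could hide the failure for several minutes.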

Workaround

We have implemented a sidecar bridge service that reads Flux K8s Events directly (bypassing the notification-controller delivery path entirely) to forward deployment status to external systems. This is not a sustainable solution and leaves the notification-controller pipeline unused for health-check events.
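For illustration, the filtering part of such a bridge could look roughly like the sketch below. It is not our actual bridge: it assumes the input is the JSON list produced by `kubectl get events -A -o json`, relies on the core Event fields `involvedObject`, `reason`, `message`, and `lastTimestamp`, and the function name and output shape are hypothetical.

```python
import json
import sys


def select_health_check_failures(event_list: dict) -> list:
    """Pick HealthCheckFailed events for Kustomization objects from an Event list."""
    selected = []
    for ev in event_list.get("items", []):
        involved = ev.get("involvedObject", {})
        if involved.get("kind") != "Kustomization":
            continue
        if ev.get("reason") != "HealthCheckFailed":
            continue
        selected.append(
            {
                "namespace": involved.get("namespace"),
                "name": involved.get("name"),
                "message": ev.get("message", ""),
                "lastTimestamp": ev.get("lastTimestamp"),
            }
        )
    return selected


if __name__ == "__main__":
    # Example: kubectl get events -A -o json | python bridge.py
    for failure in select_health_check_failures(json.load(sys.stdin)):
        print(json.dumps(failure))
```

Forwarding the selected records to an external system is left out here, since that part depends entirely on the target.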

Suggested fix

Either of the following two options would resolve the issue:

  1. Upstream (kustomize-controller): Ensure HealthCheckFailed events are always emitted with error severity, not Progressing.
  2. Upstream (notification-controller): Document clearly which severity values are filtered pre-dispatch, and/or add a pass-through option on Providers so all events matching eventSources are forwarded regardless of internal severity classification.
