
Add proposal for controller readiness probe based on KRaft raft state #225

Open
mly-zju wants to merge 1 commit into strimzi:main from mly-zju:controller-readiness-jmx-raft-state




mly-zju commented May 28, 2026

Opening per maintainer request on strimzi-kafka-operator#12768.

This proposal adds a new /v1/controller-ready endpoint to the Kafka Agent that reports whether a KRaft controller is attached to the metadata quorum, by reading the current-state attribute of the kafka.server:type=raft-metrics JMX MBean. The controller-only branch of kafka_readiness.sh is updated to call this endpoint so the existing readiness probe reflects raft attachment state rather than only whether the controller listener port is bound.
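A minimal sketch of the readiness decision behind the endpoint, assuming the agent runs inside the Kafka JVM and can query the platform MBean server. It is not the code from strimzi-kafka-operator#12768. The MBean name, the current-state attribute and the 503 body text come from this proposal; the class name, the "unknown" fallback, the 200 body, and the choice of leader/follower as the only "ready" states are assumptions. The exact state strings Kafka 4.x reports should be taken from the reference implementation's unit-test matrix.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.InstanceNotFoundException;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical sketch of the /v1/controller-ready decision; names are illustrative. */
public class ControllerReadiness {

    // MBean and attribute named in the proposal.
    static final String RAFT_METRICS_MBEAN = "kafka.server:type=raft-metrics";
    static final String CURRENT_STATE_ATTR = "current-state";

    // ASSUMPTION: only these states mean "attached to the quorum".
    // Anything else (e.g. "prospective", "candidate", "unattached") is not ready.
    static final Set<String> READY_STATES = Set.of("leader", "follower");

    /** HTTP status code plus JSON body the endpoint would send. */
    record Result(int status, String body) { }

    /** Reads current-state, or returns null if the raft MBean is not registered yet. */
    static String readRaftState(MBeanServer server) throws JMException {
        try {
            Object value = server.getAttribute(new ObjectName(RAFT_METRICS_MBEAN), CURRENT_STATE_ATTR);
            return value == null ? null : value.toString();
        } catch (InstanceNotFoundException e) {
            // Raft metrics only appear once the raft layer has started.
            return null;
        }
    }

    /** Maps a raft state to the response; 503 body format follows the proposal. */
    static Result evaluate(String raftState) {
        if (raftState != null && READY_STATES.contains(raftState)) {
            // Hypothetical success body; the proposal does not specify one.
            return new Result(200, "{\"raftState\":\"" + raftState + "\"}");
        }
        String shown = raftState == null ? "unknown" : raftState;
        return new Result(503, "{\"error\":\"controller not ready, current raft state: " + shown + "\"}");
    }

    public static void main(String[] args) throws JMException {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Result result = evaluate(readRaftState(server));
        System.out.println(result.status() + " " + result.body());
    }
}
```

Separating the JMX read from the pure `evaluate` mapping keeps the state-to-status table unit-testable without a running controller, which is the kind of per-state test matrix the reference implementation describes.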

Motivated by the cold-boot CoreDNS race documented in strimzi-kafka-operator#12760: a real production incident where a controller pod stayed 1/1 Ready for 8 hours despite the metadata quorum being wedged, because the existing netstat-based probe could not distinguish "listener bound" from "actually serving the quorum."

A reference implementation is at strimzi-kafka-operator#12768, closed pending acceptance of this proposal. It includes the agent endpoint, the script update, a unit-test matrix covering every raft state observed in Kafka 4.x, and end-to-end validation on both kind and AKS, including a hostAliases-induced wedge in which the new endpoint returned HTTP 503 {"error":"controller not ready, current raft state: prospective"}.

I deliberately did not pre-assign a proposal number to the filename (matching the recent convention from PRs #220, #216, #211, #204, #203, #191, #159) or update the README index, since the number is typically finalized at merge. Happy to add both in a follow-up commit once a number is assigned.

One section worth particular attention is the "Open question" at the bottom. The proposal as written catches the broad class of "raft is not attached" failures, but the on-disk quorum-state from the original #12760 incident (leaderId: 3, leaderEpoch: 1, unchanged for 8 h) suggests that raft did elect a leader and that the wedge was one layer above, in ControllerRegistrationManager. A second check covering that layer (e.g. checking that last-applied-record-offset advances) is possible, but it introduces stateful checking and a tuning parameter. I'd appreciate guidance on whether to include it here or treat it as a follow-up.
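To make the "stateful checking and a tuning parameter" trade-off concrete, here is a hypothetical sketch of such a second check. The class, its methods and the stall threshold are illustrative only and are not part of the proposal or of any Kafka API. It assumes a caller samples last-applied-record-offset on each probe and that the offset normally keeps advancing on a healthy controller (KRaft leaders periodically append no-op records, per KIP-835, so this should hold even on an idle cluster, though that is worth confirming).

```java
import java.time.Duration;

/** Hypothetical stall detector for a monotonically increasing offset. */
public class OffsetStallDetector {

    private final long stallThresholdNanos; // the tuning parameter the check would need
    private long lastOffset;                // last observed offset
    private long lastAdvanceNanos;          // when the offset last moved forward
    private boolean initialised;            // false until the first sample arrives

    public OffsetStallDetector(Duration stallThreshold) {
        this.stallThresholdNanos = stallThreshold.toNanos();
    }

    /**
     * Records a sample and reports whether the offset is still considered healthy.
     *
     * @param offset   current last-applied-record-offset value
     * @param nowNanos monotonic timestamp (e.g. System.nanoTime()), injected for testability
     * @return true if the offset advanced within the threshold, false if it looks stalled
     */
    public synchronized boolean observe(long offset, long nowNanos) {
        if (!initialised || offset > lastOffset) {
            // First sample, or forward progress: reset the stall clock.
            lastOffset = offset;
            lastAdvanceNanos = nowNanos;
            initialised = true;
            return true;
        }
        // No progress: healthy only while we are still inside the threshold.
        return nowNanos - lastAdvanceNanos <= stallThresholdNanos;
    }
}
```

The state lives across probe invocations, so it would have to sit in the long-running agent rather than in kafka_readiness.sh, and the threshold must be comfortably larger than the expected interval between metadata appends to avoid flapping.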

Adds a new /v1/controller-ready endpoint to the Kafka Agent that reports
whether a KRaft controller is attached to the metadata quorum, by reading
the current-state attribute of the kafka.server:type=raft-metrics JMX
MBean. The controller-only branch of kafka_readiness.sh is updated to
call this endpoint so the existing readiness probe reflects raft
attachment state rather than only whether the controller listener port
is bound.

Motivated by strimzi-kafka-operator#12760: a real-world cold-boot
CoreDNS race that wedged a controller for 8 hours despite the pod
remaining 1/1 Ready under the current netstat-based probe. A reference
implementation is at strimzi-kafka-operator#12768 (closed pending this
proposal).

Signed-off-by: lingyangma <lingyang.ma@enterprisedb.com>
