Add proposal for controller readiness probe based on KRaft raft state #225
Open
mly-zju wants to merge 1 commit into
Adds a new /v1/controller-ready endpoint to the Kafka Agent that reports whether a KRaft controller is attached to the metadata quorum, by reading the current-state attribute of the kafka.server:type=raft-metrics JMX MBean. The controller-only branch of kafka_readiness.sh is updated to call this endpoint so the existing readiness probe reflects raft attachment state rather than only whether the controller listener port is bound. Motivated by strimzi-kafka-operator#12760: a real-world cold-boot CoreDNS race that wedged a controller for 8 hours despite the pod remaining 1/1 Ready under the current netstat-based probe. A reference implementation is at strimzi-kafka-operator#12768 (closed pending this proposal). Signed-off-by: lingyangma <lingyang.ma@enterprisedb.com>
Opening per maintainer request on strimzi-kafka-operator#12768.
This proposal adds a new `/v1/controller-ready` endpoint to the Kafka Agent that reports whether a KRaft controller is attached to the metadata quorum, by reading the `current-state` attribute of the `kafka.server:type=raft-metrics` JMX MBean. The controller-only branch of `kafka_readiness.sh` is updated to call this endpoint so the existing readiness probe reflects raft attachment state rather than only whether the controller listener port is bound.

Motivated by the cold-boot CoreDNS race documented in strimzi-kafka-operator#12760: a real production incident where a controller pod stayed `1/1 Ready` for 8 hours despite the metadata quorum being wedged, because the existing `netstat`-based probe could not distinguish "listener bound" from "actually serving the quorum."

A reference implementation is at strimzi-kafka-operator#12768, closed pending acceptance of this proposal. It includes the agent endpoint, the script update, a unit-test matrix covering every raft state observed in Kafka 4.x, and end-to-end validation on both kind and AKS (including a `hostAliases`-induced wedge that captured `HTTP 503 {"error":"controller not ready, current raft state: prospective"}` from the new endpoint).

I deliberately did not pre-assign a proposal number to the filename (matching the recent convention from PRs #220, #216, #211, #204, #203, #191, #159) or update the README index, since the number is typically finalized at merge. Happy to add both in a follow-up commit once a number is assigned.
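For readers who want a concrete picture of the endpoint's decision, here is a minimal sketch rather than the reference implementation. It assumes the agent can read the MBean through the platform MBean server of the same JVM, and it assumes that only the `leader` and `follower` states count as attached. The class name, the method names, and the 200 response body are hypothetical; only the MBean name, the attribute name, and the 503 body shape come from the description above.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical sketch of the readiness decision; not the code from #12768.
final class ControllerReadySketch {

    // ASSUMPTION: which raft states count as "attached to the quorum".
    static final Set<String> ATTACHED_STATES = Set.of("leader", "follower");

    // Simple holder for an HTTP status code and a JSON body.
    record Response(int status, String body) {}

    // Reads current-state from the raft-metrics MBean; returns null on any failure.
    // ASSUMPTION: the MBean is registered in this JVM's platform MBean server.
    static String readCurrentState() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("kafka.server:type=raft-metrics");
            Object value = server.getAttribute(name, "current-state");
            return value == null ? null : value.toString();
        } catch (Exception e) {
            return null; // MBean missing or unreadable: treated as not ready
        }
    }

    // Maps a raft state to a response: 200 when attached, 503 otherwise.
    static Response evaluate(String currentState) {
        String state = currentState == null ? "unknown" : currentState;
        if (ATTACHED_STATES.contains(state)) {
            // The 200 body is a placeholder; the description does not specify it.
            return new Response(200, "{\"state\":\"" + state + "\"}");
        }
        return new Response(503,
            "{\"error\":\"controller not ready, current raft state: " + state + "\"}");
    }

    public static void main(String[] args) {
        Response r = evaluate(readCurrentState());
        System.out.println(r.status() + " " + r.body());
    }
}
```

Failing closed when the MBean cannot be read is a design choice in this sketch: a controller whose raft metrics are not yet registered is reported as not ready.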
One section worth particular attention is the Open question at the bottom: the proposal as written catches the broad class of "raft is not attached" failures, but the original #12760 incident's on-disk `quorum-state` (`leaderId: 3, leaderEpoch: 1` persistent for 8 h) suggests raft did elect a leader and the wedge was one layer above, in `ControllerRegistrationManager`. A second check covering that layer (e.g. `last-applied-record-offset` advance) is possible but introduces stateful checking and a tuning parameter. I'd appreciate a steer on whether to include it here or treat it as a follow-up.
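To make the cost of that second check concrete, here is a rough sketch of what "stateful checking and a tuning parameter" could look like. It is purely illustrative and not part of the proposal or the reference implementation: the class name, the method name, and the `stallThresholdMillis` parameter are hypothetical, and the caller is assumed to supply the offset read from the `last-applied-record-offset` metric along with the current time.

```java
// Hypothetical sketch of an offset-advance check; all names are illustrative.
final class OffsetAdvanceCheckSketch {

    // Tuning parameter: how long the offset may stay flat before we report a stall.
    private final long stallThresholdMillis;

    // State carried between probe invocations.
    private boolean seen = false;
    private long lastOffset;
    private long lastAdvanceMillis;

    OffsetAdvanceCheckSketch(long stallThresholdMillis) {
        this.stallThresholdMillis = stallThresholdMillis;
    }

    // Returns true while the offset has advanced within the threshold window.
    // The first observation is given the benefit of the doubt and counts as an advance.
    synchronized boolean isAdvancing(long offset, long nowMillis) {
        if (!seen || offset > lastOffset) {
            seen = true;
            lastOffset = offset;
            lastAdvanceMillis = nowMillis;
            return true;
        }
        return nowMillis - lastAdvanceMillis < stallThresholdMillis;
    }

    public static void main(String[] args) {
        OffsetAdvanceCheckSketch check = new OffsetAdvanceCheckSketch(30_000);
        System.out.println(check.isAdvancing(100, 0));
        System.out.println(check.isAdvancing(100, 30_000));
    }
}
```

The sketch shows both costs mentioned above: the check must remember the previous offset and timestamp across probe calls, and someone has to pick a threshold that tolerates normal pauses without masking a real wedge.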