Added proposal for auto-rebalance on imbalanced cluster feature in operator#211
Added proposal for auto-rebalance on imbalanced cluster feature in operator#211ShubhamRwt wants to merge 9 commits into
Conversation
Signed-off-by: ShubhamRwt <shubhamrwt02@gmail.com>
Signed-off-by: ShubhamRwt <shubhamrwt02@gmail.com>
Signed-off-by: ShubhamRwt <shubhamrwt02@gmail.com>
Signed-off-by: ShubhamRwt <shubhamrwt02@gmail.com>
|
@ShubhamRwt So, is this a draft? Or is it ready for review? It is not completely clear as the PR is not a Draft but you did not requested the review from all maintainers but only from two specific people. |
|
@scholzj It is ready for review, Sorry I didn't tagged everyone |
Signed-off-by: ShubhamRwt <shubhamrwt02@gmail.com>
scholzj
left a comment
There was a problem hiding this comment.
- It is not clear what the lifecycle of the KafkaRebalance resources is
- What is the impact of maintenance time windows?
- What is the testing strategy?
|
|
||
| #### What happens if some rebalance fails: | ||
|
|
||
| With the new `imbalance` mode, we will be introducing two new states to the FSM called `RebalanceOnImbalance` and `RebalanceOnImbalanceNotComplete` |
There was a problem hiding this comment.
Why do we need an additional state for failed rebalance? IIRC there is no such a specific state for failures with auto-rebalancing on scale up/down. Or am I wrong?
There was a problem hiding this comment.
I think it was me overthinking that in case some failure happens and it might need a human intervention and we might not need the auto rebalance to happen again and again but I think this is not really required. I will remove it
There was a problem hiding this comment.
So what state will it move to if there is an error? Idle?
There was a problem hiding this comment.
Will the generate KR CR move to NotReady (or whatever the fail state for that is)? How will a human operator know this happened?
There was a problem hiding this comment.
The generated KR Cr would move to NotReady, yes but I think the user can check the logs but I dont think there is any other thing
There was a problem hiding this comment.
Is the answer clear based on my previous reply?
Signed-off-by: ShubhamRwt <shubhamrwt02@gmail.com>
| To resolve this issue, we will only make use of Cruise Control's anomaly detection ability, the triggering of the partition reassignments (rebalance) will be the responsibility of the Strimzi Cluster Operator. | ||
| To enable this, we will use an approach based on the existing auto-rebalance for scaling feature (see the [documentation](https://strimzi.io/docs/operators/latest/deploying#proc-automating-rebalances-str) for more details). | ||
| We propose using only those anomaly detection classes, related to goal violations, that can be addressed by a partition rebalance. | ||
| We will not enable the other anomaly detection classes, related to goal violations, that would require manual interventions at the infrastructure level such as disk or broker failures. |
There was a problem hiding this comment.
I think it would be nice, if we make events or WARN logs in Strimzi if there are other type of anomalies around Kafka cluster, which cannot be automatically fixed. WDYT?
There was a problem hiding this comment.
AFAIK disk and broker failures would surface through the Strimzi Kafka CR status and normal metrics monitoring channels so I am not sure having CC also report that is particularly useful?
There was a problem hiding this comment.
If it is reported in "other channels" then it is fine to me!
There was a problem hiding this comment.
I guess that depends on what disk and broker failures really are. I do not think disk failures are necessarily something the operator would notice? Broker failures - if that means the broker is unresponsive - then it would actually fail the reconciliation I think. But if the broker accepts connections but doesn't for example replicate any data, we would not notice it.
There was a problem hiding this comment.
I think all those situations are better surfaced through monitoring the appropriate metrics, that is all CC is doing anyway.
| * `remove-brokers` - auto-rebalancing on scale down | ||
|
|
||
| To leverage the automated rebalance on imbalanced clusters (those with detected goal violations), we will be introducing a new mode to the auto-rebalancing feature. | ||
| The new mode will be called `imbalance`, which means that cluster imbalance was detected and rebalancing should be applied to all the brokers. |
There was a problem hiding this comment.
nit: I think we should call it rebalance or rebalance-broker instead of imbalance
I prefer rebalance or rebalance-broker, since it will match with CC terminology and also add-brokers, remove-brokers have a verb, while "imbalance" is just an adjective. So to match the current convention better.
There was a problem hiding this comment.
I disagree, to me, the mode is the reason for the auto-rebalance being triggered.
There was a problem hiding this comment.
In which case we might want to make it past-tense: "imbalanced"
There was a problem hiding this comment.
I disagree, to me, the mode is the reason for the auto-rebalance being triggered.
I also disagree with you, if it is the reason, then why we call it 'mode'? It should be called 'reason' then!
In this current form, for me, the rebalance-brokers would match the convention created by add-brokers and remove-brokers...
There was a problem hiding this comment.
But all the modes are rebalancing the brokers?
There was a problem hiding this comment.
The "mode" terminology is coming from the KafkaRebalance spec where you specify which mode you want to use for rebalancing: full, add-brokers, remove-brokers or remove-disks. All these modes maps to the corresponding CC endpoints. Maybe we had a discussion about a good naming long time ago about what to use in the auto-rebalancing and we ending with the same because the mapping was one to one. I can see @egyedt point where, within the auto-rebalancing, we are actually specifying "when I should trigger an auto-rebalancing?" on adding brokers, on removing brokers or ... whenever there is an anomaly. What the current proposal is going to do underneath in terms of KafkaRebalance is using the "full" mode which is using the /rebalance endpoint (full because using ALL brokers within the cluster). Maybe we can stick with "full" instead of "imbalance" for now? Or we should change the field name but it needs deprecating this and adding a new one.
There was a problem hiding this comment.
I still see this is an ongoing discussion - Do we plan to use imbalance or full mode?
There was a problem hiding this comment.
My take always was that it is the action we are going to take:
- Add brokers
- Remove brokers
Not the cause or event. For me, the imbalance is maybe not perfect (I think I originally preferred something else), but I think it is understandable for all users what it means and describes well the intent. full on the other hand is meaningless for me and makes it hard to imagine when it will be done.
While for the other two modes the type is the same as the mode in KafkaRebalance, for them it works fine to describe everything. full is fine as a mode in KafkaRebalance. It says what will happen. But in the Kafka CR autorebalance section, the keyword should focus on the intent and when it will happen. full does not explain that. imbalance does.
There was a problem hiding this comment.
Ok I am fine with sticking with imbalance.
| This could cause potential conflicts with other administration operations and is the primary reason self-healing has been disabled until now. | ||
| To resolve this issue, we will only make use of Cruise Control's anomaly detection ability, the triggering of the partition reassignments (rebalance) will be the responsibility of the Strimzi Cluster Operator. | ||
| To enable this, we will use an approach based on the existing auto-rebalance for scaling feature (see the [documentation](https://strimzi.io/docs/operators/latest/deploying#proc-automating-rebalances-str) for more details). | ||
| We propose using only those anomaly detection classes, related to goal violations, that can be addressed by a partition rebalance. |
There was a problem hiding this comment.
Maybe for some anomalies, the call of the fix-offline-replicas endpoint can be useful!
Beside the usage of rebalance endpoint.
There was a problem hiding this comment.
How would we decide which anomalies to call the different endpoint for?
There was a problem hiding this comment.
There should be some logic about this in Strimzi if we decide to use fix-offline-replicas endpoint too. Otherwise we can use only the rebalance endpoint, it is totally fine to me. I just mentioned here as an interesting idea.
There was a problem hiding this comment.
I didn't know about this /fix_offline_replicas endpoint at all. If there is a specific anomaly raised about it, maybe it could make sense to use it but it's not trivial. For how the rebalance operator works today, Each CC endpoint call is mappend on a corresponding "mode" within the KafkaRebalance, so hitting two endpoints would mean creating two KafkaRebalance: one using the /rebalance endpoint and another one using the /fix_offline_replicas (which is anyway not supported now).
tomncooper
left a comment
There was a problem hiding this comment.
I have had a pass. I left a few comments.
As other reviewers have identified, the main point is checking if a rebalance (manual or automatic) is ongoing before you do anything else.
For auto rebalances if seems you can check the state machine, for manual you could check CC directly or the status of the KR CR. If a manual KR CR already exists then something is going to check it's status with CC, depending on the ordering of those operations it may happen before you need to do your checks, in which case you could use that. If it is after or ordering can't be guaranteed then you might be best querying CC directly.
| This could cause potential conflicts with other administration operations and is the primary reason self-healing has been disabled until now. | ||
| To resolve this issue, we will only make use of Cruise Control's anomaly detection ability, the triggering of the partition reassignments (rebalance) will be the responsibility of the Strimzi Cluster Operator. | ||
| To enable this, we will use an approach based on the existing auto-rebalance for scaling feature (see the [documentation](https://strimzi.io/docs/operators/latest/deploying#proc-automating-rebalances-str) for more details). | ||
| We propose using only those anomaly detection classes, related to goal violations, that can be addressed by a partition rebalance. |
There was a problem hiding this comment.
How would we decide which anomalies to call the different endpoint for?
| To resolve this issue, we will only make use of Cruise Control's anomaly detection ability, the triggering of the partition reassignments (rebalance) will be the responsibility of the Strimzi Cluster Operator. | ||
| To enable this, we will use an approach based on the existing auto-rebalance for scaling feature (see the [documentation](https://strimzi.io/docs/operators/latest/deploying#proc-automating-rebalances-str) for more details). | ||
| We propose using only those anomaly detection classes, related to goal violations, that can be addressed by a partition rebalance. | ||
| We will not enable the other anomaly detection classes, related to goal violations, that would require manual interventions at the infrastructure level such as disk or broker failures. |
There was a problem hiding this comment.
AFAIK disk and broker failures would surface through the Strimzi Kafka CR status and normal metrics monitoring channels so I am not sure having CC also report that is particularly useful?
| * `remove-brokers` - auto-rebalancing on scale down | ||
|
|
||
| To leverage the automated rebalance on imbalanced clusters (those with detected goal violations), we will be introducing a new mode to the auto-rebalancing feature. | ||
| The new mode will be called `imbalance`, which means that cluster imbalance was detected and rebalancing should be applied to all the brokers. |
There was a problem hiding this comment.
I disagree, to me, the mode is the reason for the auto-rebalance being triggered.
|
|
||
| #### What happens if some rebalance fails: | ||
|
|
||
| With the new `imbalance` mode, we will be introducing two new states to the FSM called `RebalanceOnImbalance` and `RebalanceOnImbalanceNotComplete` |
There was a problem hiding this comment.
So what state will it move to if there is an error? Idle?
|
|
||
| #### What happens if some rebalance fails: | ||
|
|
||
| With the new `imbalance` mode, we will be introducing two new states to the FSM called `RebalanceOnImbalance` and `RebalanceOnImbalanceNotComplete` |
There was a problem hiding this comment.
Will the generate KR CR move to NotReady (or whatever the fail state for that is)? How will a human operator know this happened?
Signed-off-by: ShubhamRwt <shubhamrwt02@gmail.com>
Signed-off-by: ShubhamRwt <shubhamrwt02@gmail.com>
|
Is there an estimate when this PR will be ready? |
|
@egyedt Hi, I pushed the suggestions yesterday. I think this proposal is ready for next set of reviews. I hope the new suggestions fixes the open questions |
|
@scholzj @ppatierno @tomncooper can you guys have another pass at this, please. Thankyou |
Signed-off-by: ShubhamRwt <shubhamrwt02@gmail.com>
scholzj
left a comment
There was a problem hiding this comment.
I sitll don't see any considerations ont he maintenance time window which I think I raised before. It would be good to have it included in the proposal (whether it will follow the window, or if not, why not)
| ## Motivation | ||
|
|
||
| Currently, if the cluster is imbalanced, the user would need to manually rebalance the cluster by using the `KafkaRebalance` custom resource. | ||
| Every cluster requires manual intervention to trigger rebalancing when imbalances are detected, regardless of its size. |
There was a problem hiding this comment.
This sentence sounds strange. I would maybe remove it. There are many people who live without rebalancing all their life. Especially if you have small cluster with 3 brokers and use replication factor 3, there is actually not much to rebalance.
There was a problem hiding this comment.
I guess that Shubham meant to say something like "if your cluster is unbalanced and you want to do some rebalancing, the only way you have today is the manual one, there is no automation which is the goal for this proposal". I don't think he meant to say that all clusters of any size needs rebalancing. Maybe he can clarify the meaning.
There was a problem hiding this comment.
Yes, I assume so, but we should clarify it.
There was a problem hiding this comment.
I meant that with respect to Strimzi, yes, If we want to do any rebalance due to imbalance then we have to do it manually and there is now way for auto rebalance on imbalance mechanism.
| * `remove-brokers` - auto-rebalancing on scale down | ||
|
|
||
| To leverage the automated rebalance on imbalanced clusters (those with detected goal violations), we will be introducing a new mode to the auto-rebalancing feature. | ||
| The new mode will be called `imbalance`, which means that cluster imbalance was detected and rebalancing should be applied to all the brokers. |
There was a problem hiding this comment.
My take always was that it is the action we are going to take:
- Add brokers
- Remove brokers
Not the cause or event. For me, the imbalance is maybe not perfect (I think I originally preferred something else), but I think it is understandable for all users what it means and describes well the intent. full on the other hand is meaningless for me and makes it hard to imagine when it will be done.
While for the other two modes the type is the same as the mode in KafkaRebalance, for them it works fine to describe everything. full is fine as a mode in KafkaRebalance. It says what will happen. But in the Kafka CR autorebalance section, the keyword should focus on the intent and when it will happen. full does not explain that. imbalance does.
| 4. If the key is not present or empty, the operator uses the default anomaly detection goals defined in `CruiseControlConfiguration.CRUISE_CONTROL_DEFAULT_ANOMALY_DETECTION_GOALS_LIST`: | ||
| - `RACK_AWARENESS_GOAL` | ||
| - `MIN_TOPIC_LEADERS_PER_BROKER_GOAL` | ||
| - `REPLICA_CAPACITY_GOAL` | ||
| - `DISK_CAPACITY_GOAL` |
There was a problem hiding this comment.
So, are these goals what is in CRUISE_CONTROL_DEFAULT_ANOMALY_DETECTION_GOALS_LIST? It is a bit confusing what the list of goals means here.
There was a problem hiding this comment.
Yes, Shubham is listing what is in the CRUISE_CONTROL_DEFAULT_ANOMALY_DETECTION_GOALS_LIST. Would you suggest to remove the list and referencing to the constant definition only (within the code) or something else? Just trying to understand what "confusing" means here.
There was a problem hiding this comment.
I would probably just reference the list. Or one can do something like to make it more clear...
4. If the key is not present or empty, the operator uses the default anomaly detection goals defined in `CruiseControlConfiguration.CRUISE_CONTROL_DEFAULT_ANOMALY_DETECTION_GOALS_LIST`.
Currently, the list contains:
- ...
There was a problem hiding this comment.
I will try to make it simpler if it sounds confusing
| The `KafkaRebalance` has 4 modes: `full`, `add-broker`, `remove-broker` and `remove-disks` mode. | ||
| The `imbalance` mode will be mapped to the `full` mode in the generated KafkaRebalance resource which means that the generated `KafkaRebalance` custom resource will have the mode set as `full` which within the Strimzi rebalancing operator means calling the Cruise Control API to run a rebalancing taking all brokers into account. | ||
|
|
||
| The generated `KafkaRebalance` custom resource will be called `<my-cluster-name>-auto-rebalancing-goal-violation`, where the `<my-cluster-name>` part comes from the `metadata.name` in the `Kafka` custom resource, `goal-violation` means that a goal related imbalance was detected in the cluster. |
There was a problem hiding this comment.
Should the mode name be in the name? E.g. <my-cluster-name>-auto-rebalancing-imbalance? That will make it easier to understand for the user.
There was a problem hiding this comment.
I named it <my-cluster-name>-auto-rebalancing-goal-violation since we only support goal violations for now. I think if we think about future like have fix for topic anomalies etc then it would be great to have distinct names based on the anomaly type.
There was a problem hiding this comment.
Isn't imbalance the distinct type you can name it after?
There was a problem hiding this comment.
It sound good too but I wanted to name it as per the anomaly. (since we are handling goal violations so <my-cluster-name>-auto-rebalancing-goal-violation but I am open to change the names.
There was a problem hiding this comment.
@ppatierno @tomncooper any preference from your side?
| # ... other rebalancing related configuration | ||
| ``` | ||
|
|
||
| The operator also sets a finalizer, named `strimzi.io/auto-rebalancing`, on the generated `KafkaRebalance` custom resource. |
There was a problem hiding this comment.
Do we use this also for adding and removing brokers? Worth mentioning here whether it is the same pattern or not.
There was a problem hiding this comment.
Yes, thats the same. I will add a sentence to it
| When the `KafkaRebalanceAssemblyOperator` reconciles a KafkaRebalance resource and detects a transition to a terminal state: | ||
| 1. Extracts the cluster name from the KafkaRebalance labels | ||
| 2. Retrieves the corresponding Kafka CR | ||
| 3. Updates `Kafka.status.lastRebalanceCompletionTime` to the current timestamp | ||
| 4. Patches the Kafka CR status |
There was a problem hiding this comment.
No, this would not work and cause problems with the operators writing over each other and causing conflicts. You should follow the principle to write from one place and read from the other.
There was a problem hiding this comment.
I agree with Jakub: the KafkaAssemblyOperator is the owner of Kafka CR and it should be the only one to update the Kafka CR status (note: the KafkaAutoRebalanceReconciler can do that because it's part of such operator and not an operator itself). On the other side the KafkaRebalanceAssemblyOperator owns KafkaRebalance and it can't write into Kafka CR. I can't see an easy solution tough.
The KafkaAssemblyOperator having a watch on KafkaRebalance resources to detect when one ends in a terminal state to update the last rebalance time? :-/ Doing the same but poll-based on each reconciliation? :-/ (but we could lose a KafkaRebalance that was deleted by the user right after the rebalancing operation).
There was a problem hiding this comment.
The obvious option is having some Config Map 🤷. I also wonder if it would make long term make sense to drop the `KafkaRebalanceAssemblyOperator and integrate the logic here. But I'm not sure I fully understand all the possible consequences. Mainly whether the fact that all rebalances would be done only if we get to the end of the reconciliation could cause any issues.
| **Important Edge Case:** If a goal violation contains BOTH fixable goals and unfixable goals (i.e., both `fixableViolatedGoals` and `unfixableViolatedGoals` lists are non-empty), the operator will NOT trigger a rebalance. This follows Cruise Control's design principle that unfixable goals typically indicate infrastructure constraints (e.g., insufficient racks for rack-awareness, insufficient brokers for replica capacity) that cannot be resolved through partition reassignment alone. Attempting to rebalance only the fixable goals while unfixable goals remain would result in an incomplete fix and could lead to repeated rebalance attempts. | ||
|
|
||
| The operator can retrieve the unfixable goals from the `unfixableViolatedGoals` list in the JSON response. | ||
| When unfixable goals are detected, the operator will set a warning condition in the status of the `Kafka` CR to alert users that manual intervention is required: |
There was a problem hiding this comment.
So, will the unfixable goals persist in Cruise Control? Or will the warning disappear right in the next reconciliations?
There was a problem hiding this comment.
The later one is true but since the anomalies are unfixabale we can expect the error to pop up again once CC detects it again
There was a problem hiding this comment.
So, if the warning is not stable and we cannot guarantee that we can maintain it in every reconciliation because CC raises it right away, you should consider just logging it instead of using the conditions.
| conditions: | ||
| - type: Warning | ||
| status: "True" | ||
| reason: UnfixableViolatedGoal |
There was a problem hiding this comment.
This should probably have some better name to properly relate to the feature it is about.
| - **Frequent rebalances**: More than 3 auto-rebalances triggered within 1 hour may indicate a persisting anomaly | ||
| - **Repeated goal violations**: Same goal violation detected multiple times in a short time window (e.g., >2 times in 30 minutes) | ||
| - **Rebalance failures**: Unfixable goals detected or repeated NotReady states requiring manual intervention | ||
|
|
There was a problem hiding this comment.
Will we add these to the alerts we ship?
There was a problem hiding this comment.
I think it would be better to let users to create their own alerts as they can then choose how and when they want to trigger the alerts as per their cluster but we can definitely ship some example for the same
| ## Future Scope | ||
| In the future, we plan to introduce auto-rebalance for topic-related and metric-related imbalances. |
There was a problem hiding this comment.
| ## Future Scope | |
| In the future, we plan to introduce auto-rebalance for topic-related and metric-related imbalances. | |
| ## Future Scope | |
| In the future, we plan to introduce auto-rebalance for topic-related and metric-related imbalances. |
|
@scholzj thanks for the reviews, I will answer them in a while. I was just thinking about maintenance window. I don't think we really need a maintenance window. I would assume that we would like to have it such that if the users can avoid self healing for some time whey they are going some ugrades/ repairs to their cluster. But I think if they want to avoid it for some time, they can just disable the feature and enable it back once they think they are good to go. |
@ShubhamRwt I personally think hooking it into maintenance time window makes sense. It is not about upgrades - you would do those IN the maintenance window in the first place. It is not about repairs - you would do those either in the maintenance window or in an emergency. It is about for example about reducing disruption caused by the rebalancing in peak hours of your business. That is where you do not want to do cert renewals. And where you possibly also do not want to do rebalances. Sure, you can manually switch it off. But I do not think it works here. If you tell someone to "switch off" the auto-rebalancing every morning and switch it on every evening when going home they will never use it. In any case, one way or another, this should be part of the proposal. |
| ## Motivation | ||
|
|
||
| Currently, if the cluster is imbalanced, the user would need to manually rebalance the cluster by using the `KafkaRebalance` custom resource. | ||
| Every cluster requires manual intervention to trigger rebalancing when imbalances are detected, regardless of its size. |
There was a problem hiding this comment.
I guess that Shubham meant to say something like "if your cluster is unbalanced and you want to do some rebalancing, the only way you have today is the manual one, there is no automation which is the goal for this proposal". I don't think he meant to say that all clusters of any size needs rebalancing. Maybe he can clarify the meaning.
| * `remove-brokers` - auto-rebalancing on scale down | ||
|
|
||
| To leverage the automated rebalance on imbalanced clusters (those with detected goal violations), we will be introducing a new mode to the auto-rebalancing feature. | ||
| The new mode will be called `imbalance`, which means that cluster imbalance was detected and rebalancing should be applied to all the brokers. |
There was a problem hiding this comment.
Ok I am fine with sticking with imbalance.
| 4. If the key is not present or empty, the operator uses the default anomaly detection goals defined in `CruiseControlConfiguration.CRUISE_CONTROL_DEFAULT_ANOMALY_DETECTION_GOALS_LIST`: | ||
| - `RACK_AWARENESS_GOAL` | ||
| - `MIN_TOPIC_LEADERS_PER_BROKER_GOAL` | ||
| - `REPLICA_CAPACITY_GOAL` | ||
| - `DISK_CAPACITY_GOAL` |
There was a problem hiding this comment.
Yes, Shubham is listing what is in the CRUISE_CONTROL_DEFAULT_ANOMALY_DETECTION_GOALS_LIST. Would you suggest to remove the list and referencing to the constant definition only (within the code) or something else? Just trying to understand what "confusing" means here.
| - Users must correct the template configuration to resume imbalance auto-rebalancing | ||
|
|
||
| The `KafkaRebalance` has 4 modes: `full`, `add-broker`, `remove-broker` and `remove-disks` mode. | ||
| The `imbalance` mode will be mapped to the `full` mode in the generated KafkaRebalance resource which means that the generated `KafkaRebalance` custom resource will have the mode set as `full` which within the Strimzi rebalancing operator means calling the Cruise Control API to run a rebalancing taking all brokers into account. |
There was a problem hiding this comment.
| The `imbalance` mode will be mapped to the `full` mode in the generated KafkaRebalance resource which means that the generated `KafkaRebalance` custom resource will have the mode set as `full` which within the Strimzi rebalancing operator means calling the Cruise Control API to run a rebalancing taking all brokers into account. | |
| The `imbalance` mode will be mapped to the `full` mode in the generated `KafkaRebalance` resource which means that the generated `KafkaRebalance` custom resource will have the mode set as `full` which within the Strimzi rebalancing operator means calling the Cruise Control API to run a rebalancing taking all brokers into account. |
| # ... other rebalancing related configuration | ||
| ``` | ||
|
|
||
| The operator also sets a finalizer, named `strimzi.io/auto-rebalancing`, on the generated `KafkaRebalance` custom resource. |
| Talking about the priority of rebalance operations, scale down rebalance operations have the highest priority, followed by scale up, and then rebalance on imbalance. | ||
| The priority ordering is designed to prevent wasted work and optimize resource usage: | ||
| - **Scale-down has highest priority**: Moving partition replicas to brokers that will soon be removed wastes resources and time, so scale-down operations must complete first | ||
| - **Scale-up has second priority**: Quickly utilizing new capacity is important for cluster stability | ||
| - **Imbalance rebalancing has lowest priority**: It addresses steady-state optimization rather than cluster topology changes, so it can wait for scaling operations to complete |
There was a problem hiding this comment.
No there is no such a queue, I think this is trying to summarize what's going to happen in the corresponding FSM (i.e. a anomaly is skipped if there is a rebalance on scale-down, ...). I would agree it could sound like a queue handling with priorities. @ShubhamRwt because we have the FSM explained later on, I guess you could even remove this list? Wdyt @scholzj ?
| - **Manual rebalances**: Created by users with arbitrary names | ||
|
|
||
| The implementation leverages the existing `KafkaRebalanceUtils` class which is already used throughout the Strimzi codebase for state extraction and validation. | ||
| All KafkaRebalance resources (manual and auto-generated) have the `strimzi.io/cluster` label, allowing the operator to discover them via label selector. |
There was a problem hiding this comment.
| All KafkaRebalance resources (manual and auto-generated) have the `strimzi.io/cluster` label, allowing the operator to discover them via label selector. | |
| All `KafkaRebalance` resources (manual and auto-generated) have the `strimzi.io/cluster` label, allowing the operator to discover them via label selector. |
|
|
||
| **Tracking Rebalance Completion Times:** | ||
|
|
||
| Since auto-generated KafkaRebalance resources are deleted upon completion or failure, and manual KafkaRebalance resources can be deleted by users, we need a persistent location to track completion times. |
There was a problem hiding this comment.
| Since auto-generated KafkaRebalance resources are deleted upon completion or failure, and manual KafkaRebalance resources can be deleted by users, we need a persistent location to track completion times. | |
| Since auto-generated `KafkaRebalance` resources are deleted upon completion or failure, and manual `KafkaRebalance` resources can be deleted by users, we need a persistent location to track completion times. |
| **State Machine Handling:** | ||
|
|
||
| Both inter-broker and intra-broker rebalancing use the same `RebalanceOnImbalance` state. | ||
| The operator tracks which type of rebalance is currently running and executes them sequentially within this state: |
There was a problem hiding this comment.
I see a problem here. The states within an FSM are persisted into the Kafka CR so that if the cluster operator crashes it can recover from it was before. You are adding a new (internal) "state" for the RebalanceOnImbalance .... state :-) which makes a differentiation between rebalanceDisk true or false. It means that when the operator recover from a crash and it recovers the state RebalanceOnImbalance, it doesn't know if the rebalanceDisk it was handling was true or false. How are you going to solve this?
There was a problem hiding this comment.
I understand your concern here. You are right. I think we can either think of having two separate state RebalanceOnImbalanceInterBrokerBalancing and RebalanceOnImbalanceIntraBrokerBalancing or the other way can be introduction of new field called pendingOperation or phase to which will track the state of the RebalanceOnImbalance mode.
There was a problem hiding this comment.
Not sure I would make the FSM more complicated by adding an additional state tbh. Maybe just storing the rebalanceDisk field would be enough. Let's see if others have a better idea.
| When the `KafkaRebalanceAssemblyOperator` reconciles a KafkaRebalance resource and detects a transition to a terminal state: | ||
| 1. Extracts the cluster name from the KafkaRebalance labels | ||
| 2. Retrieves the corresponding Kafka CR | ||
| 3. Updates `Kafka.status.lastRebalanceCompletionTime` to the current timestamp | ||
| 4. Patches the Kafka CR status |
There was a problem hiding this comment.
I agree with Jakub: the KafkaAssemblyOperator is the owner of Kafka CR and it should be the only one to update the Kafka CR status (note: the KafkaAutoRebalanceReconciler can do that because it's part of such operator and not an operator itself). On the other side the KafkaRebalanceAssemblyOperator owns KafkaRebalance and it can't write into Kafka CR. I can't see an easy solution tough.
The KafkaAssemblyOperator having a watch on KafkaRebalance resources to detect when one ends in a terminal state to update the last rebalance time? :-/ Doing the same but poll-based on each reconciliation? :-/ (but we could lose a KafkaRebalance that was deleted by the user right after the rebalancing operation).
This PR aims to introduce the self-healing feature in Strimzi. This proposal contains all the comments and suggestion left on the old proposal . This proposal aim to utilize the auto-rebalancing feature of Strimzi to introduce the self healing.