Skip to content

Add status reporting to EtcdCluster#350

Merged
ahrtr merged 1 commit into
etcd-io:mainfrom
GunaKKIBM:Status-report
Jun 14, 2026
Merged

Add status reporting to EtcdCluster#350
ahrtr merged 1 commit into
etcd-io:mainfrom
GunaKKIBM:Status-report

Conversation

@GunaKKIBM

@GunaKKIBM GunaKKIBM commented May 20, 2026

Copy link
Copy Markdown
Contributor

@k8s-ci-robot

Copy link
Copy Markdown

Hi @GunaKKIBM. Thanks for your PR.

I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ahrtr

ahrtr commented May 26, 2026

Copy link
Copy Markdown
Member

/ok-to-test

@ahrtr

ahrtr commented May 26, 2026

Copy link
Copy Markdown
Member

please resolve the workflow failures.

@GunaKKIBM GunaKKIBM force-pushed the Status-report branch 2 times, most recently from c98b771 to 6654687 Compare May 27, 2026 04:01
}

return r.reconcileClusterState(ctx, state)
result, err := r.reconcileClusterState(ctx, state)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use a deferred function at line 89, so that we don't need to call r.updateStatus in multiple places.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. I actually moved it to the beginning of the function, because it should ideally be called even after fetchAndValidate function exits.

Comment on lines +490 to +491
quorum := (len(s.memberListResp.Members) / 2) + 1
if healthyCount >= quorum {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to do the math ourselves. We just need to issue an lineralibity read, the etcd cluster is still functional as long as we get a response from any member.

We need enhance the health check.

Let's just keep it as it's for now, and consider improvement in followup PRs, thx

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#357 - created this issue to take this up in a separate PR

Comment on lines +562 to +565
// Update or append conditions
meta.SetStatusCondition(&s.cluster.Status.Conditions, availableCondition)
meta.SetStatusCondition(&s.cluster.Status.Conditions, progressingCondition)
meta.SetStatusCondition(&s.cluster.Status.Conditions, degradedCondition)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

questions: do it mean that we always have three conditions?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This works like a deployment or any other k8s controller's status. For example, here's two examples of a deployment, one successful and one failed

Successful

  status:
    availableReplicas: 2
    conditions:
    - lastTransitionTime: "2026-06-06T06:28:43Z"
      lastUpdateTime: "2026-06-06T06:28:43Z"
      message: Deployment has minimum availability.
      reason: MinimumReplicasAvailable
      status: "True"
      type: Available
    - lastTransitionTime: "2026-06-06T06:28:31Z"
      lastUpdateTime: "2026-06-06T06:28:43Z"
      message: ReplicaSet "nginx-deployment-6f9664446b" has successfully progressed.
      reason: NewReplicaSetAvailable
      status: "True"
      type: Progressing
    observedGeneration: 1
    readyReplicas: 2
    replicas: 2
    updatedReplicas: 2

Failed:

status:
  conditions:
  - lastTransitionTime: "2026-06-06T06:30:09Z"
    lastUpdateTime: "2026-06-06T06:30:09Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2026-06-06T06:40:10Z"
    lastUpdateTime: "2026-06-06T06:40:10Z"
    message: ReplicaSet "failing-nginx-deployment-8644746d4f" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 1
  replicas: 2
  unavailableReplicas: 2
  updatedReplicas: 2

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the info.

It seems there is no Degraded condition?

@GunaKKIBM GunaKKIBM Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is upto us to add different types of conditions, the type in the condition is of type string. I added one more and called it Degraded (if one or more members is not healthy). We can rename it.

I can check other K8S resources or code to see what are other preferable naming conventions

Thank you!

@ivanvc ivanvc left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a minor suggestion regarding the message labels.

/cc @jberkus for an agreement on how to display them, as he has more documentation experience.

Comment thread internal/controller/etcdcluster_controller.go
Comment thread internal/controller/etcdcluster_controller.go
Comment thread internal/controller/etcdcluster_controller.go
Comment thread internal/controller/etcdcluster_controller.go
Comment thread internal/controller/etcdcluster_controller.go
Comment thread internal/controller/etcdcluster_controller.go
@ahrtr

ahrtr commented Jun 3, 2026

Copy link
Copy Markdown
Member

@GunaKKIBM pls resolve the minor changes so that we can merge this PR. thx

@GunaKKIBM

Copy link
Copy Markdown
Contributor Author

@ahrtr . I will be updating this by tomorrow, along with test results for sure. Sorry about the delay.
Thanks!

@ahrtr

ahrtr commented Jun 3, 2026

Copy link
Copy Markdown
Member

pls squash the commits, thx

@jberkus

jberkus commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

I left a minor suggestion regarding the message labels.

/cc @jberkus for an agreement on how to display them, as he has more documentation experience.

Seems fine either way from a docs perspective. What's there now is OK.

@GunaKKIBM

Copy link
Copy Markdown
Contributor Author

Here's how the status looks like when the etcd cluster is deployed

status:
  conditions:
  - lastTransitionTime: "2026-06-06T07:06:52Z"
    message: Etcd cluster has 3/3 healthy members with quorum
    observedGeneration: 1
    reason: ClusterAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2026-06-06T07:06:52Z"
    message: Etcd cluster is stable
    observedGeneration: 1
    reason: ClusterStable
    status: "False"
    type: Progressing
  - lastTransitionTime: "2026-06-06T07:06:37Z"
    message: All etcd members are healthy
    observedGeneration: 1
    reason: ClusterHealthy
    status: "False"
    type: Degraded
  currentReplicas: 3
  currentVersion: 3.5.21
  leaderID: 41f5beba5f42189a
  memberCount: 3
  members:
  - id: 41f5beba5f42189a
    isHealthy: true
    isLeader: true
    name: etcdcluster-sample-0
    version: 3.5.21
  - id: a2ec1c51b36711d8
    isHealthy: true
    name: etcdcluster-sample-1
    version: 3.5.21
  - id: b8bce8ddff0f6f4f
    isHealthy: true
    name: etcdcluster-sample-2
    version: 3.5.21
  observedGeneration: 1
  readyReplicas: 3

@GunaKKIBM

Copy link
Copy Markdown
Contributor Author

During further testing, I see an issue, rather than fixing it here. I have created a bug
#358, will pick it up in a different PR

@GunaKKIBM

Copy link
Copy Markdown
Contributor Author

@ahrtr @ivanvc . Please take a look now. Thank you!

@ahrtr ahrtr left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

cc @ivanvc do you have any further comment on this PR?

}
}()

state, res, err := r.fetchAndValidateState(ctx, req)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The state here is a new variable. It won't update the state defined above (line 85)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great catch. Thank you very much. I have declared the variables separately and tested these changes once again.
Please let me know if that looks good. Thanks!

@ivanvc ivanvc self-requested a review June 9, 2026 18:16
@GunaKKIBM

Copy link
Copy Markdown
Contributor Author

@ahrtr
@ivanvc
May I know if this PR is good to be merged?

@ahrtr

ahrtr commented Jun 10, 2026

Copy link
Copy Markdown
Member

@ahrtr @ivanvc May I know if this PR is good to be merged?

It should be good to merge. @ivanvc will take a second look before we merge it.

@ivanvc ivanvc left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Ideally, the dependency bump should be in a different commit, unless it's absolutely required by the code change. Otherwise, it looks good and isn't a blocker. Thanks, @GunaKKIBM

@k8s-ci-robot

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr, GunaKKIBM, ivanvc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ivanvc

ivanvc commented Jun 10, 2026

Copy link
Copy Markdown
Member

@GunaKKIBM, please resolve the conflicts with go.mod/go.sum. After that, we should merge. Thanks :)

Signed-off-by: Guna K Kambalimath <Guna.Kambalimath@ibm.com>
@GunaKKIBM

Copy link
Copy Markdown
Contributor Author

@ahrtr @ivanvc
I have updated the PR. Rebased with main branch. please merge! Thank you!

@ahrtr ahrtr merged commit ddb834e into etcd-io:main Jun 14, 2026
5 checks passed
valen-mascarenhas14 pushed a commit to valen-mascarenhas14/etcd-operator that referenced this pull request Jun 15, 2026
Signed-off-by: Guna K Kambalimath <Guna.Kambalimath@ibm.com>
Signed-off-by: Valen Mascarenhas <valen.mascarenhas@ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants