test(e2e): add scaling & KV functionality tests; extract shared helpers#192
Conversation
|
Hi @ballista01. Thanks for your PR. I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ok-to-test |
| command := []string{ | ||
| "etcdctl", "member", "add", "etcd-promote-learner-2", | ||
| "--peer-urls=http://etcd-promote-learner-2.etcd-promote-learner.etcd-operator-system.svc.cluster.local:2380", | ||
| "--learner", | ||
| } |
There was a problem hiding this comment.
If TLS is enabled, the command will fail. More importantly, why adding member directly? It should be the responsibility of the controller.
There was a problem hiding this comment.
I agree, maybe we should keep the test case in unit test rather than make it E2E.
Removed.
|
also cc @frederiko |
dc43cbe to
136dcac
Compare
ivanvc
left a comment
There was a problem hiding this comment.
Thanks for the pull request, @ballista01! I left a couple of comments :)
| } | ||
| } | ||
|
|
||
| // TODO: TestPodRecovery is commented out - would like maintainer feedback on expected behavior |
There was a problem hiding this comment.
I think this is worth discussing in a different issue.
In what scenarios would the pod get deleted? The scheduler shouldn't delete the pod, unless I'm missing/misunderstanding something.
There was a problem hiding this comment.
Thanks for the review! I've created an issue at #193 for discussion.
There was a problem hiding this comment.
Quick update: This only repros when the cluster is created without persistent storage, or when a Pod and its PVC are removed together (effective permanent loss). With PVC-backed clusters (typical in production), deleting a Pod behaves as expected: the StatefulSet recreates the same ordinal with its PVC and the member rejoins; no downscale.
If you prefer, I’ll close #193 as not reproducible for the transient case and send a small docs PR clarifying the above.
|
|
||
| // Verify controller promoted all learners to voting members | ||
| for _, line := range memberLines { | ||
| if strings.Contains(line, "isLearner=true") { |
There was a problem hiding this comment.
Have you manually verified the output of command etcdctl member list?
There was a problem hiding this comment.
Switched to parsing etcdctl JSON with etcd’s official types instead of string matching.
89c35e8 to
82bf3bc
Compare
| HashKV etcdserverpb.HashKVResponse `json:"HashKV"` | ||
| } | ||
| if err := json.Unmarshal([]byte(stdout), &entries); err != nil { | ||
| t.Fatalf("Failed to parse endpoint hashkv JSON: %v. Raw: %s", err, truncate(stdout, 512)) |
There was a problem hiding this comment.
print the whole response? It won't too long.
|
This PR high level looks good to me. @frederiko @ivanvc do you have comment on the implementation? PTAL, thx |
ee738c7 to
0e5b948
Compare
|
Please rebase & squash the commit, thx |
on it |
0e5b948 to
d2c2118
Compare
36f5692 to
0288e9f
Compare
- add scenarios for invalid sizes, controller health, scaling, pod recovery, and cluster-wide data consistency - add helpers to create PVC-backed clusters, wait for STS readiness, inspect member lists, and verify PVC usage Signed-off-by: Wenxue Zhao <ballista01@outlook.com>
0288e9f to
0a7eb8e
Compare
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ahrtr, ballista01, ivanvc The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Summary
This PR extends the E2E suite under
test/e2e/with realistic cluster-lifecycle scenarios, PVC-backed clusters, and shared helpers to reduce duplication and flakiness. It adds scaling, pod recovery with persistent storage, and KV/replication checks usingetcdctlfrom inside cluster pods.No controller or production code changes are included.
What’s in this PR
test/e2e/e2e_test.goKeeps existing
TestInvalidClusterSize,TestClusterHealthy.New:
TestScaling(table-driven:ScaleInFrom3To1,ScaleOutFrom1To3)New:
TestPodRecoveryputon one member andgetfrom the recovered member.Updated:
TestEtcdClusterFunctionalityput/get/del; cross-member read for replication; and ensures no learners in a healthy cluster.test/e2e/helpers_test.go(new)Cluster + storage
createEtcdClusterWithPVC— creates anEtcdClusterwithStorageSpec(64Mi) and PVC.getAvailableStorageClass— resolves StorageClass viaETCD_E2E_STORAGECLASSenv or default SC; skips if none suitable.Lifecycle / waiting
waitForSTSReadiness— waits for StatefulSetReadyReplicasusing e2e-frameworkwait(tuned timeouts/intervals).scaleEtcdCluster— patchesspec.sizeand waits for STS readiness.cleanupEtcdCluster— best-effort deletion.Introspection
execInPod— runsetcdctlinside pods and returns stdout/stderr.getEtcdMembers— parsesetcdctl member list -w jsoninto typed structs to build name → memberID map (more robust than string parsing).verifyPodUsesPVC— asserts a pod mounts the expected PVC (by prefix).Types:
EtcdMember,EtcdMemberList.Motivation
Address prior review feedback to split the large refactor into focused pieces.
Add coverage for scenarios that require a real cluster (DNS/networking/storage) and are not reliable at unit-test level:
spec.size, StatefulSet replicas, and etcd member list in sync; ensures learners are promoted.etcdctlhealth + KV + cross-member reads to prove replication.How it validates behavior
etcdctlinside pods (execInPod) to observe internal cluster state (health, member list, KV).getEtcdMembers) to avoid brittle string checks.waitForSTSReadinessand targetedwait.Forloops reduce flakes by waiting for ReadyReplicas and pod recreation.Prior work
Huge thanks to #51 by @abdurrehman107! It inspired the feature-based layout with sigs.k8s.io/e2e-framework, the Setup - Assess - Teardown flow and the use of wait.For + conditions.ResourceScaled for readiness. Appreciate the groundwork!
Notes for reviewers
ETCD_E2E_STORAGECLASS=<your-sc>before running E2E.