Leverage StepContext to mitigate linearization timeout issue in Robustness test #21715
henrybear327 wants to merge 2 commits into
… timeout
Output log:
```
➜ etcd git:(robustness/porcupine_timeout_demo) ✗ go test -test.fullpath=true -timeout 300s -v -run ^TestValidateLinearizableOperationsTimeoutIsRespected$ go.etcd.io/etcd/tests/v3/robustness/validate
=== RUN TestValidateLinearizableOperationsTimeoutIsRespected
2026/04/22 07:31:22 checkParallel started
2026/04/22 07:31:22 checkSingle started
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:23 kill set to 1
2026/04/22 07:33:47 after step
2026/04/22 07:33:47 checkSingle ended
2026/04/22 07:33:47 checkParallel ended
/Users/henrybear327/go/src/etcd/tests/robustness/validate/operations_test.go:360: validateLinearizableOperationsAndVisualize(...) does not respect timeout: 2m24.622795042s, timeout was 1s
--- FAIL: TestValidateLinearizableOperationsTimeoutIsRespected (144.63s)
FAIL
FAIL go.etcd.io/etcd/tests/v3/robustness/validate 145.072s
FAIL
➜ etcd git:(robustness/porcupine_timeout_demo) ✗
```
Signed-off-by: Chun-Hung Tseng <henrytseng@google.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: henrybear327.
…tness test Ref: anishathalye/porcupine#44 Signed-off-by: Chun-Hung Tseng <henrytseng@google.com>
4460005 to 4fbc45d
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files: see 25 files with indirect coverage changes.
@@ Coverage Diff @@
## main #21715 +/- ##
==========================================
- Coverage 70.30% 70.26% -0.04%
==========================================
Files 425 425
Lines 35145 35145
==========================================
- Hits 24708 24696 -12
- Misses 9044 9051 +7
- Partials 1393 1398 +5
@henrybear327: The following test failed, say
As discussed offline with Marek, this is the proper fix for the issue, but we need an interim solution in the meantime. I will open another PR to quickly address our current problem while we wait for the upstream discussion and release :)
While debugging the robustness tests, we have seen quite a few cases where the linearization timeout isn't respected.
The root cause is that the porcupine `Step` interface currently has no way to propagate timeout information to the interface implementation. Under certain conditions, as described in anishathalye/porcupine#44, the validation timeout is not respected, causing the robustness test CI to fail (timeout or OOM).
This PR attempts to resolve this by switching to the new `StepContext` interface.
We can only merge this PR after anishathalye/porcupine#45 is merged. Currently, the PR is based on my commit for testing purposes.
Additional context: #21606