Leverage StepContext to mitigate linearization timeout issue in Robustness test #21715

Draft
henrybear327 wants to merge 2 commits into etcd-io:main from henrybear327:fix/checkSingle_timeout

@henrybear327 commented May 6, 2026

While debugging robustness tests, we have repeatedly seen cases where the linearization timeout is not respected.

The root cause is that porcupine's Step interface currently has no way to propagate timeout information to its implementation. Under certain conditions, as described in anishathalye/porcupine#44, the validation timeout is not respected, which causes the robustness test CI to fail (with a timeout or OOM).

This PR attempts to resolve this by switching to the new StepContext interface.

We can only merge this PR after anishathalye/porcupine#45 is merged. Currently, the PR is based on my commit for testing purposes.

Additional context: #21606

… timeout

Output log:
```
➜  etcd git:(robustness/porcupine_timeout_demo) ✗ go test -test.fullpath=true -timeout 300s -v -run ^TestValidateLinearizableOperationsTimeoutIsRespected$ go.etcd.io/etcd/tests/v3/robustness/validate
=== RUN   TestValidateLinearizableOperationsTimeoutIsRespected
2026/04/22 07:31:22 checkParallel started
2026/04/22 07:31:22 checkSingle started
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:22 after step
2026/04/22 07:31:22 before step
2026/04/22 07:31:23 kill set to 1
2026/04/22 07:33:47 after step
2026/04/22 07:33:47 checkSingle ended
2026/04/22 07:33:47 checkParallel ended
    /Users/henrybear327/go/src/etcd/tests/robustness/validate/operations_test.go:360: validateLinearizableOperationsAndVisualize(...) does not respect timeout: 2m24.622795042s, timeout was 1s
--- FAIL: TestValidateLinearizableOperationsTimeoutIsRespected (144.63s)
FAIL
FAIL    go.etcd.io/etcd/tests/v3/robustness/validate    145.072s
FAIL
➜  etcd git:(robustness/porcupine_timeout_demo) ✗
```

Signed-off-by: Chun-Hung Tseng <henrytseng@google.com>
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: henrybear327
Once this PR has been reviewed and has the lgtm label, please assign siyuanfoundation for approval. For more information see the Code Review Process.

…tness test

Ref:
anishathalye/porcupine#44

Signed-off-by: Chun-Hung Tseng <henrytseng@google.com>
@henrybear327 force-pushed the fix/checkSingle_timeout branch from 4460005 to 4fbc45d on May 6, 2026 08:56

codecov Bot commented May 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.26%. Comparing base (c867abd) to head (4fbc45d).
⚠️ Report is 14 commits behind head on main.

Additional details and impacted files

see 25 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #21715      +/-   ##
==========================================
- Coverage   70.30%   70.26%   -0.04%     
==========================================
  Files         425      425              
  Lines       35145    35145              
==========================================
- Hits        24708    24696      -12     
- Misses       9044     9051       +7     
- Partials     1393     1398       +5     

@k8s-ci-robot

@henrybear327: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-etcd-coverage-report | 4fbc45d | link | true | /test pull-etcd-coverage-report |

@henrybear327
Contributor Author

As discussed offline with Marek, this is the proper fix for the issue, but we need an interim solution in the meantime.

I will open another PR to quickly address the current problem while we wait for the upstream discussion and release :)
