
test(e2e): replace flaky Python live policy update tests with Rust #742

Merged
johntmyers merged 1 commit into main from fix-flaky-policy-e2e/jm
Apr 2, 2026

Conversation

@johntmyers
Collaborator

Summary

Replace two flaky Python e2e tests for live policy updates with Rust e2e tests that use the CLI's built-in --wait flag for reliable synchronization instead of manual 90s poll loops.

Related Issue

Fixes flaky E2E (python) job failure: https://github.com/NVIDIA/OpenShell/actions/runs/23920278132/job/69765280000?pr=740

Changes

  • Removed from e2e/python/test_sandbox_policy.py:
    • test_live_policy_update_and_logs — flaked with "Policy v2 was not loaded within 90s" because it relied on a manual time.sleep(2) poll loop with a hard 90s deadline
    • test_live_policy_update_from_empty_network_policies — same poll pattern, same flake risk
  • Added e2e/rust/tests/live_policy_update.rs with two tests:
    • live_policy_update_round_trip — set policy A, verify version, re-push A (idempotent), push B (version bump via --wait), re-push B (idempotent), verify policy list history
    • live_policy_update_from_empty_network_policies — set empty network policy, push policy with rules, verify version bumps
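The round-trip sequence above relies on one invariant: re-pushing identical policy content keeps the version, while pushing different content bumps it. A minimal sketch of that invariant follows, using a hypothetical in-memory `PolicyHistory` type. It is not the server's implementation or the code in live_policy_update.rs; the hashing scheme and the "compare against the latest hash" rule are assumptions made for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the policy history the tests observe.
/// Entry `i` holds the content hash of policy version `i + 1`.
struct PolicyHistory {
    hashes: Vec<u64>,
}

impl PolicyHistory {
    fn new() -> Self {
        Self { hashes: Vec::new() }
    }

    /// Push policy content and return the resulting version number.
    /// Assumption: a push whose hash equals the latest entry is a no-op.
    fn push(&mut self, policy: &str) -> usize {
        let mut hasher = DefaultHasher::new();
        policy.hash(&mut hasher);
        let digest = hasher.finish();

        if self.hashes.last() != Some(&digest) {
            self.hashes.push(digest);
        }
        self.hashes.len()
    }
}
```

Under this model, pushing A, A, B, B yields versions 1, 1, 2, 2 and a history with two entries, which is the shape of the assertions live_policy_update_round_trip is described as making.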

Why Rust tests are more reliable

The Python tests polled the GetSandboxPolicyStatus RPC every 2s with a hard 90s deadline. The Rust tests use openshell policy set --wait --timeout 120, which delegates synchronization to the CLI's own wait logic, so the tests no longer maintain their own poll loop and deadline.
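As a rough illustration of the delegated wait, a test helper can shell out to the CLI and block until it returns. Only `openshell policy set --wait --timeout 120` comes from this description. The helper names and the `target_args` parameter (whatever identifies the sandbox and the policy file) are hypothetical, and the sketch assumes the CLI reports a wait timeout through a non-zero exit status.

```rust
use std::io;
use std::process::{Command, ExitStatus};

/// Build the argument list for a blocking policy update.
/// `target_args` is a placeholder for the arguments that select the sandbox
/// and policy file; their real names are not given in this PR.
fn policy_set_wait_args(target_args: &[&str], timeout_secs: u64) -> Vec<String> {
    let mut args = vec!["policy".to_string(), "set".to_string()];
    args.extend(target_args.iter().map(|a| a.to_string()));
    args.push("--wait".to_string());
    args.push("--timeout".to_string());
    args.push(timeout_secs.to_string());
    args
}

/// Run the CLI and block until it exits. No sleep loop or deadline lives in
/// the test; the caller only inspects the exit status afterwards.
fn set_policy_and_wait(target_args: &[&str], timeout_secs: u64) -> io::Result<ExitStatus> {
    Command::new("openshell")
        .args(policy_set_wait_args(target_args, timeout_secs))
        .status()
}
```

A test would call `set_policy_and_wait(&[...], 120)` and assert that the returned status is successful before checking the reported policy version.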

Coverage notes

  • Policy lifecycle (version, hash, idempotency, history): fully covered
  • Proxy enforcement after update: covered by existing L4/L7/SSRF Python e2e tests
  • GetSandboxLogs RPC: was tested in the old test but not replaced — known gap for follow-up

Testing

  • cargo clippy passes on new test (no warnings)
  • cargo check --features e2e passes for all Rust e2e tests
  • mise run pre-commit (CI will verify)

Checklist

  • Follows Conventional Commits
  • Code follows the project's coding standards
  • No secrets or credentials in the diff
  • Changes scoped to the issue at hand

@johntmyers johntmyers self-assigned this Apr 2, 2026
@johntmyers johntmyers requested a review from a team as a code owner April 2, 2026 21:09
@johntmyers johntmyers added the test:e2e Requires end-to-end coverage label Apr 2, 2026
drew previously approved these changes Apr 2, 2026
Remove test_live_policy_update_and_logs and
test_live_policy_update_from_empty_network_policies from the Python e2e
suite. Both used a manual 90s poll loop against GetSandboxPolicyStatus
that flaked in CI with 'Policy v2 was not loaded within 90s'.

Add e2e/rust/tests/live_policy_update.rs with two replacement tests
that exercise the same policy lifecycle (version bumping, hash
idempotency, policy list history) through the CLI using the built-in
--wait flag for reliable synchronization.
@johntmyers johntmyers force-pushed the fix-flaky-policy-e2e/jm branch from 64a46ec to 4501568 Compare April 2, 2026 21:32
@johntmyers johntmyers merged commit 77e55ea into main Apr 2, 2026
10 of 11 checks passed
@johntmyers johntmyers deleted the fix-flaky-policy-e2e/jm branch April 2, 2026 22:06

Labels

test:e2e Requires end-to-end coverage


2 participants