Skip to content

test: update brittle tests#499

Merged
olivermeyer merged 2 commits intomainfrom
test/update-brittle-tests
Mar 20, 2026
Merged

test: update brittle tests#499
olivermeyer merged 2 commits intomainfrom
test/update-brittle-tests

Conversation

@olivermeyer
Copy link
Collaborator

Increasing the timeout for test_cli_run_submit_succeeds_when_system_degraded_and_no_force since submitting + cancelling a run can push it beyond 10s.

Also updating the CLI health tests: test_cli_health_json and test_cli_health_yaml were flakey, as they failed when the platform is degraded/down. I propose the following changes:

  • Add 2 unit tests test_cli_health_json_format and test_cli_health_yaml_format which test the TEST-SYSTEM-CLI-HEALTH-JSON and TEST-SYSTEM-CLI-HEALTH-YAML scenarios respectively (i.e. when platform is healthy, health command shows UP)
  • Update the existing e2e tests test_cli_health_json and test_cli_health_yaml to assert that the output shows a health state, without specifying which state they should show

Copilot AI review requested due to automatic review settings March 20, 2026 09:42
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reduces CI flakiness in CLI end-to-end tests by relaxing overly strict assertions around system health outputs and increasing an e2e timeout where submit+cancel can exceed the previous limit.

Changes:

  • Added unit tests to validate system health output formatting for JSON/YAML in the healthy (UP) case via mocking.
  • Relaxed e2e system health assertions to accept any valid health status instead of requiring UP.
  • Increased timeout for an application run submit e2e test from 10s to 60s.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
tests/aignostics/system/cli_test.py Adds unit formatting tests for health output; relaxes e2e health assertions.
tests/aignostics/application/cli_test.py Increases timeout for a brittle submit+cancel e2e test.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@olivermeyer olivermeyer force-pushed the test/update-brittle-tests branch from e28cd1d to 5ffb40b Compare March 20, 2026 09:59
@sonarqubecloud
Copy link

@codecov
Copy link

codecov bot commented Mar 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.
see 2 files with indirect coverage changes

@olivermeyer olivermeyer added the claude Trigger Claude Code automation label Mar 20, 2026
@claude
Copy link

claude bot commented Mar 20, 2026

✅ PR Review Summary - APPROVED

Reviewer: Claude (Automated Code Review for Medical Device Software)
PR: #499 - test: update brittle tests
Branch: test/update-brittle-tests
Author: @olivermeyer


🎯 Overall Assessment

Status: ✅ APPROVED - Ready to merge

This PR successfully addresses test flakiness in the CLI test suite with well-architected improvements. All critical checks pass, and the changes follow SDK best practices.


✅ Critical Checks (All Passing)

1. Test Markers ✅

Status: PASS - All tests properly marked

  • ✅ No unmarked tests detected (verified with pytest -m "not unit and not integration and not e2e" --collect-only)
  • ✅ New unit tests correctly marked with @pytest.mark.unit
  • ✅ Modified e2e tests correctly marked with @pytest.mark.e2e
  • ✅ Timeout markers properly applied where needed

Files checked:

  • tests/aignostics/system/cli_test.py - 4 tests modified/added, all properly marked
  • tests/aignostics/application/cli_test.py - 1 test modified, properly marked

2. Code Quality ✅

Status: PASS - All linting checks passed

✅ ruff format --check: 165 files already formatted
✅ ruff check: All checks passed
✅ pyright: 0 errors, 0 warnings, 0 informations
✅ mypy: Success - no issues found in 81 source files

3. Conventional Commits ✅

Status: PASS - Commits follow convention

✅ test: increase timeout for test_cli_run_submit_succeeds_when_system_d…
✅ test: split health CLI format tests from live platform monitoring

Both commits properly use the test: type prefix.

4. Test Coverage ✅

Status: PASS - Coverage maintained/improved

  • New unit tests added: 2 (increase coverage for system CLI)
  • Existing tests made more robust: 2 e2e tests
  • No code coverage regression expected

5. Architecture Compliance ✅

Status: PASS - Changes follow modulith principles

  • Tests correctly mock service layer (aignostics.system._cli._service)
  • No CLI/GUI cross-dependencies introduced
  • Proper separation of unit (mocked) vs e2e (live) tests

📋 Changes Summary

File 1: tests/aignostics/system/cli_test.py (+38 lines, -10 lines)

Added: 2 new unit tests

  1. test_cli_health_json_format (lines 18-28)

    • Type: Unit test (@pytest.mark.unit)
    • Purpose: Validates JSON formatting when platform is UP
    • Tested Item: TEST-SYSTEM-CLI-HEALTH-JSON
    • Mock: @patch("aignostics.system._cli._service")
    • ✅ Fast execution (< 10s)
    • ✅ No external dependencies
  2. test_cli_health_yaml_format (lines 31-41)

    • Type: Unit test (@pytest.mark.unit)
    • Purpose: Validates YAML formatting when platform is UP
    • Tested Item: TEST-SYSTEM-CLI-HEALTH-YAML
    • Mock: @patch("aignostics.system._cli._service")
    • ✅ Fast execution (< 10s)
    • ✅ No external dependencies

Modified: 2 existing e2e tests

  1. test_cli_health_json (lines 44-53)

    • Change: Relaxed assertion from status: UP to status in {"UP", "DEGRADED", "DOWN"}
    • Rationale: E2E test runs against live staging platform where health can vary
    • Improvement: ✅ Reduces false positives when platform is temporarily degraded
    • Removed: record_property (moved to unit test)
  2. test_cli_health_yaml (lines 56-64)

    • Change: Relaxed assertion from status: UP to status in {"UP", "DEGRADED", "DOWN"}
    • Rationale: E2E test runs against live staging platform where health can vary
    • Improvement: ✅ Reduces false positives when platform is temporarily degraded
    • Removed: record_property (moved to unit test)

File 2: tests/aignostics/application/cli_test.py (+1 line, -1 line)

Modified: 1 test timeout

  1. test_cli_run_submit_succeeds_when_system_degraded_and_no_force (line 313)
    • Change: Timeout increased from 10s to 60s
    • Rationale: Submit + cancel operations can exceed 10s due to network latency
    • Justification: ✅ 60s < 5min (does not require long_running marker)
    • Test Type: E2E (@pytest.mark.e2e)

🏗️ Architectural Assessment

Test Strategy Improvement ✅

This PR demonstrates excellent test architecture by properly separating concerns:

Test Type Purpose Dependencies Speed Reliability
Unit Tests (NEW) Validate CLI formatting logic Mocked service Fast (< 1s) High (isolated)
E2E Tests (MODIFIED) Validate against live platform Real API Slower (30-60s) Improved (flexible assertions)

Key Improvements:

  1. Format validation moved to unit tests (deterministic)
  2. Live platform validation made more flexible (accepts any valid state)
  3. Separation of concerns: "How it formats" vs "Does it work with real API"

This aligns perfectly with the testing philosophy documented in .github/CLAUDE.md:

  • Unit tests: Solitary, mocked dependencies, offline
  • E2E tests: Sociable, real external services, flexible assertions

🔍 Verification Commands

I ran the following verification commands:

# Check for unmarked tests (should return 0 tests)
uv run pytest -m "not unit and not integration and not e2e" --collect-only
# ✅ Result: collected 54 items / 54 deselected / 0 selected

# Run linting
make lint
# ✅ Result: All checks passed (ruff, pyright, mypy)

# Test new unit tests
uv run pytest tests/aignostics/system/cli_test.py::test_cli_health_json_format
# ✅ Result: 2 passed in 1.92s

🚀 CI/CD Impact

Expected CI Behavior

  1. Unit tests: Will run on all Python versions (3.11, 3.12, 3.13, 3.14)

    • New tests execute sequentially (XDIST_WORKER_FACTOR=0.0)
    • Fast execution (~3 min total)
  2. E2E tests: Will run on all Python versions with staging credentials

    • Modified tests more resilient to platform state
    • Should reduce false positives in CI
  3. PR Label Recommendation:

    • ✅ Add skip:test_long_running to skip long-running tests (standard practice)

📚 Documentation Compliance

References Checked

  • CLAUDE.md (root) - Test marker requirements followed
  • .github/CLAUDE.md - Test categorization guidelines followed
  • tests/CLAUDE.md - Testing patterns followed

No Documentation Updates Required

This PR:

  • ❌ Does not add new features (no user-facing changes)
  • ❌ Does not modify APIs (no breaking changes)
  • ❌ Does not change workflows (no CI/CD changes)
  • ✅ Only improves test reliability (no docs needed)

🎖️ Praise & Recommendations

What's Excellent

  1. 🎯 Perfect test categorization: New unit tests properly mocked, e2e tests properly tagged
  2. 📊 Smart strategy: Separating format validation from live platform monitoring
  3. 🔧 Pragmatic timeout: 60s timeout justified by real-world API latency
  4. 📝 Clear commit messages: Both commits follow conventional commit format
  5. 🧪 Test quality: Proper use of mocks, clear assertions, good docstrings

Minor Suggestions (Optional)

None! This PR is well-crafted and ready to merge.


✅ Final Recommendation

APPROVED - This PR is ready to merge.

Summary:

  • ✅ All critical checks pass
  • ✅ Test markers correct
  • ✅ Code quality excellent
  • ✅ Architecture compliant
  • ✅ No breaking changes
  • ✅ Improves test reliability

Next Steps:

  1. Merge when ready (all CI checks should pass)
  2. Monitor CI for improved test stability
  3. Consider adding similar unit/e2e separation for other brittle tests if identified

Medical Device Software Context: As this is medical device software (FDA/MDR regulated), this PR maintains the high standards required for computational pathology SDKs. The improved test reliability contributes to overall software quality assurance.


Review conducted by Claude Code (Automated Medical Device Software Review)
Documentation: CLAUDE.md, .github/CLAUDE.md, tests/CLAUDE.md

@olivermeyer olivermeyer merged commit 1360d9a into main Mar 20, 2026
39 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

claude Trigger Claude Code automation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants