Conversation
…egraded_and_no_force
There was a problem hiding this comment.
Pull request overview
This PR reduces CI flakiness in CLI end-to-end tests by relaxing overly strict assertions around system health outputs and increasing an e2e timeout where submit+cancel can exceed the previous limit.
Changes:
- Added unit tests to validate
system healthoutput formatting for JSON/YAML in the healthy (UP) case via mocking. - Relaxed e2e
system healthassertions to accept any valid health status instead of requiringUP. - Increased timeout for an application run submit e2e test from 10s to 60s.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/aignostics/system/cli_test.py | Adds unit formatting tests for health output; relaxes e2e health assertions. |
| tests/aignostics/application/cli_test.py | Increases timeout for a brittle submit+cancel e2e test. |
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
e28cd1d to
5ffb40b
Compare
|
Codecov Report✅ All modified and coverable lines are covered by tests. |
✅ PR Review Summary - APPROVEDReviewer: Claude (Automated Code Review for Medical Device Software) 🎯 Overall AssessmentStatus: ✅ APPROVED - Ready to merge This PR successfully addresses test flakiness in the CLI test suite with well-architected improvements. All critical checks pass, and the changes follow SDK best practices. ✅ Critical Checks (All Passing)1. Test Markers ✅Status: PASS - All tests properly marked
Files checked:
2. Code Quality ✅Status: PASS - All linting checks passed ✅ ruff format --check: 165 files already formatted
✅ ruff check: All checks passed
✅ pyright: 0 errors, 0 warnings, 0 informations
✅ mypy: Success - no issues found in 81 source files3. Conventional Commits ✅Status: PASS - Commits follow convention Both commits properly use the 4. Test Coverage ✅Status: PASS - Coverage maintained/improved
5. Architecture Compliance ✅Status: PASS - Changes follow modulith principles
📋 Changes SummaryFile 1:
|
| Test Type | Purpose | Dependencies | Speed | Reliability |
|---|---|---|---|---|
| Unit Tests (NEW) | Validate CLI formatting logic | Mocked service | Fast (< 1s) | High (isolated) |
| E2E Tests (MODIFIED) | Validate against live platform | Real API | Slower (30-60s) | Improved (flexible assertions) |
Key Improvements:
- ✅ Format validation moved to unit tests (deterministic)
- ✅ Live platform validation made more flexible (accepts any valid state)
- ✅ Separation of concerns: "How it formats" vs "Does it work with real API"
This aligns perfectly with the testing philosophy documented in .github/CLAUDE.md:
- Unit tests: Solitary, mocked dependencies, offline
- E2E tests: Sociable, real external services, flexible assertions
🔍 Verification Commands
I ran the following verification commands:
# Check for unmarked tests (should return 0 tests)
uv run pytest -m "not unit and not integration and not e2e" --collect-only
# ✅ Result: collected 54 items / 54 deselected / 0 selected
# Run linting
make lint
# ✅ Result: All checks passed (ruff, pyright, mypy)
# Test new unit tests
uv run pytest tests/aignostics/system/cli_test.py::test_cli_health_json_format
# ✅ Result: 2 passed in 1.92s🚀 CI/CD Impact
Expected CI Behavior
-
Unit tests: Will run on all Python versions (3.11, 3.12, 3.13, 3.14)
- New tests execute sequentially (XDIST_WORKER_FACTOR=0.0)
- Fast execution (~3 min total)
-
E2E tests: Will run on all Python versions with staging credentials
- Modified tests more resilient to platform state
- Should reduce false positives in CI
-
PR Label Recommendation:
- ✅ Add
skip:test_long_runningto skip long-running tests (standard practice)
- ✅ Add
📚 Documentation Compliance
References Checked
- ✅ CLAUDE.md (root) - Test marker requirements followed
- ✅ .github/CLAUDE.md - Test categorization guidelines followed
- ✅ tests/CLAUDE.md - Testing patterns followed
No Documentation Updates Required
This PR:
- ❌ Does not add new features (no user-facing changes)
- ❌ Does not modify APIs (no breaking changes)
- ❌ Does not change workflows (no CI/CD changes)
- ✅ Only improves test reliability (no docs needed)
🎖️ Praise & Recommendations
What's Excellent
- 🎯 Perfect test categorization: New unit tests properly mocked, e2e tests properly tagged
- 📊 Smart strategy: Separating format validation from live platform monitoring
- 🔧 Pragmatic timeout: 60s timeout justified by real-world API latency
- 📝 Clear commit messages: Both commits follow conventional commit format
- 🧪 Test quality: Proper use of mocks, clear assertions, good docstrings
Minor Suggestions (Optional)
None! This PR is well-crafted and ready to merge.
✅ Final Recommendation
APPROVED - This PR is ready to merge.
Summary:
- ✅ All critical checks pass
- ✅ Test markers correct
- ✅ Code quality excellent
- ✅ Architecture compliant
- ✅ No breaking changes
- ✅ Improves test reliability
Next Steps:
- Merge when ready (all CI checks should pass)
- Monitor CI for improved test stability
- Consider adding similar unit/e2e separation for other brittle tests if identified
Medical Device Software Context: As this is medical device software (FDA/MDR regulated), this PR maintains the high standards required for computational pathology SDKs. The improved test reliability contributes to overall software quality assurance.
Review conducted by Claude Code (Automated Medical Device Software Review)
Documentation: CLAUDE.md, .github/CLAUDE.md, tests/CLAUDE.md



Increasing the timeout for
test_cli_run_submit_succeeds_when_system_degraded_and_no_forcesince submitting + cancelling a run can push it beyond 10s.Also updating the CLI health tests:
test_cli_health_jsonandtest_cli_health_yamlwere flakey, as they failed when the platform is degraded/down. I propose the following changes:test_cli_health_json_formatandtest_cli_health_yaml_formatwhich test theTEST-SYSTEM-CLI-HEALTH-JSONandTEST-SYSTEM-CLI-HEALTH-YAMLscenarios respectively (i.e. when platform is healthy, health command shows UP)test_cli_health_jsonandtest_cli_health_yamlto assert that the output shows a health state, without specifying which state they should show