test: steel-man test suites to prevent Goodharting #18
Merged
MOTIVATION:
- Several tests were mocking the very functions they were supposed to test
- Tests only verified partial fields, missing serialization bugs
- Registry tests lacked error handling coverage for real-world failures
- ComparisonService tests didn't cover malformed LLM responses

APPROACH:
- Converted mock-based tests to true integration tests using fake-indexeddb
- Added full-field assertions for round-trip fidelity (all DiffResult fields)
- Added error handling tests (HTTP 404/500, timeouts, malformed JSON)
- Added resilience tests for incomplete/null data in responses

CHANGES:
- tests/current-system/export-import.test.ts:
  - P0: Replaced mocked import test with true integration test
  - P0: Added full-field assertions for diffResults round-trip
  - P1: Converted diffResults export test to integration
  - P1: Converted amendmentLogs tests to integration with round-trip
- tests/integration/registry.test.ts:
  - P1: Added HTTP error response tests (404, 500)
  - P1: Added network timeout test
  - P1: Added malformed JSON response test
  - P2: Added incomplete data handling tests (missing fields, null values)
- tests/services/comparisonService.test.ts:
  - P2: Added malformed response tests (empty choices, null content, etc.)
  - P2: Improved provider config verification (full client config)

IMPACT:
- Tests now catch real serialization bugs, not just "mock was called"
- Coverage for edge cases and error paths
- Higher confidence in export/import fidelity

TESTING:
- All 552 tests pass
- Verified each test file independently before full suite run

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
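Why full-field round-trip assertions matter can be sketched without any dependencies. This is a minimal sketch under stated assumptions: the `DiffResult` shape here (`chapterId`, `aiVersionHash`, `fanVersionHash`, `markers`, `analyzedAt`) is hypothetical rather than the project's actual type, plain JSON stands in for the real export/import path and fake-indexeddb, and the lost-`Date` bug in `importRecordsNaive` is planted for illustration. A single-field check passes while a strict comparison over every field fails.

```typescript
// Hypothetical record shape for illustration; not the project's real DiffResult type.
interface DiffMarker {
  start: number;
  end: number;
  kind: string;
}

interface DiffResult {
  chapterId: string;
  aiVersionHash: string;
  fanVersionHash: string | null;
  markers: DiffMarker[];
  analyzedAt: Date;
}

// JSON stands in for the real export path (assumption).
function exportRecords(records: DiffResult[]): string {
  return JSON.stringify(records);
}

// Planted bug: JSON.parse leaves analyzedAt as an ISO string, not a Date.
function importRecordsNaive(json: string): DiffResult[] {
  return JSON.parse(json) as DiffResult[];
}

type SerializedDiffResult = Omit<DiffResult, "analyzedAt"> & { analyzedAt: string };

// Fixed importer: revives analyzedAt back into a Date.
function importRecordsFixed(json: string): DiffResult[] {
  const raw = JSON.parse(json) as SerializedDiffResult[];
  return raw.map((r) => ({ ...r, analyzedAt: new Date(r.analyzedAt) }));
}

// Strict structural equality: compares every key and distinguishes Date from string.
function deepEqual(a: unknown, b: unknown): boolean {
  if (a instanceof Date || b instanceof Date) {
    return a instanceof Date && b instanceof Date && a.getTime() === b.getTime();
  }
  if (Array.isArray(a) || Array.isArray(b)) {
    if (!Array.isArray(a) || !Array.isArray(b) || a.length !== b.length) return false;
    for (let i = 0; i < a.length; i++) {
      if (!deepEqual(a[i], b[i])) return false;
    }
    return true;
  }
  if (typeof a === "object" && typeof b === "object" && a !== null && b !== null) {
    const ra = a as Record<string, unknown>;
    const rb = b as Record<string, unknown>;
    const keysA = Object.keys(ra).sort();
    const keysB = Object.keys(rb).sort();
    if (!deepEqual(keysA, keysB)) return false;
    for (const k of keysA) {
      if (!deepEqual(ra[k], rb[k])) return false;
    }
    return true;
  }
  return Object.is(a, b);
}

const original: DiffResult[] = [
  {
    chapterId: "ch-1",
    aiVersionHash: "a1b2c3",
    fanVersionHash: null,
    markers: [{ start: 0, end: 12, kind: "omission" }],
    analyzedAt: new Date("2024-01-01T00:00:00Z"),
  },
];

const restored = importRecordsNaive(exportRecords(original));

// Partial assertion: checks one field only, so it passes despite the bug.
const partialOk = restored[0].chapterId === original[0].chapterId;
// Full-field assertion: catches analyzedAt coming back as a string.
const fullOk = deepEqual(restored, original);
// The fixed importer survives the same full-field check.
const fixedOk = deepEqual(importRecordsFixed(exportRecords(original)), original);

console.log(partialOk, fullOk, fixedOk); // true false true
```

The naive importer is the kind of serialization bug that a "mock was called" or single-field test cannot see; a comparison over every field, strict about types, does.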
Summary
Changes by Priority
P0 - Critical (tests were testing mocks, not code)
- export-import.test.ts: Replaced mocked import test with true integration test
- export-import.test.ts: Added full-field assertions for diffResults round-trip (all fields including hashes, markers, timestamps)

P1 - High (incomplete coverage)
- export-import.test.ts: Converted diffResults/amendmentLogs export tests to real integration tests
- registry.test.ts: Added HTTP error response tests (404, 500), timeout test, malformed JSON test

P2 - Medium (edge cases)
- registry.test.ts: Added incomplete data handling tests (missing fields, null values, empty URLs)
- comparisonService.test.ts: Added malformed response tests (empty choices, null content, truncated JSON)
- comparisonService.test.ts: Improved provider config verification (full client config instead of just URL)

Test Metrics
Test plan
Why this matters
These tests were "Goodharting" - optimizing for passing tests rather than validating actual behavior:
- Mocking ImportOps.importFullSessionData and then checking that the mock was called proves nothing about the import logic
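The anti-pattern described above can be reproduced in miniature. In this sketch, `importOps.importSession`, `SessionPayload`, and the `Map`-backed store are hypothetical stand-ins for `ImportOps.importFullSessionData` and the fake-indexeddb database, and the importer carries a deliberately planted bug that drops `amendmentLogs`. No test framework is assumed; each "test" simply returns whether it would pass.

```typescript
// Hypothetical in-memory store standing in for IndexedDB / fake-indexeddb.
type Store = Map<string, unknown[]>;

interface SessionPayload {
  diffResults: unknown[];
  amendmentLogs: unknown[];
}

// Hypothetical stand-in for ImportOps.importFullSessionData, with a planted bug.
const importOps = {
  importSession(store: Store, payload: SessionPayload): void {
    store.set("diffResults", payload.diffResults);
    // Planted bug for illustration: amendmentLogs are never written.
  },
};

// Goodharted test: stub the function under test, call the stub, assert the stub was called.
function mockedTest(): boolean {
  const real = importOps.importSession;
  let called = false;
  importOps.importSession = () => {
    called = true;
  };
  try {
    importOps.importSession(new Map(), { diffResults: [], amendmentLogs: [] });
  } finally {
    importOps.importSession = real; // restore the real implementation
  }
  return called; // true no matter what the real importer does
}

// Behavioral test: run the real importer on a real store, then inspect what was persisted.
function integrationTest(): boolean {
  const store: Store = new Map();
  const payload: SessionPayload = {
    diffResults: [{ id: "d1" }],
    amendmentLogs: [{ id: "a1" }],
  };
  importOps.importSession(store, payload);
  const logs = store.get("amendmentLogs") ?? [];
  return logs.length === payload.amendmentLogs.length;
}

console.log(mockedTest()); // true: passes despite the bug
console.log(integrationTest()); // false: catches the dropped amendmentLogs
```

The mocked test can never fail because it never executes the code under test; the integration test fails exactly when the persisted data is wrong, which is what converting these tests to integration tests is meant to buy.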