[#18] xbrain diff between snapshots by VGonPa · Pull Request #28 · VGonPa/xbrain

VGonPa · 2026-05-22T16:49:03Z

Summary

Adds xbrain diff <snapshot-a> [snapshot-b] — the missing tool to answer
"what changed between two states of data/?" after a re-enrichment or
vocab rebuild. Builds on the snapshot system from #17.

Default snapshot-b = the live data/ so the common case is one short
command: xbrain diff <pre-snapshot> after a destructive run.
Reports four sections:
- Items — reassignment count, % of enriched-on-both-sides, top N transitions.
- Topics — per-topic membership added/removed/unchanged with a 10% growth flag (5-member floor so tiny topics don't fire false alarms).
- Overviews — TF cosine similarity, bucketed into identical / similar / different / not_comparable.
- Vocab — slugs added, removed, unchanged.
--format json emits the pydantic model as JSON; consumers can jq or build CI thresholds.

What ships

New module src/xbrain/diff.py — pure I/O, no CLI side-effects. Pydantic models (DiffReport, ItemsDiff, TopicChange, etc.), pure-Python TF cosine helper, text + JSON renderers.
New CLI verb diff in src/xbrain/cli.py.
tests/test_diff.py — 27 tests (unit module + CLI integration via CliRunner).
README — Commands table row and a Snapshots & safety paragraph + example invocations.
No new dependencies.

Spec deviations

None of substance. Implementation matches the PRD on every acceptance criterion.

Overview drift: TF cosine in pure Python (no scikit-learn, no embeddings, no API call). Decision documented in PRD §6. Embeddings / LLM-judged similarity remain follow-ups gated on WS3 (WS3 — enrichment evaluation harness #8).
Rename detection: out for v1 per PRD §3.
--judge flag: intentionally not stubbed — adds dead code that would have to be redesigned once WS3 — enrichment evaluation harness #8 lands.

Test plan

tests/test_diff.py — 27 new tests, all green.
Full suite: 326/326 (305 baseline + 21 from rebased develop's fetch: retry transiently-failed items without --force #19 + 27 new).
uv run poe check — all-green:
- ruff (lint + format), mypy, bandit, vulture, interrogate, deptry, detect-secrets — PASS
- radon — WARN only on pre-existing C-grade functions (validate_judgment, _render_index, _archive_tweet_to_item); every new function in diff.py is A/B
- tests — 326 PASS
- coverage — 89% (diff.py at 95%)
Manual: uv run xbrain diff --help renders correctly. Manual round-trip with two snapshots produces the expected report.

Links

PRD: zz-support-files/docs/prds/2026-05-22-xbrain-18-diff-snapshots.md
Plan: zz-support-files/docs/implementation-plans/2026-05-22-xbrain-18-diff-snapshots.md
Issue: xbrain diff — compare two snapshots, surface unexpected drift #18
Depends on: Auto-snapshot data/ before destructive operations #17 (auto-snapshot) — already merged into develop

Adds `xbrain diff <snapshot-a> [snapshot-b]` (B defaults to live `data/`). Reports primary_topic reassignments with top transitions, per-topic membership shifts with growth flags, topic-overview drift via pure-Python TF cosine similarity, and vocab slug changes. `--format text|json`. The diff module is pure I/O: `diff_snapshots(a_dir, b_dir) -> DiffReport`. Snapshot resolution stays in the CLI; the module diffs two data-shaped directories, so the live `data/` is a first-class B side without a special case. No new dependencies — TF cosine fits in ~40 lines over collections.Counter. Embeddings / LLM-judged similarity are explicit follow-ups gated on WS3 (#8). Rename detection is out of scope for v1. 27 new tests (unit + CLI integration); full suite 326/326. Quality gate all-green: ruff, ruff format, mypy, bandit, vulture, interrogate, deptry, detect-secrets, pytest, 89% coverage. Refs: #18

Addresses every HIGH/MEDIUM finding from the 6-reviewer panel on PR #28 (code-reviewer + python-code-reviewer + spec-compliance + test-analyzer APPROVED; silent-failure-hunter and simplifier flagged actionable items): silent-failure-hunter MEDIUM #1: silent empty-diff on missing dirs. - `diff_snapshots` now validates both directories exist on disk; raises FileNotFoundError with the missing path if not. - Validates that at least one artifact exists on either side; if both are fully empty, raises FileNotFoundError naming both dirs (guards against the "data/ deleted out-of-band" scenario where diff would otherwise silently report 'everything was removed'). silent-failure-hunter MEDIUM #2: corrupt-file errors lacked context. - Each loader call inside `diff_snapshots` is wrapped to add the path to the ValueError message ("failed to load <path>: <orig msg>"), so a malformed items.json / vocab.yaml / topics.json surfaces with the specific file rather than a bare pydantic / json traceback. simplifier #1: `_tfidf_cosine` renamed to `_tf_cosine`. - The function uses plain TF cosine, not TF-IDF (with only 2 documents, IDF degenerates). Docstring already explained this; renaming the symbol stops the name from lying. Module docstring + every call site + import updated. simplifier #2: `VocabDiff.unchanged: list[str]` was only ever consumed as `len(...)`. Replaced with `unchanged_count: int`. JSON output is slightly smaller on large vocabs and the data shape stops promising information the consumers don't read. spec-compliance follow-up: `diff.py` added to the "Where things live" tree in ARCHITECTURE.md. Tests: - Three new tests for the validation: missing dir → FileNotFoundError, both-empty → FileNotFoundError, corrupt items.json → ValueError with path in message. - Existing tests updated for the `unchanged_count` rename. Skipped (out of scope for this round, documented in task #88): - test-analyzer polish (French tokenizer test, JSON schema-stability deeper assertions, secondary-topic-no-reassign test) — improvements not blockers. - simplifier #3, #4 (drop TopicChange.unchanged, drop DiffSummary) — borderline derivability vs. JSON consumer ergonomics. - simplifier #5, #6 (Literal["text","json"] dispatch, drop diff_snapshots threshold kwargs) — internal-only style. - code-reviewer/python-code-reviewer naming nit (`reassigned_pct` carrying a fraction) — purely cosmetic, internal consistency intact. Total: 329 tests (up from 326), coverage 89%, `uv run poe check` all-green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

VGonPa and others added 2 commits May 22, 2026 18:48

VGonPa merged commit c9df2c5 into develop May 22, 2026
1 check passed

VGonPa deleted the ws-18-diff branch May 22, 2026 16:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[#18] xbrain diff between snapshots#28

[#18] xbrain diff between snapshots#28
VGonPa merged 2 commits into
developfrom
ws-18-diff

VGonPa commented May 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

VGonPa commented May 22, 2026

Summary

What ships

Spec deviations

Test plan

Links

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant