Skip to content

[#18] xbrain diff between snapshots#28

Merged
VGonPa merged 2 commits into
developfrom
ws-18-diff
May 22, 2026
Merged

[#18] xbrain diff between snapshots#28
VGonPa merged 2 commits into
developfrom
ws-18-diff

Conversation

@VGonPa
Copy link
Copy Markdown
Owner

@VGonPa VGonPa commented May 22, 2026

Summary

Adds xbrain diff <snapshot-a> [snapshot-b] — the missing tool to answer
"what changed between two states of data/?" after a re-enrichment or
vocab rebuild. Builds on the snapshot system from #17.

  • Default snapshot-b = the live data/ so the common case is one short
    command: xbrain diff <pre-snapshot> after a destructive run.
  • Reports four sections:
    • Items — reassignment count, % of enriched-on-both-sides, top N transitions.
    • Topics — per-topic membership added/removed/unchanged with a 10% growth flag (5-member floor so tiny topics don't fire false alarms).
    • Overviews — TF cosine similarity, bucketed into identical / similar / different / not_comparable.
    • Vocab — slugs added, removed, unchanged.
  • --format json emits the pydantic model as JSON; consumers can jq or build CI thresholds.

What ships

  • New module src/xbrain/diff.py — pure I/O, no CLI side-effects. Pydantic models (DiffReport, ItemsDiff, TopicChange, etc.), pure-Python TF cosine helper, text + JSON renderers.
  • New CLI verb diff in src/xbrain/cli.py.
  • tests/test_diff.py — 27 tests (unit module + CLI integration via CliRunner).
  • README — Commands table row and a Snapshots & safety paragraph + example invocations.
  • No new dependencies.

Spec deviations

None of substance. Implementation matches the PRD on every acceptance criterion.

  • Overview drift: TF cosine in pure Python (no scikit-learn, no embeddings, no API call). Decision documented in PRD §6. Embeddings / LLM-judged similarity remain follow-ups gated on WS3 (WS3 — enrichment evaluation harness #8).
  • Rename detection: out for v1 per PRD §3.
  • --judge flag: intentionally not stubbed — adds dead code that would have to be redesigned once WS3 — enrichment evaluation harness #8 lands.

Test plan

  • tests/test_diff.py — 27 new tests, all green.
  • Full suite: 326/326 (305 baseline + 21 from rebased develop's fetch: retry transiently-failed items without --force #19 + 27 new).
  • uv run poe check — all-green:
    • ruff (lint + format), mypy, bandit, vulture, interrogate, deptry, detect-secrets — PASS
    • radon — WARN only on pre-existing C-grade functions (validate_judgment, _render_index, _archive_tweet_to_item); every new function in diff.py is A/B
    • tests — 326 PASS
    • coverage — 89% (diff.py at 95%)
  • Manual: uv run xbrain diff --help renders correctly. Manual round-trip with two snapshots produces the expected report.

Links

VGonPa and others added 2 commits May 22, 2026 18:48
Adds `xbrain diff <snapshot-a> [snapshot-b]` (B defaults to live `data/`).
Reports primary_topic reassignments with top transitions, per-topic
membership shifts with growth flags, topic-overview drift via pure-Python
TF cosine similarity, and vocab slug changes. `--format text|json`.

The diff module is pure I/O: `diff_snapshots(a_dir, b_dir) -> DiffReport`.
Snapshot resolution stays in the CLI; the module diffs two data-shaped
directories, so the live `data/` is a first-class B side without a special
case.

No new dependencies — TF cosine fits in ~40 lines over collections.Counter.
Embeddings / LLM-judged similarity are explicit follow-ups gated on WS3 (#8).
Rename detection is out of scope for v1.

27 new tests (unit + CLI integration); full suite 326/326. Quality gate
all-green: ruff, ruff format, mypy, bandit, vulture, interrogate, deptry,
detect-secrets, pytest, 89% coverage.

Refs: #18
Addresses every HIGH/MEDIUM finding from the 6-reviewer panel on PR #28
(code-reviewer + python-code-reviewer + spec-compliance + test-analyzer
APPROVED; silent-failure-hunter and simplifier flagged actionable items):

silent-failure-hunter MEDIUM #1: silent empty-diff on missing dirs.
- `diff_snapshots` now validates both directories exist on disk; raises
  FileNotFoundError with the missing path if not.
- Validates that at least one artifact exists on either side; if both are
  fully empty, raises FileNotFoundError naming both dirs (guards against
  the "data/ deleted out-of-band" scenario where diff would otherwise
  silently report 'everything was removed').

silent-failure-hunter MEDIUM #2: corrupt-file errors lacked context.
- Each loader call inside `diff_snapshots` is wrapped to add the path to
  the ValueError message ("failed to load <path>: <orig msg>"), so a
  malformed items.json / vocab.yaml / topics.json surfaces with the
  specific file rather than a bare pydantic / json traceback.

simplifier #1: `_tfidf_cosine` renamed to `_tf_cosine`.
- The function uses plain TF cosine, not TF-IDF (with only 2 documents,
  IDF degenerates). Docstring already explained this; renaming the
  symbol stops the name from lying. Module docstring + every call site
  + import updated.

simplifier #2: `VocabDiff.unchanged: list[str]` was only ever consumed
as `len(...)`. Replaced with `unchanged_count: int`. JSON output is
slightly smaller on large vocabs and the data shape stops promising
information the consumers don't read.

spec-compliance follow-up: `diff.py` added to the "Where things live"
tree in ARCHITECTURE.md.

Tests:
- Three new tests for the validation: missing dir → FileNotFoundError,
  both-empty → FileNotFoundError, corrupt items.json → ValueError with
  path in message.
- Existing tests updated for the `unchanged_count` rename.

Skipped (out of scope for this round, documented in task #88):
- test-analyzer polish (French tokenizer test, JSON schema-stability
  deeper assertions, secondary-topic-no-reassign test) — improvements
  not blockers.
- simplifier #3, #4 (drop TopicChange.unchanged, drop DiffSummary) —
  borderline derivability vs. JSON consumer ergonomics.
- simplifier #5, #6 (Literal["text","json"] dispatch, drop
  diff_snapshots threshold kwargs) — internal-only style.
- code-reviewer/python-code-reviewer naming nit (`reassigned_pct`
  carrying a fraction) — purely cosmetic, internal consistency intact.

Total: 329 tests (up from 326), coverage 89%, `uv run poe check`
all-green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant