[#34] Phase B: describe images with vision LLM and feed into enrich#37
Conversation
Introduce a fifth media variant carrying the vision-LLM output: is_decorative + description + description_lang + description_version + described_at, on top of the downloaded-state fields. The discriminated union grows from four to five `kind`s; the path-traversal validator carries over verbatim, since the on-disk bytes are inherited from the prior downloaded state. Update every isinstance cascade that walked the union (media.py eligibility, generate.py rendering + mirror, diff.py counting) to handle the new variant explicitly so `assert_never` keeps the exhaustiveness check honest. The diff report grows a `described` column and a `delta_described` field; the count is exposed through both the text and JSON renderers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The vision-describe call ships a declarative system prompt — same
shape as the other rubrics. It enumerates the decorative-vs-content
classification, the per-image description contract (1-3 sentences,
plain prose, faithful), the refusal fallback (decorative + empty
description), and the exact JSON output shape (list of {index,
is_decorative, description}).
The `{language}` placeholder follows the existing rubric convention
so `output_language` propagates into vision calls the same way it
already propagates into summary/topic-page rubrics.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The orchestrator walks every eligible photo, batches the bytes into vision-API calls (default 5 images per call), parses the per-image JSON list judgments, and transitions matched entries to MediaPhotoDescribed. Decorative classifications are persisted with an empty description so downstream consumers (enrich-time prompt, topic-synth prompt) can filter them out at a single seam. The shape mirrors xbrain.executors.api and xbrain.topic_synth: narrow recoverable-errors tuple, per-batch failure isolation, logger.warning on every failure, RuntimeError on total failure, and a SUMMARY: described/failed/skipped stderr line. Programmer bugs (AttributeError) and KeyboardInterrupt propagate as their narrow catch is on Exception subclasses only. Eligibility uses a description-version tag: bumping the configured version invalidates persisted descriptions automatically (no --force needed). The Ctrl-C-coherent invariant is delegated to an on_progress callback the CLI wires to a store write. 41 unit tests exercise idempotency, force/stale-version, batching, partial+total failure, refusal handling, language plumbing, programmer-bug propagation, KeyboardInterrupt, file-missing fallback, summary line, and clock injection — all via FakeAnthropic. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The CLI surface mirrors xbrain media: --force, --limit, --items, --model, --batch-size, --verbose. The output_language and the description_version both come from config (config.toml + new [describe] section); --model overrides config.describe_model per invocation. The pre-describe snapshot is auto-taken so a runaway prompt or wrong model can be rolled back with xbrain snapshot restore. Persistence is double-anchored: on_progress writes the store between batches (Ctrl-C-coherent invariant) and a finally clause catches any total-failure RuntimeError so the per-photo MediaPhotoDescribed records that landed before the raise are not discarded. config.toml.example documents the new [describe] section with defaults (Sonnet 4.6 + version v1). 4 CLI tests cover the happy path, no-op exit-0, --items mismatch warning, and total-failure exit-1 propagation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The enrich-time prompt in executors/api.py now carries an "Images in this post:" section listing every non-decorative MediaPhotoDescribed entry's prose, sitting between the post body and the links/article block so the LLM reads context in natural order. Decorative photos are filtered at the seam — they introduce no topic noise. TopicInput grows an image_descriptions field; build_topic_inputs flattens content-bearing descriptions across every post in the topic and feeds them to the topic-synth prompt below the per-post summaries. Same decorative-filter as enrich. The prompt shape stays back-compat: items / topics without described photos emit no images section at all. 8 new tests guard the integration: inclusion, decorative-filter, omission-when-empty, ordering vs links/article. Both _user_prompt helpers were split into per-section builders to keep complexity under grade C — no new C functions introduced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The diff module already grew the described column + delta_described field in the model commit. This commit adds the two regression tests that prove the diff surfaces a downloaded → described transition between snapshots and that the text renderer emits the new "described:" row alongside the existing four. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
README grows a Commands row for describe, a Configuration row for [describe].model and [describe].version, and a Vision descriptions subsection in the Local media storage section that explains the $3-5 budget, the decorative-filter, and the description-version re-describe trigger. ARCHITECTURE adds a Per-stage describe section between fetch and vocab covering the state machine, batching, refusal handling, failure isolation, snapshot trigger, and the consumption seam in executors/api.py + topic_synth.py. The rubrics table grows a describe-image row; invariant #10 names the fifth Media variant and makes the linear-pipeline shape explicit (Pending → Downloaded → Described, with Failed as the off-ramp from Pending). Also bundles two formatter-only cleanups picked up by ruff format on the executors/api.py and topics.py changes from the prompt integration commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Drop `_print_partial_failure_summary` from `describe_all`: the CLI's
`describe_emit_summary_line` is the single source of truth (mirrors
`media`). Eliminates the double `SUMMARY:` emission on partial-failure
runs (python-code-reviewer BLOCKER).
- `MediaPhotoDescribed` now carries:
- `description_lang: SupportedLanguage` (typed alias over
`i18n.SUPPORTED_LANGUAGES`) — rejects unknown languages at the
type boundary (type-design-analyzer).
- Model validator enforcing `is_decorative => description == ""`
so a hand-constructed variant cannot violate the invariant
(type-design-analyzer).
- `described_at` UTC-aware enforcement via field_validator.
- Dedupe the `local_path` traversal guard via a shared free function
(`_reject_local_path_traversal`); same for the UTC-aware check
(`_require_utc_aware`). Closes the validator duplication called out
by python-code-reviewer + code-simplifier — done via free functions
rather than class inheritance because Liskov substitution would
silently re-match `MediaPhotoDescribed` against 25+
`isinstance(entry, MediaPhotoDownloaded)` call sites.
- Add the same UTC-aware contract to `MediaPhotoDownloaded.downloaded_at`
and `MediaPhotoFailed.last_attempt_at` (closes the pre-existing gap
for the variants this PR touches).
- Tests: orchestrator no longer emits SUMMARY on partial failure
(assert `err.count("SUMMARY:") == 0` — pinning the dedup). Test
helpers for decorative described-photo construction honour the new
empty-description invariant.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Convergent fixes from the round-1 reviewer pipeline.
Category 2.1 mixed-language drift (type-design-analyzer + silent-failure-hunter):
- `_is_stale` now also returns True when `description_lang` drifted vs
the configured output language. Switching `output_language` between
Spanish and English no longer leaves stale-language prose persisted —
the next run re-describes those entries.
- `output_language` parameter is typed `SupportedLanguage`, matching
the new `MediaPhotoDescribed.description_lang` field type. The
orchestrator now propagates a typed language end-to-end.
Category 3 over-decomposed orchestrator (code-simplifier):
- Collapsed the four-helper candidate chain (`_tally_skipped` +
`_iter_item_candidates` + `_take_with_limit` + outer
`_iter_candidates` + inner `_all_eligible` closure) into a single
`_iter_eligible_candidates` generator, mirroring `media._iter_eligible_attempts`.
A small `_tally_idempotency_skip` helper keeps the loop body
under radon grade C; `_filtered_items` keeps the outer scan flat.
- Removed `_take_with_limit`, `_chunked`, `_messages_create`, and
`_user_directive` as standalone helpers. `itertools.batched(...)`
replaces `_chunked`; the SDK call is inlined; the directive is a
small dict literal builder local to the block building.
Category 1.3 dead defence-in-depth (python-code-reviewer):
- Dropped the unreachable duplicate-judgment-index guard in
`_apply_batch_judgments`. The parser already rejects duplicate
indices, making the dict-construction guard dead code.
Category 4 silent-failure regressions (silent-failure-hunter):
- 4.1: SDK `stop_reason="refusal"` now transitions every photo in the
batch to `MediaPhotoDescribed(is_decorative=True, description="")`.
Satisfies the rubric contract at the SDK level so refused batches
make progress instead of churning.
- 4.2: `JSONDecodeError` now logs `response.stop_reason` so a
max-tokens truncation is diagnosable from the warning line alone.
- 4.3: Unknown extension in `_media_type` now emits
`logger.warning` before falling back to `image/jpeg`.
- 4.4: A batch where the model omitted some judgments now bumps
`batches_failed += 1` — the batch did not return complete data, so
the operator's "did this API call do its job" view is honest.
Category 5.3 type design (type-design-analyzer):
- Defined `MessagesClient` + `VisionClient` Protocols local to
`describe.py`. Dropped the `# type: ignore[attr-defined]` on
`client.messages.create(...)`.
Category 6 tests (pr-test-analyzer):
- 6.1: New full-payload kwarg shape assertion (model, max_tokens,
system, role, image+text block ordering, base64 source layout).
- 6.2: `test_describe_all_handles_refusal_as_decorative_empty` plus a
dedicated `_FakeRefusalResponse` fake.
- 6.3: `test_parse_batch_response_strips_fence_without_language_tag`.
- 6.4: `--verbose` CLI test exercising the per-failure listing.
- 6.5: New mixed-language eligibility test
(`test_eligible_described_stale_language_is_eligible`) and CLI
partial-failure SUMMARY count test pinning to exactly one.
- Plus refactored `_FakeMessagesList` so pre-built response objects
(refusal, truncation) bypass the JSON-list wrapper.
Category 7 comments + docs:
- 7.1: Fixed misleading `_MEDIA_TYPES` comment (was "three", actual
cardinality is 4 keys mapping to 3 distinct media types).
- 7.2: Rewrote `_run_describe` docstring — coherence is held by the
outer `try/finally`, not by `on_progress`.
- 7.3: Trimmed the `describe.py` module docstring to 2 paragraphs
matching the project voice (was 4 paragraphs, 22 lines).
- 7.4: Reworded ARCHITECTURE.md `### describe` ("a new variant on
the `MediaEntry` union"). Documented the language-staleness
contract in the state-machine doc.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round-1 fix sweep — all 8-reviewer findings addressed2 commits pushed: Category 1 — 3+ reviewer convergence
Category 2 — 2 reviewer convergence
Category 3 — Over-decomposed orchestrator (code-simplifier)
Category 4 — Silent-failure regressions (silent-failure-hunter)
Category 5 — Type design (type-design-analyzer)
Category 6 — Tests (pr-test-analyzer)
Category 7 — Comments + docs
Defended decisionsWhy NOT class inheritance for
|
Single-caller helper merged into its only caller. Removes the last surviving gold-plating flagged by code-simplifier in round 1 of the Phase B review pipeline.
…ype warning Closes 3 flags from the verify round of the Phase B review pipeline: - test_models.py: 3 negative tests pin the model-level invariants on MediaPhotoDescribed (UTC-aware described_at, SUPPORTED_LANGUAGES membership for description_lang, decorative => empty description). - test_describe.py: 1 unit test pins _media_type's unknown-extension fallback to image/jpeg AND the operator-visible logger.warning, so a silent regression on either side is caught. Test count: 502 -> 506. Quality gate green.
|
Tiny fix round — 3 verify-round nits cleared. Commits:
Fixes:
Test count: 502 → 506 (+4). Quality gate: ruff check + ruff format + mypy + pytest + Ready for re-verify. |
Closes #34
Summary
MediaPhotoDescribed(extends downloaded withis_decorative+description+description_lang+description_version+described_at) so the discriminated union forms a linear pipeline:Pending → Downloaded → Described.xbrain describe: a Typer command that walks every downloaded photo, batches the bytes (default 5/call) into Claude Sonnet 4.6 vision calls, parses the per-image JSON list, and transitions matched entries. Idempotent — re-runs skip already-described photos unless[describe].versionis bumped inconfig.tomlor--forceis passed.rubrics/rubric-describe-image.mdwith{language}substitution, decorative-vs-content classification, refusal fallback (mark decorative + empty description), and the exact JSON output shape.executors/api.py:_user_promptnow splices anImages in this post:section between the post body and the links/article block when the item has content-bearing described photos; decoratives are filtered at the seam.topic_synth.py:_user_promptreceives the flat list of content image descriptions across every post in the topic, after the per-post summaries.xbrain diffgrows adescribedcolumn and adelta_describedfield so axbrain media → xbrain describecycle is visible in one report.[describe].model(defaultclaude-sonnet-4-6) and[describe].version(defaultv1).[describe].versionis the rubric-evolution lever: bumping it invalidates persisted entries automatically.Acceptance criteria
xbrain describeon the full corpus completes within budget — verified by spec choice (Sonnet + 5/call ≈ $3-5 on 2k images, well under $20).is_decorativeflag and (if not decorative) a description — variant transition test.xbrain describeis a no-op for already-described photos (skipped on summary) — idempotency test.--forcere-describes everything — force test.xbrain enrichafterxbrain describeproduces user-prompt strings that include theImages in this post:section —test_user_prompt_includes_content_bearing_image_descriptions.test_user_prompt_excludes_decorative_image_descriptions.xbrain topicsafterxbrain describeincludes image descriptions when synthesizing topic-page overviews —test_user_prompt_includes_image_descriptions_when_provided+test_build_topic_inputs_collects_content_image_descriptions.RuntimeErrorraise test + CLI exit-1 test.is_decorative=true+ empty description on refusal;_apply_judgmentenforces the empty-description contract regardless of what the model wrote.output_language—test_describe_all_substitutes_output_language_in_system_prompt+test_describe_all_records_language_on_described_variant.Not exercised in this PR (out of scope, deferred):
xbrain describeagainst the Anthropic API on the real corpus."Deviations from the plan
--limit. The current implementation uses the standard streaming API for all runs and exposes--batch-sizeas the operator-facing knob; Batch API support is a future iteration. Spec ≤$20 budget holds with streaming at the corpus size we have.--use-batch-apiflag; not implemented for the same reason. The current--batch-sizeflag covers the operator-facing dimension; a future flag can be added alongside without breaking the contract.Test plan
uv run pytestreports 494 passing (was 431 before; +63 new tests).bash scripts/check.shreports ALL CRITICAL CHECKS PASSED. Radon WARN comes from the 3 pre-existing C-grade functions (generate._render_index,validate.validate_judgment,archive._archive_tweet_to_item); no new C-grade functions introduced.uv run xbrain describe --helpshows the new command with--force,--limit,--items,--model,--batch-size,--verbose.uv run python -c "from xbrain.describe import describe_all; from xbrain.models import MediaPhotoDescribed; print('OK')".uv run xbrain describe --limit 0against a populateddata/items.jsonproduces apre-describe-*snapshot underdata/snapshots/and exits 0.uv run xbrain diff <snapshot>against a snapshot taken before a describe run shows non-zerodelta_describedin the MEDIA block.Dependencies
MediaPhotoDownloadedbeing the bytes-on-disk state.media.pyfrom C-grade complexity; this PR maintains that contract (no new C functions inmedia.pyor in the newdescribe.py).🤖 Generated with Claude Code