VGonPa · May 24, 2026
diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
@@ -266,6 +266,35 @@ The numbered stages above are summarised; the sections below cover each one in d
 
 **Snapshot trigger.** `xbrain media` always snapshots `data/` first (label `pre-media`), mirroring the destructive-op recovery boundary. The snapshot covers `items.json` / `state.json` / `vocab.yaml` / `topics.json` only — the binary photo bytes under `data/media/` are NOT included; re-downloading via `xbrain media` is the recovery path.
 
+### describe
+
+**What it does.** Sends every downloaded photo to a Claude vision model, asks for a 1-3 sentence prose description plus an `is_decorative` classification, and persists the prose on the entry. The entry transitions from `MediaPhotoDownloaded` to `MediaPhotoDescribed` (a new variant on the `MediaEntry` union). Decorative photos (avatars, reaction memes, abstract backgrounds) are classified as such with an empty description so downstream prompts can filter them out without re-classifying.
+
+**Reads.** `data/items.json` + `data/media/<id>/<n>.<ext>` (the bytes the downloader wrote).
+
+**Writes.** `data/items.json` — each described photo entry carries `is_decorative` + `description` + `description_lang` + `description_version` + `described_at`. No new on-disk binary state; the bytes from the prior `MediaPhotoDownloaded` are inherited verbatim.
+
+**State machine.** Each `xbrain describe` run advances eligible photo entries:
+- `Downloaded` → `Described` (description on the entry, bytes unchanged).
+- `Described` (stale version OR stale language) → `Described` (current version + current language), automatically.
+- `Described` (current version + current language) → no-op (skipped) unless `--force`.
+
+Eligibility ignores `Pending` / `Failed` / `VideoPending`: describe only runs on photos with bytes on disk. The description-version tag is the rubric-evolution lever: bumping `[describe].version` in `config.toml` invalidates persisted entries so the next run re-describes them without `--force`. The `description_lang` check is the mixed-vault guard: switching `[output].language` from Spanish → English (or back) marks every previously-described entry stale so the enrich prompt never splices the wrong-language prose into a new vault.
+
+**Batching.** Default batch size is 5 images per API call (the spec's quality / cost sweet spot — ~12-15 % token saving vs per-image, modest added complexity). Override with `--batch-size N`.
+
+**Refusals.** Vision refusals (faces, NSFW) are NOT a hard failure: the entry is persisted as decorative with an empty description, and the run continues. The same `is_decorative` flag downstream consumers already use for "no topic signal" handles the refusal uniformly.
+
+**Failure isolation.** Per-batch error isolation: one failing API call does not abort the run. A total-failure run (every batch errored) raises `RuntimeError` so the CLI surfaces non-zero exit. The orchestrator's `on_progress` callback writes `items.json` between batches so Ctrl-C mid-run leaves the store coherent — same recovery contract as `media`.
+
+**Snapshot trigger.** `xbrain describe` always snapshots `data/` first (label `pre-describe`), mirroring `media`'s recovery boundary. A botched run — wrong model, runaway prompt — can be undone with `xbrain snapshot restore`.
+
+**Feeds the LLM stages.** Once described, the prose is consumed automatically:
+- `xbrain enrich` (in `executors/api.py:_user_prompt`) splices an `Images in this post:` section between the post body and the links/article block when the item has content-bearing described photos. Decoratives are filtered.
+- `xbrain topics` (in `topic_synth.py:_user_prompt`) appends the flat list of content-bearing image descriptions across every post in a topic, after the per-post summaries.
+
+This is how a tweet that is mostly a screenshot of a paper becomes searchable by what the screenshot was actually about.
+
 ### vocab
 
 **What it does.** Induces a closed taxonomy of ~30-45 topics from the whole corpus. Map step: chunks the corpus, asks an LLM to propose candidate topics per chunk. Reduce step: asks the LLM to consolidate the union of candidates down to `vocab.target_count` topics. Always includes a `misc` topic for posts with no thematic core.
@@ -333,7 +362,7 @@ Everything XBrain knows lives in four files inside `data/` (gitignored). They ar
 
 | File | Format | What it is | Mutated by |
 |------|--------|------------|------------|
-| `items.json` | JSON array of `Item` | The source of truth — every post XBrain has ever seen, with all fetched content and enrichment | `extract`, `fetch`, `enrich`, `media` |
+| `items.json` | JSON array of `Item` | The source of truth — every post XBrain has ever seen, with all fetched content, enrichment, and per-photo vision descriptions | `extract`, `fetch`, `enrich`, `media`, `describe` |
 | `state.json` | JSON | Extractor cursors (`last_seen_id`, `last_run`) per source, archive-import marker | `extract`, `import-archive` |
 | `vocab.yaml` | YAML list of `Topic` | The controlled topic taxonomy — closed list of slugs + descriptions | `vocab` |
 | `topics.json` | JSON dict of `TopicPage` | The synthesized topic-page overviews and notes, keyed by slug | `topics` |
@@ -355,6 +384,7 @@ The LLM-driven stages (`vocab`, `enrich`, `topics`) do not have their instructio
 | `rubric-topics.md` | `enrich` | Assign one `primary_topic` + 0-3 secondaries from the closed vocab. Never invent slugs |
 | `rubric-summary.md` | `enrich` | Write a 1-3 sentence summary, faithful to the post and the fetched article, no hallucination |
 | `rubric-topic-page.md` | `topics` | Synthesize 1-3 paragraphs of plain prose + up to 15 short notes per topic, zero wikilinks |
+| `rubric-describe-image.md` | `describe` | Classify each photo as decorative vs content-bearing and describe content-bearing ones in 1-3 sentences. Refusals fall through as decorative with empty description |
 
 **Why a separate file per rubric.** Changing how XBrain summarizes posts is editing one markdown file, not chasing a string through the codebase. The rubric is the *contract* between code and LLM; the code only handles structure, transport and validation.
 
@@ -438,7 +468,7 @@ These are the rules the rest of the architecture rests on. Breaking any of them
 7. **Operation names, not query ids.** The extractor anchors to X GraphQL operation names because X rotates the ids. Anything that hardcodes an id will break.
 8. **Destructive ops are reversible.** Every command that overwrites a `data/` artifact (`vocab --regenerate`, `topics --resynth`, `fetch --force`) snapshots `data/` first to `data/snapshots/<ts>-pre-<command>/`. `xbrain snapshot restore <name>` is the recovery path. A snapshot failure aborts the destructive op.
 9. **Fetch records are tagged unions.** A `ContentSource` on `items.json` is either a `Success` (with required `text`) or a `Failure` (with required `failure_reason`). Mixed shapes are not representable — pydantic rejects them at construction, and mypy rejects them statically (via the `pydantic.mypy` plugin). Legacy records with `ok: bool` (pre-#20) are normalised on read by a `BeforeValidator` on the union, so existing `data/items.json` files keep working without a manual migration. The static contract is pinned by `tests/type_probes/illegal_states.py`.
-10. **Media variants are mutually exclusive states.** A `MediaEntry` on `items.json` is one of `MediaPhotoPending` / `MediaPhotoDownloaded` / `MediaPhotoFailed` / `MediaVideoPending`, discriminated by `kind`. State transitions happen only via `xbrain media`. Legacy records with the flat `{type, url}` shape are normalised on read by a `BeforeValidator` on the union — no manual migration needed. (See the `### media` section above for the retry contract and storage layout.)
+10. **Media variants are mutually exclusive states.** A `MediaEntry` on `items.json` is one of `MediaPhotoPending` / `MediaPhotoDownloaded` / `MediaPhotoFailed` / `MediaPhotoDescribed` / `MediaVideoPending`, discriminated by `kind`. The photo states form a linear pipeline: `Pending → Downloaded → Described` (with `Failed` as the off-ramp from `Pending`). State transitions happen only via `xbrain media` (advances `Pending` and retries `Failed`) and `xbrain describe` (advances `Downloaded` to `Described`). Legacy records with the flat `{type, url}` shape are normalised on read by a `BeforeValidator` on the union — no manual migration needed. (See the `### media` and `### describe` sections above for the per-stage contracts.)
 
 ---
 

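Review note: the describe state machine documented in the ARCHITECTURE.md hunk above can be read as a single eligibility predicate. The following is a minimal sketch under stated assumptions, not the code in `xbrain.describe`: `PhotoEntry`, `needs_describe`, and the lowercase `kind` tags are hypothetical; only the state names and the `description_version` / `description_lang` fields come from the diff.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PhotoEntry:
    """Hypothetical stand-in for a MediaEntry variant (not the real pydantic model)."""

    kind: str  # assumed tags: "pending", "downloaded", "failed", "described", "video_pending"
    description_version: str | None = None
    description_lang: str | None = None


def needs_describe(entry: PhotoEntry, *, version: str, lang: str, force: bool = False) -> bool:
    """Return True when a describe run should (re-)describe this entry."""
    if entry.kind == "downloaded":
        return True  # bytes on disk, never described
    if entry.kind == "described":
        # Stale if either the rubric version or the output language changed.
        stale = entry.description_version != version or entry.description_lang != lang
        return stale or force
    # Pending / Failed / VideoPending have no bytes on disk: never eligible,
    # not even with force.
    return False
```

Under this reading, `--force` only widens the `described` branch; it never makes an entry without bytes eligible.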
diff --git a/README.md b/README.md
@@ -397,6 +397,8 @@ topic_style = "wikilink"                  # wikilink | hashtag (in-body Topics:
 | `[topics]` | `resynth_threshold` | `25` | Post growth that marks a topic overview stale. |
 | `[output]` | `language` | `English` | Output language for LLM summaries/overviews AND wiki section headers. `English` or `Spanish`. |
 | `[output]` | `topic_style` | `wikilink` | How the in-body `**Topics:**` line is rendered: `wikilink` (`[[slug]] · [[slug]]`) or `hashtag` (`#slug #slug`). Frontmatter `tags:` are unaffected. |
+| `[describe]` | `model` | `claude-sonnet-4-6` | Vision model for `xbrain describe`. Override per run with `--model`. |
+| `[describe]` | `version` | `v1` | Tag persisted on every described photo. Bumping invalidates existing descriptions so the next `xbrain describe` re-describes stale entries. |
 
 Switching `[output].language` after the corpus is already enriched is supported
 — but does not retroactively translate existing summaries. To convert the
@@ -517,6 +519,7 @@ uv run xbrain <command> [options]
 | `import-archive <zip>` | Backfill the full own-tweet history from the official X data archive. |
 | `fetch` | Download linked article content, expand threads, fetch linked X content. By default, items whose only previous failures were transient (`timeout`, `dns_error`) are re-fetched automatically; terminal failures (`not_found`, `paywall`, `forbidden`, `js_required`, `empty_content`) stay skipped until `--force`. `--force` re-fetches every external_article source regardless of state. |
 | `media` | Download X-post photos referenced in `Item.media` and render them inline in the wiki. `--force`, `--limit N`, `--items <a,b,c>`, `--verbose`. See [Local media storage](#local-media-storage). |
+| `describe` | Describe downloaded photos with a vision LLM (Claude Sonnet 4.6 by default) and feed the prose into `enrich` + `topics`. `--force`, `--limit N`, `--items <a,b,c>`, `--model`, `--batch-size`, `--verbose`. Idempotent: re-runs skip photos already described in the current version and output language; pass `--force` or bump `[describe].version` in `config.toml` to re-describe them. |
 | `vocab` | Induce the topic taxonomy. `--executor`, `--apply <file>`, `--regenerate`. |
 | `enrich` | Enrich items with a summary + topics. `--executor`, `--apply <file>`. |
 | `topics` | Synthesise topic pages. `--executor`, `--apply <file>`, `--resynth`. |
@@ -583,7 +586,31 @@ Failures are categorised on the item itself
   the next `xbrain media` run.
 
 Run `xbrain diff <snapshot>` after a media run to see how many photos
-moved from `pending` / `failed` into `downloaded`.
+moved from `pending` / `failed` into `downloaded` (or, after `xbrain
+describe`, into `described`).
+
+**Vision descriptions**
+
+Once `xbrain media` has the bytes on disk, `xbrain describe` runs every
+photo through Claude vision and stores a short prose description on
+the entry (transitioning `MediaPhotoDownloaded` → `MediaPhotoDescribed`).
+Descriptions are 1-3 sentences, faithful, in the configured
+`output_language`. Decorative photos (avatars, reaction memes,
+abstract backgrounds) are classified as such and persisted with an
+empty description so they introduce no topic noise downstream.
+
+`xbrain enrich` and `xbrain topics` consume the descriptions
+automatically: an item with content-bearing photos gets an
+`Images in this post:` block in the enrichment prompt; topic-page
+synthesis sees the flat list of content image descriptions across the
+topic's posts. This is how a tweet that is mostly a screenshot of a
+paper becomes searchable by what the screenshot was actually about.
+
+Describing a corpus of about 2,000 images costs roughly $3-5 with the
+default model (Sonnet 4.6, 5 images per call). Bump `[describe].version`
+in `config.toml` to invalidate stored descriptions when you change the
+rubric; the next `xbrain describe` run then re-describes stale entries
+automatically, without `--force`.
 
 ---
 

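Review note: the batching and failure-isolation contract described in the docs above (fixed-size batches, one failing call does not abort the run, a total failure raises `RuntimeError`, a progress callback fires between batches) might look roughly like the sketch below. `describe_in_batches` and `call_api` are hypothetical names used for illustration; the real orchestrator is `describe_all` and its internals are not shown in this diff.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence


def describe_in_batches(
    photos: Sequence[str],
    call_api: Callable[[Sequence[str]], list[str]],  # hypothetical vision-API wrapper
    *,
    batch_size: int = 5,
    on_progress: Callable[[], None] | None = None,
) -> tuple[dict[str, str], list[tuple[str, str]]]:
    """Describe photos in fixed-size batches, isolating failures per batch.

    Returns (descriptions keyed by photo, failures as (photo, error) pairs).
    Raises RuntimeError only when every batch failed.
    """
    described: dict[str, str] = {}
    failures: list[tuple[str, str]] = []
    batches = [photos[i : i + batch_size] for i in range(0, len(photos), batch_size)]
    failed_batches = 0
    for batch in batches:
        try:
            results = call_api(batch)
        except Exception as exc:  # one bad batch must not abort the run
            failed_batches += 1
            failures.extend((photo, str(exc)) for photo in batch)
        else:
            described.update(zip(batch, results))
        if on_progress is not None:
            on_progress()  # e.g. write items.json so an interrupt loses at most one batch
    if batches and failed_batches == len(batches):
        raise RuntimeError("describe: every batch failed")
    return described, failures
```

With the default batch size of 5, a corpus of about 2,000 images works out to roughly 400 API calls.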
diff --git a/config.toml.example b/config.toml.example
@@ -39,3 +39,14 @@ language = "English"
 #   "wikilink" - **Topics:** [[ai-coding]] · [[software-engineering]]  (default)
 #   "hashtag"  - **Topics:** #ai-coding #software-engineering
 topic_style = "wikilink"
+
+[describe]
+# Vision model used by `xbrain describe`. Sonnet 4.6 is the default —
+# the quality / cost sweet spot (~$3-5 for a 2k-image corpus). Override
+# per run with `--model` while iterating; the CLI flag wins.
+model = "claude-sonnet-4-6"
+# Description-version tag persisted on every described photo. Bumping
+# this value invalidates existing descriptions: the next `xbrain
+# describe` run re-describes stale entries automatically. Use it when
+# you change the describe-image rubric or expectations.
+version = "v1"
diff --git a/src/xbrain/cli.py b/src/xbrain/cli.py
@@ -14,6 +14,8 @@
 from xbrain import snapshot
 from xbrain.archive import parse_archive
 from xbrain.config import Config, load_config
+from xbrain.describe import describe_all as run_describe_all
+from xbrain.describe import emit_summary_line as describe_emit_summary_line
 from xbrain.diff import diff_snapshots, format_json, format_text
 from xbrain.enrich import apply_worksheet_judgments, enrich_with_executor, items_pending_enrichment
 from xbrain.executors.api import ApiExecutor
@@ -359,6 +361,132 @@ def media(
     _run_media(cfg, force=force, limit=limit, items_filter=items_filter, verbose=verbose)
 
 
+def _run_describe(
+    cfg: Config,
+    *,
+    force: bool,
+    limit: int | None,
+    items_filter: list[str] | None,
+    model: str,
+    batch_size: int,
+    verbose: bool,
+) -> None:
+    """Run the vision-describe orchestrator and persist after every batch.
+
+    Always snapshots `data/` first (the same recovery boundary as
+    `xbrain media`): a botched run — a wrong model, a runaway prompt
+    — can be undone with `xbrain snapshot restore`. Coherence on a
+    Ctrl-C mid-run is held by the outer `try/finally` below, which
+    saves the store unconditionally even when the orchestrator raises;
+    the `on_progress` callback is for incremental persistence between
+    batches on a clean run (so a long describe run never loses more
+    than one batch of work to a process death).
+    """
+    if items_filter:
+        target = set(items_filter)
+        store_ids = set(load_store(cfg.items_path))
+        missing = target - store_ids
+        if missing and not (target & store_ids):
+            typer.echo(
+                f"WARNING: --items {','.join(items_filter)} matches no item "
+                f"in the store ({len(store_ids)} items). The run will be a no-op.",
+                err=True,
+            )
+    _auto_snapshot(cfg, "describe")
+    store = load_store(cfg.items_path)
+
+    def _persist() -> None:
+        save_store(store, cfg.items_path)
+
+    try:
+        report = run_describe_all(
+            store,
+            cfg.media_dir,
+            model=model,
+            output_language=cfg.output_language,
+            description_version=cfg.describe_version,
+            force=force,
+            limit=limit,
+            items_filter=items_filter,
+            batch_size=batch_size,
+            on_progress=_persist,
+        )
+    finally:
+        # Persist whatever transitioned, even if `describe_all` raised. A
+        # RuntimeError on total failure must not discard the per-photo
+        # MediaPhotoDescribed records that landed before the raise.
+        save_store(store, cfg.items_path)
+    describe_emit_summary_line(report)
+    typer.echo(
+        f"Describe: described {report.photos_described}, "
+        f"failed {report.photos_failed}, "
+        f"skipped {report.photos_skipped_already_described}"
+    )
+    if verbose and report.per_item_failures:
+        typer.echo("Failed photos:", err=True)
+        for item_id, failures in sorted(report.per_item_failures.items()):
+            for url, error in failures:
+                typer.echo(f"  {item_id}  {url}  {error}", err=True)
+
+
+@app.command()
+@_handle_cli_errors
+def describe(
+    force: bool = typer.Option(
+        False,
+        "--force",
+        help="Re-describe every photo, including those already described in the current version.",
+    ),
+    limit: int | None = typer.Option(
+        None,
+        "--limit",
+        help="Maximum number of photos to describe in this run.",
+    ),
+    items: str | None = typer.Option(
+        None,
+        "--items",
+        help="Comma-separated item IDs that limit the scope of the run.",
+    ),
+    model: str | None = typer.Option(
+        None,
+        "--model",
+        help="Vision model to use. Defaults to the config value (`describe.model`).",
+    ),
+    batch_size: int = typer.Option(
+        5,
+        "--batch-size",
+        min=1,
+        help="Number of images per API call. 5 is the sweet spot (12-15%% token saving).",
+    ),
+    verbose: bool = typer.Option(
+        False,
+        "--verbose",
+        help="Print each failed photo (item_id, URL, error) at the end of the run.",
+    ),
+) -> None:
+    """Describe downloaded photos with a vision LLM.
+
+    Only photos with bytes on disk (`MediaPhotoDownloaded`) are
+    described. Entries already described in the current version are
+    skipped; bumping `[describe].version` in `config.toml` triggers an
+    automatic re-describe without `--force`. Descriptions are persisted
+    in `items.json` and consumed by `xbrain enrich` and `xbrain topics`
+    in subsequent LLM calls.
+    """
+    cfg = _config()
+    items_filter = [s.strip() for s in items.split(",") if s.strip()] if items else None
+    chosen_model = model or cfg.describe_model
+    _run_describe(
+        cfg,
+        force=force,
+        limit=limit,
+        items_filter=items_filter,
+        model=chosen_model,
+        batch_size=batch_size,
+        verbose=verbose,
+    )
+
+
 @app.command()
 @_handle_cli_errors
 def enrich(

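Review note: the `try/finally` in `_run_describe` is what keeps `items.json` coherent when the orchestrator raises or the run is interrupted. A stripped-down sketch of that pattern, with the store and the save function injected so it runs without the rest of the codebase (`run_and_always_save`, `Store`, `work`, and `save` are hypothetical names, not part of `xbrain`):

```python
from __future__ import annotations

from collections.abc import Callable
from typing import Any

Store = dict[str, Any]  # hypothetical stand-in for the real item store


def run_and_always_save(
    store: Store,
    work: Callable[[Store], None],
    save: Callable[[Store], None],
) -> None:
    """Run `work`, then persist `store` whether or not `work` raised."""
    try:
        work(store)
    finally:
        # Runs on success, on RuntimeError, and on KeyboardInterrupt (Ctrl-C),
        # so progress already written into `store` is not discarded.
        save(store)
```

The exception still propagates after the save, which is why the CLI can exit non-zero on a total failure without losing the entries that were described before the raise.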
diff --git a/src/xbrain/config.py b/src/xbrain/config.py
@@ -29,6 +29,17 @@ class Config:
     topics_resynth_threshold: int
     output_language: str  # one of xbrain.i18n.SUPPORTED_LANGUAGES
     topic_style: str  # one of xbrain.config.SUPPORTED_TOPIC_STYLES
+    # `describe_model` defaults to Sonnet 4.6 — the spec settled on it as the
+    # quality / cost sweet spot for vision (~$3-5 for a 2k-image corpus).
+    # Override per run via `xbrain describe --model ...` when iterating on
+    # prompt or budget; the CLI flag wins over the config value.
+    describe_model: str
+    # `describe_version` tags every produced description so a prompt
+    # evolution can be rolled out incrementally: bumping the value here
+    # makes the next `xbrain describe` run re-describe stale entries
+    # automatically (no `--force` needed). The string is exact-match —
+    # there is no ordering relation, only equality.
+    describe_version: str
 
     @property
     def items_path(self) -> Path:
@@ -95,6 +106,7 @@ def load_config(repo_root: Path) -> Config:
             f"config.toml: [output].topic_style must be one of "
             f"{list(SUPPORTED_TOPIC_STYLES)}, got {topic_style!r}"
         )
+    describe = settings.get("describe", {})
     return Config(
         repo_root=repo_root,
         vault=vault,
@@ -107,4 +119,6 @@ def load_config(repo_root: Path) -> Config:
         topics_resynth_threshold=resynth_threshold,
         output_language=output_language,
         topic_style=topic_style,
+        describe_model=describe.get("model", "claude-sonnet-4-6"),
+        describe_version=describe.get("version", "v1"),
     )