Spatial grounding for image-derived concepts (bbox on the Instance evidence node) #461

## Idea

Attach optional **normalized bounding boxes** to image-derived concepts so each identified thing can be localized in (and lazily cropped from) the source image, while preserving the existing concept/evidence model.

## Provenance of this idea

Came from a sidebar brainstorm with Claude Desktop (which had **no knowledge of this platform's schema** — it proposed a new `Observation` node + `DEPICTS` edge). Its *instincts* are sound but must be translated onto our actual primitives, not its invented ones. This issue is that translation.

## Grounding in the ACTUAL schema (verified, not assumed)

We already have the node the sidebar was reaching for:

```
(:Concept)-[:EVIDENCED_BY]->(:Instance)-[:FROM_SOURCE]->(:Source)
```

- The **`Instance`** node is the reified per-source evidence — the sidebar's "Observation." Verified keys today: `quote`, `instance_id`, `created_at_event_id`. For an image, `quote` is a prose snippet of the `describe_image` literal description (ADR-057).
- `Source` carries `content_type='image'` and the Garage `storage_key` (original image already retained).
- A concept that recurs across many images already stacks evidence as multiple `Instance`s on one shared `Concept` via embedding-based matching (candidate retrieval #453 + merge threshold). **Entity resolution already exists** — we do NOT need the sidebar's manual merge model.
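The evidence stacking described above is why a bbox cannot sit on the Concept: one Concept accumulates Instances from many images, each with its own location. The sketch below illustrates that accumulation under loud assumptions. `Concept`, `attach_evidence`, and `MERGE_THRESHOLD` are hypothetical names, the threshold value is made up, and a linear scan over candidates stands in for the real candidate retrieval (#453).

```python
import math
from dataclasses import dataclass, field
from typing import List

# Hypothetical value; the platform's actual merge threshold may differ.
MERGE_THRESHOLD = 0.85


@dataclass
class Concept:
    label: str
    embedding: List[float]
    instances: List[str] = field(default_factory=list)  # instance_ids of evidence


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def attach_evidence(candidates: List[Concept], label: str, embedding: List[float],
                    instance_id: str, threshold: float = MERGE_THRESHOLD) -> Concept:
    """Stack a new Instance on the best-matching Concept, or create a new Concept."""
    scored = [(cosine(c.embedding, embedding), c) for c in candidates]
    if scored:
        score, best = max(scored, key=lambda pair: pair[0])
        if score >= threshold:
            best.instances.append(instance_id)  # same Concept, new evidence
            return best
    concept = Concept(label, list(embedding), [instance_id])
    candidates.append(concept)
    return concept
```

For simplicity the sketch keeps each Concept's first embedding rather than updating it as evidence accumulates.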

## Proposal

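1. **bbox lives on the `Instance`, never on the `Concept`.** Each Instance already represents exactly one mention of a concept in one source image, so it is the correct home; the Concept itself is shared across many images and has no single location. A concept appearing twice in one image → two Instances → two bboxes (already supported). Add optional properties:
   - `bbox`: normalized `[x, y, w, h]` in `[0,1]`, expressed as fractions of the image's width and height. Being resolution-independent, it survives thumbnails and re-exports, and it sidesteps Claude's resize/rescale trap (pixel coordinates from the model refer to the image as the model saw it, which may be a downscaled copy).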
   - `bbox_source`: `claude` | `detector` | `manual` (provenance + UI styling + upgrade path).
   - `bbox_confidence`: `0..1`.
2. **Single detection pass at extraction.** Have the vision step return per-concept normalized boxes in the SAME response that yields concepts/quotes (holistic layout reasoning beats N per-entity queries), attaching a bbox to each Instance it creates. Keep this OPTIONAL and isolated so shaky coordinates never contaminate the (reliable) concept/quote extraction.
3. **Lazy cropping, never persisted crops.** The original stays in Garage; crop on demand from `bbox`, adding 5–8% padding because Claude's boxes are imprecise and can clip the subject. If a box is refined later (re-run, detector, or manual), every crop regenerates from it, so there are no stale assets.
4. **Refinement migration path, no schema churn:** ship with `bbox_source='claude'` (approximate, soft-highlight overlays); optionally upgrade specific boxes via a grounding detector (Grounding DINO / OWLv2 / SAM) → `detector`; manual correction → `manual`.
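Items 1–3 above can be sketched as one small validation-and-crop module. This is a hedged sketch, not the ingestion worker's code: `BBox`, `parse_bbox`, `bbox_props`, and `crop_box_px` are hypothetical names, the per-concept response shape (`{"bbox": [...], "bbox_source": ..., "bbox_confidence": ...}`) is assumed, `(x, y)` is assumed to be the top-left corner, and padding is assumed to be a fraction of the box size per side. Invalid boxes return nothing rather than raising, which keeps shaky coordinates isolated from the concept/quote extraction.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Provenance values from the proposal (a tuple, so membership never needs hashing).
BBOX_SOURCES = ("claude", "detector", "manual")
EPS = 1e-6  # tolerate float round-off at the image edges


@dataclass(frozen=True)
class BBox:
    """Normalized box; assumes (x, y) is the top-left corner, w/h are fractions of image size."""
    x: float
    y: float
    w: float
    h: float
    source: str = "claude"
    confidence: Optional[float] = None


def parse_bbox(raw) -> Optional[BBox]:
    """Validate one per-concept entry; return None (never raise) so a bad box drops only the bbox."""
    try:
        x, y, w, h = (float(v) for v in raw["bbox"])
        source = raw.get("bbox_source", "claude")
        conf = raw.get("bbox_confidence")
        conf = None if conf is None else float(conf)
    except (KeyError, TypeError, ValueError, AttributeError):
        return None
    if not all(0.0 <= v <= 1.0 for v in (x, y, w, h)):
        return None  # also rejects NaN, since NaN comparisons are False
    if w <= 0 or h <= 0 or x + w > 1.0 + EPS or y + h > 1.0 + EPS:
        return None
    if source not in BBOX_SOURCES or (conf is not None and not 0.0 <= conf <= 1.0):
        return None
    return BBox(x, y, w, h, source, conf)


def bbox_props(raw) -> dict:
    """Optional Instance properties; an empty dict when the box is missing or invalid."""
    box = parse_bbox(raw)
    if box is None:
        return {}
    props = {"bbox": [box.x, box.y, box.w, box.h], "bbox_source": box.source}
    if box.confidence is not None:
        props["bbox_confidence"] = box.confidence
    return props


def crop_box_px(box: BBox, width: int, height: int, pad: float = 0.06) -> Tuple[int, int, int, int]:
    """Pixel (left, upper, right, lower) for a lazy crop at any resolution, padded and clamped."""
    px, py = box.w * pad, box.h * pad
    left = max(0.0, box.x - px)
    upper = max(0.0, box.y - py)
    right = min(1.0, box.x + box.w + px)
    lower = min(1.0, box.y + box.h + py)
    return (round(left * width), round(upper * height),
            round(right * width), round(lower * height))
```

The `(left, upper, right, lower)` tuple matches the box order that Pillow's `Image.crop` accepts, so the same stored `bbox` can crop the original or any thumbnail on demand.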

## Known caveats (set expectations)

- Claude localization is **approximate and run-to-run variable** (Roboflow-style testing); it's strong at *what + roughly where*, weak at pixel-precise boxes. Treat overlays as soft regions; pair with a real detector if tight crops are needed downstream.
- **Granularity**: scene-level concepts ("Historic Church Interior" ≈ whole frame) need different handling from object-level ones ("Boy at Pulpit" = tight region), and boxes may be nested or overlapping. Consider a scale/granularity hint on the Instance so the UI and cropper can distinguish them.
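One way to derive the granularity hint and detect nesting is from box geometry alone. This is a sketch under stated assumptions: `granularity`, `is_nested`, and `SCENE_AREA_FRACTION` are hypothetical names, the 0.6 area cut-off and 0.9 containment tolerance are made-up starting points, and boxes use the same normalized `[x, y, w, h]` with a top-left origin. Containment is deliberately soft because Claude's boxes are approximate.

```python
from typing import Sequence

Box = Sequence[float]  # normalized [x, y, w, h], top-left origin (assumed)

# Hypothetical cut-off: boxes covering most of the frame count as scene-level.
SCENE_AREA_FRACTION = 0.6


def granularity(box: Box) -> str:
    """Coarse scale hint: 'scene' for near-full-frame boxes, else 'object'."""
    _, _, w, h = box
    return "scene" if w * h >= SCENE_AREA_FRACTION else "object"


def intersection_area(a: Box, b: Box) -> float:
    """Area of overlap between two normalized boxes (0.0 if disjoint)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    return iw * ih


def iou(a: Box, b: Box) -> float:
    """Intersection over union, a standard overlap score in [0, 1]."""
    inter = intersection_area(a, b)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def is_nested(inner: Box, outer: Box, tol: float = 0.9) -> bool:
    """True when at least `tol` of `inner`'s area lies inside `outer` (soft containment)."""
    area = inner[2] * inner[3]
    return area > 0 and intersection_area(inner, outer) / area >= tol
```

With these helpers, "Boy at Pulpit" inside "Historic Church Interior" reads as an object nested in a scene, which the UI could render as a tight overlay on top of a faint full-frame one.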

## Surfaces this touches

- Vision extraction (`describe_image` path / ingestion worker) — emit optional boxes.
- Web Catalog/graph UI — soft overlays on the source image.
- FUSE — could expose a cropped-region view per concept.

## Scope

Design first (likely a small ADR for the Instance schema addition + the optional detection pass). Not started. Raised by @aaronsb from a sidebar that did not know the platform — the value here is mapping the idea onto the real `Instance` evidence model. Refs ADR-057 (vision), ADR-803 (modality embedding), #453 (global candidate retrieval / matching).
