Idea
Attach optional normalized bounding boxes to image-derived concepts so each identified thing can be localized in (and lazily cropped from) the source image, while preserving the existing concept/evidence model.
Provenance of this idea
Came from a sidebar brainstorm with Claude Desktop (which had no knowledge of this platform's schema — it proposed a new Observation node + DEPICTS edge). Its instincts are sound but must be translated onto our actual primitives, not its invented ones. This issue is that translation.
Grounding in the ACTUAL schema (verified, not assumed)
We already have the node the sidebar was reaching for:
The Instance node is the reified per-source evidence — the sidebar's "Observation." Verified keys today: quote, instance_id, created_at_event_id. For an image, quote is a prose snippet of the describe_image literal description (ADR-057).
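Source carries content_type='image' and the Garage storage_key (original image already retained).
Instances from different sources already resolve onto one shared Concept via embedding-based matching (candidate retrieval #453 + merge threshold). Entity resolution already exists; we do NOT need the sidebar's manual merge model.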
Proposal
bbox lives on the Instance, never on the Concept. The Instance is already 1:1 with (concept-mention, source-image), so it is the correct home. A concept appearing twice in one image → two Instances → two bboxes (already supported). Add optional properties:
bbox: normalized [x, y, w, h] in [0,1] (resolution-independent; survives thumbnails/re-exports and dodges Claude's resize/rescale trap).
bbox_source: claude | detector | manual (provenance + UI styling + upgrade path).
bbox_confidence: 0..1.
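A minimal sketch of what the optional properties could look like on the Instance, with validation. Assumptions, not the platform's actual model: the Python dataclass itself, the validate_bbox helper, and the reading of (x, y) as the box's minimum corner (the issue does not say whether x, y is a corner or a center).

```python
from dataclasses import dataclass
from typing import Literal, Optional, Tuple

BBox = Tuple[float, float, float, float]  # normalized (x, y, w, h)
BBoxSource = Literal["claude", "detector", "manual"]


def validate_bbox(bbox: BBox) -> BBox:
    """Return bbox unchanged if it is a valid normalized box, else raise ValueError."""
    if len(bbox) != 4:
        raise ValueError("bbox must have exactly four values")
    x, y, w, h = bbox
    if not all(0.0 <= v <= 1.0 for v in (x, y, w, h)):
        raise ValueError("bbox values must lie in [0, 1]")
    if w <= 0.0 or h <= 0.0:
        raise ValueError("bbox must have positive width and height")
    # Assumes (x, y) is the minimum corner; small tolerance for float rounding.
    if x + w > 1.0 + 1e-9 or y + h > 1.0 + 1e-9:
        raise ValueError("bbox extends past the image edge")
    return bbox


@dataclass
class Instance:
    # Existing keys named in this issue.
    instance_id: str
    quote: str
    created_at_event_id: str
    # Proposed optional additions; all stay None when no box is known.
    bbox: Optional[BBox] = None
    bbox_source: Optional[BBoxSource] = None
    bbox_confidence: Optional[float] = None

    def __post_init__(self) -> None:
        if self.bbox is None:
            if self.bbox_source is not None or self.bbox_confidence is not None:
                raise ValueError("bbox_source/bbox_confidence require a bbox")
            return
        self.bbox = validate_bbox(tuple(self.bbox))
        if self.bbox_source not in ("claude", "detector", "manual"):
            raise ValueError("unknown bbox_source")
        if self.bbox_confidence is not None and not 0.0 <= self.bbox_confidence <= 1.0:
            raise ValueError("bbox_confidence must lie in [0, 1]")
```

Because all three fields default to None, existing Instances stay valid without a backfill.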
Single detection pass at extraction. Have the vision step return per-concept normalized boxes in the SAME response that yields concepts/quotes (holistic layout reasoning beats N per-entity queries), attaching a bbox to each Instance it creates. Keep this OPTIONAL and isolated so shaky coordinates never contaminate the (reliable) concept/quote extraction.
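One way to keep the box optional and isolated is to parse it separately from the concept and quote, and drop only the box when it is malformed. The sketch below assumes a hypothetical JSON response shape (a "concepts" list whose items carry "concept", "quote", and an optional "bbox"); the real describe_image output format is not specified in this issue.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # normalized (x, y, w, h)


def parse_bbox(raw: Any) -> Optional[BBox]:
    """Return a normalized (x, y, w, h) tuple, or None if raw is unusable."""
    try:
        x, y, w, h = (float(v) for v in raw)
    except (TypeError, ValueError):
        # Missing, wrong length, or non-numeric: treat as "no box".
        return None
    if not all(0.0 <= v <= 1.0 for v in (x, y, w, h)):
        return None
    if w <= 0.0 or h <= 0.0 or x + w > 1.0 + 1e-9 or y + h > 1.0 + 1e-9:
        return None
    return (x, y, w, h)


def parse_vision_response(text: str) -> List[Dict[str, Any]]:
    """Build one Instance-like dict per item; a bad bbox never drops the item."""
    instances = []
    for item in json.loads(text)["concepts"]:  # hypothetical response shape
        bbox = parse_bbox(item.get("bbox"))
        instances.append({
            "concept": item["concept"],
            "quote": item["quote"],
            "bbox": bbox,
            "bbox_source": "claude" if bbox is not None else None,
        })
    return instances
```

A box that fails validation degrades to bbox=None, so the concept and quote are stored exactly as they would be today.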
Lazy cropping, never persisted crops. Original stays in Garage; crop on demand from bbox (+5–8% padding, since Claude boxes run loose). Refine a box later (re-run / detector / manual) and every crop regenerates — no stale assets.
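The on-demand crop reduces to mapping the normalized box to a padded, clamped pixel rectangle at whatever resolution is being served. A sketch, assuming (x, y) is the box's top-left corner and that padding is a fraction of the box's own width and height (the issue gives the 5–8% range but not what it is relative to); crop_rect is a hypothetical helper name.

```python
import math
from typing import Tuple

BBox = Tuple[float, float, float, float]  # normalized (x, y, w, h)


def crop_rect(bbox: BBox, image_w: int, image_h: int,
              pad: float = 0.06) -> Tuple[int, int, int, int]:
    """Map a normalized box to a padded pixel rect (left, top, right, bottom)."""
    x, y, w, h = bbox
    # Pad relative to the box's own size, then clamp to the unit square.
    x0 = max(0.0, x - w * pad)
    y0 = max(0.0, y - h * pad)
    x1 = min(1.0, x + w + w * pad)
    y1 = min(1.0, y + h + h * pad)
    # Round outward so the crop never cuts into the padded region.
    left = math.floor(x0 * image_w)
    top = math.floor(y0 * image_h)
    right = min(image_w, math.ceil(x1 * image_w))
    bottom = min(image_h, math.ceil(y1 * image_h))
    return (left, top, right, bottom)
```

The returned tuple uses the (left, upper, right, lower) ordering that Pillow's Image.crop expects, so the same function can serve a thumbnail or the full-size original by passing that image's dimensions.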
Refinement migration path, no schema churn: ship with bbox_source='claude' (approximate, soft-highlight overlays); optionally upgrade specific boxes via a grounding detector (Grounding DINO / OWLv2 / SAM) → detector; manual correction → manual.
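The upgrade path can be sketched as an in-place overwrite of the three bbox properties. The precedence used here (claude < detector < manual, with a same-source re-run allowed to replace itself) is an assumption; the issue names the three values but does not define a ranking. refine_bbox and SOURCE_RANK are hypothetical names.

```python
from typing import Any, Dict, Optional, Tuple

BBox = Tuple[float, float, float, float]  # normalized (x, y, w, h)

# Assumed precedence, lowest to highest trust; not defined by the issue.
SOURCE_RANK = {"claude": 0, "detector": 1, "manual": 2}


def refine_bbox(instance: Dict[str, Any], bbox: BBox, source: str,
                confidence: Optional[float] = None) -> bool:
    """Overwrite the box in place unless it would downgrade a more trusted source.

    Returns True when the Instance was updated.
    """
    if source not in SOURCE_RANK:
        raise ValueError(f"unknown bbox_source: {source!r}")
    current = instance.get("bbox_source")
    if current is not None and SOURCE_RANK[source] < SOURCE_RANK[current]:
        return False
    instance["bbox"] = bbox
    instance["bbox_source"] = source
    instance["bbox_confidence"] = confidence
    return True
```

Since crops are derived from bbox at read time, an update like this is the only write needed; nothing downstream has to be regenerated or invalidated.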
Known caveats (set expectations)
Claude localization is approximate and run-to-run variable (Roboflow-style testing); it's strong at what + roughly where, weak at pixel-precise boxes. Treat overlays as soft regions; pair with a real detector if tight crops are needed downstream.
Granularity: scene-level concepts ("Historic Church Interior" ≈ whole frame) vs object-level ("Boy at Pulpit" = tight region) and nested/overlapping boxes. Consider a scale/granularity hint on the Instance so the UI/cropper distinguishes them.
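If a stored hint turns out to be unnecessary, a granularity label could also be derived from the box itself. A rough sketch: classify by the fraction of the frame the box covers, and detect nesting with a containment check. The 0.8 area threshold and both function names are illustrative assumptions, not values from this issue.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # normalized (x, y, w, h)


def granularity(bbox: BBox, scene_area: float = 0.8) -> str:
    """Return 'scene' when the box covers most of the frame, else 'object'."""
    _, _, w, h = bbox
    # With normalized coordinates, w * h is the fraction of the image covered.
    return "scene" if w * h >= scene_area else "object"


def contains(outer: BBox, inner: BBox, tol: float = 1e-9) -> bool:
    """True when inner lies entirely within outer (a nested box)."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return (ix >= ox - tol and iy >= oy - tol
            and ix + iw <= ox + ow + tol and iy + ih <= oy + oh + tol)
```

For example, a whole-frame "Historic Church Interior" box classifies as scene, while a "Boy at Pulpit" box covering a tenth of the frame classifies as object and is contained by the former.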
Surfaces this touches
describe_image path / ingestion worker: emit optional boxes.
Web Catalog/graph UI: soft overlays on the source image.
FUSE: could expose a cropped-region view per concept.
Scope
Design first (likely a small ADR for the Instance schema addition + the optional detection pass). Not started. Raised by @aaronsb from a sidebar that did not know the platform — the value here is mapping the idea onto the real Instance evidence model. Refs ADR-057 (vision), ADR-803 (modality embedding), #453 (global candidate retrieval / matching).