Evaluate unifying ir storage behind vd.Collection (from ir_04) — DEFERRED #28

Refs #1, #12. Records the assessment of the ir_04 idea "unify ir storage behind `vd.Collection` to reuse `vd.hybrid_search` wholesale." **Decision: defer** (not skip). Verified against the actual `vd` and `ir` source.

## Why defer — the headline payoff does not exist
The stated motivation was partly to fix the RRF score-magnitude loss `ir_07` flagged as weakening hybrid abstention. But `vd.hybrid_search`'s fallback path (`vd/search.py` `_rrf_fuse`) uses the **same rank-based formula** `1/(k+rank)` as ir's own `_rrf_fuse` (`ir/retrieve.py`). `ir_07` is explicit that the magnitude loss is structural to RRF, not an ir implementation defect — so delegating would reproduce the identical weakness. `vd`'s native-hybrid path only triggers for server backends ir doesn't use; ir's tiny brute-force corpora always hit the same RRF fallback. **Zero payoff on the stated reason.**
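To make the structural point concrete, here is a minimal sketch of rank-based fusion using the `1/(k+rank)` formula quoted above. It is not the code from `vd/search.py` or `ir/retrieve.py`; the function names, the `k=60` default, and the toy scores are all assumptions for illustration. It shows that a confident dense match and a barely separated one fuse to identical scores as long as the ordering is the same.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank).

    `k=60` is a commonly used constant, assumed here; ranks start at 1.
    """
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return fused


def rank_by_score(scores):
    """Return doc ids ordered best-first; the raw scores are discarded here."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [doc_id for doc_id, _ in ordered]


# Hypothetical scores: same ordering, very different dense confidence.
strong_dense = {"a": 0.92, "b": 0.40, "c": 0.10}
weak_dense = {"a": 0.21, "b": 0.20, "c": 0.19}
lexical = {"a": 7.5, "b": 3.1, "c": 0.4}

fused_strong = rrf_fuse([rank_by_score(strong_dense), rank_by_score(lexical)])
fused_weak = rrf_fuse([rank_by_score(weak_dense), rank_by_score(lexical)])

# Only ranks reach the fusion step, so the two cases are indistinguishable.
assert fused_strong == fused_weak
```

Because the fused score depends only on rank positions, any abstention threshold applied after fusion cannot tell these two queries apart, regardless of which library performs the fusion.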

## Why defer — "reduce surface" is also weak
`vd.Collection` is a flat `MutableMapping[str, Document]` + search. It does **not** model the concerns that make up ir's retrieval surface:
- per-artifact surface collapse (`best_per_artifact`) — ir indexes multiple surfaces per artifact and dedupes to best-per-artifact; vd has no such notion;
- the calibrated-abstention score contract — `SearchHit.score` feeds `abs_threshold` / the conservative selector's `min_score` floor, and `ir_07`'s calibration depends on the exact per-mode score semantics;
- the dol 5-view persistence (meta/vectors/ledger/config/calibration);
- ledger-driven incremental/idempotent build.

Wrapping `CorpusStore` as a `vd.Collection` would keep all of this and add an adapter layer — a **net surface increase**.
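As an illustration of the first two concerns in the list above, the following sketch collapses multiple surface hits to one hit per artifact and then applies a score floor. It is not ir's `best_per_artifact` implementation; `SurfaceHit`, `collapse_best_per_artifact`, `select_above_floor`, and the sample values are hypothetical names and data chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurfaceHit:
    """One scored surface of an artifact (hypothetical shape)."""
    artifact_id: str
    surface: str
    score: float


def collapse_best_per_artifact(hits):
    """Keep only the highest-scoring surface for each artifact, best first."""
    best = {}
    for hit in hits:
        current = best.get(hit.artifact_id)
        if current is None or hit.score > current.score:
            best[hit.artifact_id] = hit
    return sorted(best.values(), key=lambda h: h.score, reverse=True)


def select_above_floor(hits, min_score):
    """Abstain on anything below the floor; an empty result means 'no answer'."""
    return [h for h in hits if h.score >= min_score]


hits = [
    SurfaceHit("pkg.alpha", "docstring", 0.62),
    SurfaceHit("pkg.alpha", "readme", 0.81),
    SurfaceHit("pkg.beta", "docstring", 0.35),
]

collapsed = collapse_best_per_artifact(hits)
selected = select_above_floor(collapsed, min_score=0.5)
```

A flat id-to-document mapping has no place for the artifact/surface distinction, so an adapter would still have to carry this logic on top of whatever the collection returns.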

## Why defer — migration cost is real and unrewarded
There is no dol/Files-backed `vd` backend (vd's persistent backends are faiss/qdrant/lancedb/sqlite_vec). Adopting existing persisted corpora (`~/.local/share/ir/corpora/<name>`) would require either re-indexing into an external backend (new heavy dep, loses the exact brute-force guarantee) or **building a new dol-backed vd Collection adapter — i.e. building vd infra to host ir, the opposite of "reuse wholesale."** At ir's scale (≤~250 artifacts) brute-force dense is exact and instant; there is no performance problem to solve.
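For reference, exact brute-force dense search at this scale amounts to scoring every vector against the query. The sketch below uses only the standard library; the function names and the toy two-dimensional vectors are assumptions for illustration, not ir's actual code.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def brute_force_search(query, vectors, top_k=3):
    """Score every stored vector against the query and return the top_k pairs."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical corpus of three tiny embeddings.
vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
results = brute_force_search([1.0, 0.0], vectors, top_k=2)
```

Every vector is compared, so the result is exact rather than approximate, and for a few hundred artifacts the linear scan is negligible.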

ir already reuses `vd` where reuse is clean: `vd.BM25Index` (lexical), `vd.reciprocal_rank_fusion` (fusion), `vd.filters.matches_filter` (hard filter). **That is the right reuse boundary.**

## The one genuinely valuable thread (split out)
The sub-goal that *does* address `ir_07`'s abstention concern — a **magnitude-preserving hybrid fusion** (normalized dense+lexical score blend instead of pure RRF) — is independent of the storage refactor and is being prototyped in `ir.retrieve` directly. See #TBD (separate issue/PR).
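One possible shape for such a blend is sketched below. This is not the prototype in `ir.retrieve`; the function name, the `alpha` weight, the saturating lexical normalization `s / (s + lexical_scale)`, and the sample scores are all assumptions. The normalization is deliberately query-independent, since per-query min-max scaling would again erase absolute magnitude.

```python
def blend_scores(dense, lexical, alpha=0.7, lexical_scale=5.0):
    """Weighted blend of dense cosine and a saturating-normalized lexical score.

    dense: doc_id -> cosine similarity, clipped to [0, 1] here.
    lexical: doc_id -> raw non-negative lexical score (e.g. BM25).
    alpha and lexical_scale are illustrative defaults, not tuned values.
    """
    fused = {}
    for doc_id in set(dense) | set(lexical):
        d = min(max(dense.get(doc_id, 0.0), 0.0), 1.0)
        raw = max(lexical.get(doc_id, 0.0), 0.0)
        lex = raw / (raw + lexical_scale)  # maps [0, inf) into [0, 1)
        fused[doc_id] = alpha * d + (1.0 - alpha) * lex
    return fused


# Same ordering in both cases, but very different dense confidence.
strong_dense = {"a": 0.92, "b": 0.40, "c": 0.10}
weak_dense = {"a": 0.21, "b": 0.20, "c": 0.19}
lexical = {"a": 7.5, "b": 3.1, "c": 0.4}

blended_strong = blend_scores(strong_dense, lexical)
blended_weak = blend_scores(weak_dense, lexical)

# Unlike rank-based fusion, the weak case now scores visibly lower.
assert blended_strong["a"] > blended_weak["a"]
```

With these toy numbers the top document scores about 0.82 in the confident case and about 0.33 in the weak one, so a score floor has something to act on. Whether a blend like this calibrates well against `ir_07`'s thresholds is an open question for the separate issue.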

## Revisit if
ir's corpus scale grows past brute-force comfort, **or** `vd` ships a dol-backed `Collection` that preserves dense cosine magnitude in hybrid fusion.
