Chroma backend search score is metric-blind and not comparable across backends #9

Description

@thorwhalen

ChromaCollection.search (vd/backends/chroma.py) converts every distance to a score with:

score = 1.0 / (1.0 + distance)

regardless of the collection's distance metric. The inline comment even notes that for cosine ChromaDB returns 1 - cosine, so for a cosine collection the reported score is 1 / (2 - cosine_sim): monotonic in the similarity, but non-standard and hard to interpret (it spans [1/3, 1] as cosine_sim goes from -1 to 1). The code never reads the collection's actual configured metric (hnsw:space).
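To make the arithmetic concrete, here is a minimal sketch of the transform applied to cosine distances. The function names are illustrative only and are not taken from vd; the only assumption is the one stated above, that the backend reports distance = 1 - cosine_sim for a cosine collection.

```python
def current_chroma_score(distance: float) -> float:
    """The transform quoted above: applied to every distance, whatever the metric."""
    return 1.0 / (1.0 + distance)


def cosine_distance(cosine_sim: float) -> float:
    """Assumed cosine distance convention: 1 - cosine similarity."""
    return 1.0 - cosine_sim


if __name__ == "__main__":
    # Walk through a few similarities and show the score that would be reported.
    for sim in (1.0, 0.5, 0.0, -1.0):
        score = current_chroma_score(cosine_distance(sim))
        print(f"cosine_sim={sim:+.1f} -> reported score={score:.4f}")
```

Identical vectors score 1.0, orthogonal vectors score 0.5, and opposite vectors score about 0.333, so the reported value never drops below 1/3.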

Impact: ranking is preserved (the transform is monotonic), so results are not mis-ordered. But:

  • Scores are not comparable across backends — the memory backend returns cosine similarity directly in [-1, 1].
  • Scores are not interpretable — a "0.5" means nothing consistent.
  • vd's own reciprocal_rank_fusion / deduplicate_results / multi_query_search helpers and ef's SearchHit.score all consume these values.
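A small sketch of why this matters for anything that thresholds or mixes scores: ordering survives the transform, but a fixed cutoff does not mean the same thing on both backends. The helper names below are hypothetical, and the inversion assumes a cosine collection where distance = 1 - cosine_sim.

```python
def chroma_score(distance: float) -> float:
    """Score as currently computed: 1 / (1 + distance)."""
    return 1.0 / (1.0 + distance)


def implied_cosine(score: float) -> float:
    """Invert score = 1 / (2 - cosine_sim) to recover cosine_sim (cosine collections only)."""
    return 2.0 - 1.0 / score


# Ranking is preserved: sorting by similarity or by transformed score gives the same order.
sims = [0.2, 0.9, -0.4]
by_sim = sorted(sims, reverse=True)
by_score = sorted(sims, key=lambda s: chroma_score(1.0 - s), reverse=True)
same_order = by_sim == by_score

# A cutoff of 0.5 on the memory backend means cosine_sim >= 0.5,
# while the same cutoff on a Chroma score only means cosine_sim >= 0.0.
cutoff_as_cosine = implied_cosine(0.5)
```

Here `same_order` is true and `cutoff_as_cosine` is 0.0, so a caller filtering at 0.5 would keep noticeably more results from Chroma than from the memory backend.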

Proposal:

  • Read the collection's configured metric and convert to a single documented canonical score (suggest: cosine similarity in [-1, 1], or a documented similarity = 1 - distance for cosine / -distance for L2).
  • Document the score semantics in the Collection.search contract in base.py, so every backend (memory, chroma, future) agrees on what score means.
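One possible shape for the first bullet, as a sketch rather than a proposed patch. It assumes the metric is stored under the "hnsw:space" key of the collection metadata and that an unset key means "l2"; both should be checked against the ChromaDB version in use. `configured_metric` and `distance_to_score` are hypothetical names, not existing vd functions, and the metadata is passed in as a plain mapping so the sketch does not depend on chromadb.

```python
from typing import Mapping, Optional


def configured_metric(collection_metadata: Optional[Mapping[str, object]]) -> str:
    """Read the metric from Chroma-style metadata; falling back to 'l2' is an assumption."""
    return str((collection_metadata or {}).get("hnsw:space", "l2"))


def distance_to_score(distance: float, metric: str) -> float:
    """Convert a backend distance to the canonical score suggested in the proposal."""
    if metric == "cosine":
        # distance = 1 - cosine_sim, so this recovers cosine similarity in [-1, 1].
        return 1.0 - distance
    if metric == "l2":
        # Larger is better, unbounded below; not comparable to cosine scores.
        return -distance
    # Other metrics (for example inner product) need their own documented rule.
    raise ValueError(f"no score convention defined for metric {metric!r}")


metric = configured_metric({"hnsw:space": "cosine"})
score = distance_to_score(0.25, metric)  # cosine similarity 0.75
```

Raising on an unknown metric, instead of silently falling back to 1 / (1 + distance), keeps the contract in base.py honest: a backend either returns the documented score or fails loudly.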
