ChromaCollection.search (vd/backends/chroma.py) converts every distance to a score with:
score = 1.0 / (1.0 + distance)
regardless of the collection's distance metric. The inline comment even notes that, for cosine, ChromaDB returns 1 - cosine similarity, so for a cosine collection the reported score is 1 / (2 - cosine_sim): monotonic, but non-standard and hard to interpret. The code never reads the collection's actual configured metric (hnsw:space).
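To make the arithmetic concrete, here is a minimal sketch of the transform, assuming (as the inline comment states) that the cosine distance is 1 - cosine similarity. The function names are hypothetical and only mirror the formula above; this is not the code in chroma.py.

```python
def score_from_distance(distance: float) -> float:
    """Mirror of the formula above: score = 1 / (1 + distance)."""
    return 1.0 / (1.0 + distance)


def score_from_cosine(cosine_sim: float) -> float:
    """Score for a cosine collection, assuming distance = 1 - cosine_sim."""
    return score_from_distance(1.0 - cosine_sim)


# Walk from identical vectors (+1) to opposite vectors (-1).
for cosine_sim in (1.0, 0.5, 0.0, -1.0):
    score = score_from_cosine(cosine_sim)
    print(f"cosine_sim={cosine_sim:+.1f} -> score={score:.4f}")
```

Under that assumption the score falls in [1/3, 1] rather than [-1, 1]: orthogonal vectors get 0.5 and opposite vectors get about 0.3333.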
Impact: ranking is preserved (the transform is monotonic), so results are not mis-ordered. But:
- Scores are not comparable across backends: the memory backend returns cosine similarity directly in [-1, 1].
- Scores are not interpretable — a "0.5" means nothing consistent.
vd's own reciprocal_rank_fusion / deduplicate_results / multi_query_search helpers and ef's SearchHit.score all consume these values.
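One way to see the comparability problem is to apply the same cutoff to both kinds of score. The sketch below is illustrative only: the two helper functions, the 0.5 threshold, and the sample similarities are hypothetical, and it assumes cosine distance = 1 - cosine similarity as above.

```python
def score_via_chroma_formula(cosine_sim: float) -> float:
    # Assumes distance = 1 - cosine_sim, then score = 1 / (1 + distance).
    return 1.0 / (1.0 + (1.0 - cosine_sim))


def score_via_memory_backend(cosine_sim: float) -> float:
    # The memory backend is described as returning cosine similarity directly.
    return cosine_sim


THRESHOLD = 0.5  # hypothetical cutoff a caller might apply to `score`
cosine_sims = [0.9, 0.6, 0.3, 0.1]  # made-up similarities for four hits

kept_memory = [c for c in cosine_sims if score_via_memory_backend(c) >= THRESHOLD]
kept_chroma = [c for c in cosine_sims if score_via_chroma_formula(c) >= THRESHOLD]

print(kept_memory)  # [0.9, 0.6]
print(kept_chroma)  # [0.9, 0.6, 0.3, 0.1]
```

With the 1 / (2 - cosine_sim) mapping, a score of 0.5 corresponds to a cosine similarity of 0, so the same threshold keeps every non-negatively correlated hit on one backend and only the stronger matches on the other.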
Proposal:
- … [-1, 1], or a documented similarity = 1 - distance for cosine / -distance for L2).
- … Collection.search contract in base.py, so every backend (memory, chroma, future) agrees on what score means.
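A metric-aware conversion along these lines could look like the sketch below. It is only an illustration of the mapping named in the proposal: resolve_metric and distance_to_score are hypothetical helpers, and it assumes that the metric is available as an "hnsw:space" entry in the collection's metadata, and that a missing entry means "l2". Metrics the proposal does not mention are rejected rather than guessed.

```python
from typing import Mapping, Optional


def resolve_metric(metadata: Optional[Mapping[str, object]]) -> str:
    """Read the configured metric, assuming it is stored under 'hnsw:space'."""
    # Assumption: a missing key means the default metric, taken here to be "l2".
    if not metadata:
        return "l2"
    return str(metadata.get("hnsw:space", "l2"))


def distance_to_score(distance: float, metric: str) -> float:
    """Convert a distance to a score using the mapping named in the proposal."""
    if metric == "cosine":
        # distance = 1 - cosine_sim, so this recovers cosine similarity.
        return 1.0 - distance
    if metric == "l2":
        # Negated distance: higher is still better, with no bounded range.
        return -distance
    raise ValueError(f"no documented score mapping for metric {metric!r}")


metric = resolve_metric({"hnsw:space": "cosine"})
print(distance_to_score(0.25, metric))  # 0.75
```

Raising on an unknown metric keeps the contract explicit: a backend either documents what its score means or fails loudly.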