
Low recall for IVF_SQ / IVF_HNSW_SQ with metric="dot" on MSMARCO #7352

Description

@cwj0bzxg

Recall@10 is unexpectedly low (below 0.10) for SQ-based indexes with dot‑product search on the MSMARCO 1M dataset, while the equivalent flat index reaches 0.52–0.94 under the same search settings.

Environment

  • Lance tag: v8.0.0-beta.18 (commit 909dea18b1)
  • Python package: 8.0.0-beta.18
  • Dataset: MSMARCO Web Search 1M – 1M base vectors, 9,376 queries, dim 768
  • Metric: dot; ground‑truth: msmarco-1M-gt100; evaluated metric: recall@10

Index Configurations

  • IVF_HNSW_SQ: num_partitions=1, num_bits=8, m=16, ef_construction=200, nprobes=1, ef ∈ {20,40,80,160,320,640}
  • IVF_SQ: num_partitions=1024, num_bits=8, nprobes ∈ {16,32,64,96,128}
  • IVF_HNSW_FLAT (baseline): num_partitions=1, m=16, ef_construction=200, nprobes=1, ef ∈ {20,40,80,160,320,640}

Results

Index        nprobes  ef   recall@10
IVF_HNSW_SQ  1        20   0.0250
IVF_HNSW_SQ  1        40   0.0370
IVF_HNSW_SQ  1        80   0.0490
IVF_HNSW_SQ  1        160  0.0577
IVF_HNSW_SQ  1        320  0.0641
IVF_HNSW_SQ  1        640  0.0684
IVF_SQ       16       N/A  0.0918
IVF_SQ       32       N/A  0.0891
IVF_SQ       64       N/A  0.0862
IVF_SQ       96       N/A  0.0841
IVF_SQ       128      N/A  0.0827

Baseline (IVF_HNSW_FLAT)

Index          nprobes  ef   recall@10
IVF_HNSW_FLAT  1        20   0.5179
IVF_HNSW_FLAT  1        40   0.6522
IVF_HNSW_FLAT  1        80   0.7566
IVF_HNSW_FLAT  1        160  0.8382
IVF_HNSW_FLAT  1        320  0.8983
IVF_HNSW_FLAT  1        640  0.9377

The gap is substantial: the flat baseline reaches 0.9377 recall@10 at ef=640, whereas IVF_HNSW_SQ peaks at 0.0684 and IVF_SQ remains around 0.08–0.09.
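For anyone reproducing the numbers, here is a minimal sketch of how recall@10 can be computed. It assumes recall@k means the fraction of the true top-k ids that appear in the returned top-k, averaged over all queries. The function name `recall_at_k` and the list-of-lists input layout are illustrative and not taken from the benchmark script.

```python
def recall_at_k(retrieved, ground_truth, k=10):
    """Average fraction of the true top-k ids found in the returned top-k.

    retrieved / ground_truth: one sequence of ids per query (assumed layout).
    """
    hits = 0
    for got, truth in zip(retrieved, ground_truth):
        # Count ids that appear in both the returned and the true top-k.
        hits += len(set(got[:k]) & set(truth[:k]))
    return hits / (k * len(retrieved))


# Tiny worked example with k=3: query 0 finds 2 of 3, query 1 finds 1 of 3.
example_retrieved = [[1, 2, 3], [4, 5, 6]]
example_truth = [[1, 2, 9], [7, 8, 4]]
example_recall = recall_at_k(example_retrieved, example_truth, k=3)
print(example_recall)
```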

Possible Cause

I suspect the dot-product distance computation for SQ is incorrect.

SQ encodes each floating-point value with a lower-bound offset:

code = (value - lower_bound) / scale

So the approximate dequantized value is:

value ≈ lower_bound + scale * code
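The encode/decode relation above can be sketched in NumPy as follows. This is an illustration only, not Lance's implementation: the helper names `sq_encode` and `sq_decode`, the rounding and clipping steps, and the fixed bounds of -1.0 and 1.0 are all assumptions made for the example.

```python
import numpy as np


def sq_encode(values, lower_bound, upper_bound, num_bits=8):
    """Quantize floats to integer codes in [0, 2**num_bits - 1] (assumed scheme)."""
    levels = (1 << num_bits) - 1
    scale = (upper_bound - lower_bound) / levels
    codes = np.round((values - lower_bound) / scale)
    # uint8 storage is only valid for num_bits <= 8.
    codes = np.clip(codes, 0, levels).astype(np.uint8)
    return codes, scale


def sq_decode(codes, lower_bound, scale):
    """Approximate reconstruction: value ≈ lower_bound + scale * code."""
    return lower_bound + scale * codes.astype(np.float32)


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=768).astype(np.float32)
codes, scale = sq_encode(x, -1.0, 1.0)
x_hat = sq_decode(codes, -1.0, scale)

# For in-range values the round-trip error is bounded by about half a step.
max_err = float(np.abs(x - x_hat).max())
print(scale, max_err)
```

Note that the reconstruction always adds `lower_bound` back, which is the part the code-space dot product discussed below does not account for.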

Let cx be the SQ code of the data vector x, and cq be the SQ code of the query vector q.

The current SQ dot path appears to compute dot product directly in code space, roughly as:

dot(x, q) = scale² * sum(cx_i * cq_i)

However, the dot product between the dequantized vectors should be computed as:

dot(x, q)
≈ sum((lower_bound + scale * cx_i) * (lower_bound + scale * cq_i))

Expanding this gives:

dot(x, q)
≈ scale² * sum(cx_i * cq_i)
 + scale * lower_bound * sum(cx_i)
 + scale * lower_bound * sum(cq_i)
 + dim * lower_bound²

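So if the implementation only uses the code-space dot term, it misses the offset-related terms. For a fixed query, the scale * lower_bound * sum(cq_i) and dim * lower_bound² terms are identical for every candidate and do not affect ranking, but scale * lower_bound * sum(cx_i) differs per data vector, so dropping it can reorder candidates whenever lower_bound is nonzero. This may explain the very low recall observed for IVF_SQ and IVF_HNSW_SQ with metric="dot".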
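The expansion can be checked numerically with a small NumPy sketch on synthetic data. This is not the Lance code path. It assumes a single shared lower_bound and scale for data and query (as in the derivation above), takes the bounds from the min and max over both for simplicity, and uses random Gaussian vectors rather than MSMARCO.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n = 64, 1000
data = rng.normal(size=(n, dim))
query = rng.normal(size=dim)

# Shared 8-bit quantization parameters (an assumption of this sketch).
lower_bound = min(data.min(), query.min())
upper_bound = max(data.max(), query.max())
scale = (upper_bound - lower_bound) / 255.0


def encode(v):
    return np.clip(np.round((v - lower_bound) / scale), 0, 255).astype(np.uint8)


cx = encode(data).astype(np.float64)   # shape (n, dim)
cq = encode(query).astype(np.float64)  # shape (dim,)

# Code-space term only: scale^2 * sum(cx_i * cq_i).
code_dot = scale**2 * (cx @ cq)

# Full expansion, including the three offset-related terms.
full_dot = (
    code_dot
    + scale * lower_bound * cx.sum(axis=1)
    + scale * lower_bound * cq.sum()
    + dim * lower_bound**2
)

# Reference: dot product of the explicitly dequantized vectors.
dequantized_dot = (lower_bound + scale * cx) @ (lower_bound + scale * cq)


def top10(scores):
    # Larger dot product means a closer match.
    return set(np.argsort(-scores)[:10].tolist())


exact = top10(data @ query)
code_overlap = len(top10(code_dot) & exact)
full_overlap = len(top10(full_dot) & exact)
print("code-space only:", code_overlap, "of 10; full expansion:", full_overlap, "of 10")
```

The full expansion matches the dequantized dot product up to floating-point error, so comparing the two overlap counts isolates the effect of the missing offset terms from ordinary quantization error.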
