MDEV-39858: Reloading COSINE metric index from disk degrades search recall due to abs2 quantization noise #5184

When a vector is created in memory by FVector::create() during normal inserts, its squared magnitude (abs2) under the COSINE metric is hardcoded to 0.5f.

However, when the index is reloaded from disk (after a server restart, FLUSH TABLES, or ALTER TABLE), it goes through FVectorNode::load_from_record(). This method reads the stored scale and the quantized int16 coordinates from the database record and then runs postprocess(). Inside postprocess(), abs2 is recomputed with floating-point math:

    abs2 = subabs2 + scale * scale * dot_product(d, d, vec_len) / 2;

Because the coordinates stored on disk are quantized int16 values, the recomputed abs2 carries rounding noise and no longer equals the 0.5f used for vectors created in memory.

This affects high-dimensional datasets, and the effect grows as M increases.

Fix: set abs2 to 0.5f in FVectorNode::load_from_record() when the metric is COSINE.
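The drift described above can be reproduced outside the server with a minimal sketch. Everything here is an illustrative assumption, not code from sql/vector_mhnsw.cc: the helper names (normalize, quantize, recomputed_abs2), the max-abs scaling scheme, and the choice to treat subabs2 as 0 are all hypothetical stand-ins for the real storage format.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper: scale v to unit length, as a COSINE metric would.
std::vector<float> normalize(std::vector<float> v) {
  double norm2 = 0.0;
  for (float x : v) norm2 += double(x) * x;
  const float inv = float(1.0 / std::sqrt(norm2));
  for (float &x : v) x *= inv;
  return v;
}

// Hypothetical stand-in for the on-disk record: one float scale
// plus int16 coordinates (the real layout may differ).
struct Quantized {
  float scale;
  std::vector<int16_t> d;
};

// Assumed quantization scheme: map the largest magnitude to 32767.
Quantized quantize(const std::vector<float> &v) {
  float max_abs = 0.0f;
  for (float x : v) max_abs = std::max(max_abs, std::fabs(x));
  Quantized q;
  q.scale = max_abs > 0.0f ? max_abs / 32767.0f : 1.0f;
  q.d.reserve(v.size());
  for (float x : v) {
    long r = std::lround(x / q.scale);
    r = std::max(-32767L, std::min(32767L, r));  // guard against overflow
    q.d.push_back(static_cast<int16_t>(r));
  }
  return q;
}

// Mirrors the formula quoted in the description, with subabs2 taken as 0.
float recomputed_abs2(const Quantized &q) {
  double dot = 0.0;
  for (int16_t x : q.d) dot += double(x) * x;
  return float(double(q.scale) * q.scale * dot / 2.0);
}
```

Calling recomputed_abs2(quantize(normalize(v))) for a high-dimensional v should give a value close to 0.5f but generally not bit-identical to it, which is the mismatch between freshly inserted and reloaded nodes that this pull request removes.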
Code Review
This pull request updates FVectorNode::load_from_record in sql/vector_mhnsw.cc to set vec_ptr->abs2 to 0.5f when the metric is COSINE. The reviewer pointed out that calling postprocess right before this assignment is redundant and wasteful because postprocess already computes abs2. They suggested skipping the postprocess call entirely when the metric is COSINE to avoid unnecessary computation.
    if (ctx->metric == COSINE)
      vec_ptr->abs2= 0.5f;
Calling vec_ptr->postprocess(ctx->vec_len) on line 885 dynamically recomputes abs2 using floating-point math (including an expensive dot product over the vector dimensions). Since this calculated value is immediately overwritten with 0.5f for the COSINE metric, the entire computation in postprocess is redundant and wasted.

This can cause a significant performance bottleneck when reloading large, high-dimensional indexes from disk.

If postprocess only computes abs2, consider skipping it entirely for the COSINE metric:

```cpp
  if (ctx->metric == COSINE)
    vec_ptr->abs2= 0.5f;
  else
    vec_ptr->postprocess(ctx->vec_len);
```
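To make the trade-off in this suggestion concrete, here is a small standalone mock. The types and names (Metric, MockVec, finish_load, postprocess_calls) are hypothetical and do not come from sql/vector_mhnsw.cc; the mock also assumes postprocess does nothing except compute abs2, which is exactly the condition the comment asks to be verified before applying the change.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the metric enum used by the index.
enum Metric { EUCLIDEAN, COSINE };

// Hypothetical stand-in for a loaded vector node.
struct MockVec {
  float scale = 1.0f;
  float abs2 = 0.0f;
  std::vector<int16_t> d;
  int postprocess_calls = 0;  // counts how often the dot product runs

  // Assumed to compute abs2 only (subabs2 is taken as 0 in this mock).
  void postprocess() {
    ++postprocess_calls;
    double dot = 0.0;
    for (int16_t x : d) dot += double(x) * x;
    abs2 = float(scale * scale * dot / 2.0);
  }
};

// Sketch of the suggested branch: skip the dot product for COSINE.
void finish_load(MockVec &v, Metric metric) {
  if (metric == COSINE)
    v.abs2 = 0.5f;
  else
    v.postprocess();
}
```

In this mock, a COSINE load never touches the coordinates, while other metrics still pay for one dot product per node. If the real postprocess has other side effects, the branch would need to preserve them.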