feat: add euclidean one2many by richyreachy · Pull Request #188 · alibaba/zvec

richyreachy · 2026-03-01T14:47:15Z

add euclidean one2many

Greptile Summary

This PR introduces Euclidean one-to-many batch distance kernels (EuclideanDistanceBatch and SquaredEuclideanDistanceBatch) for FP32 and FP16 across AVX2, AVX512F, and AVX512FP16 ISA tiers, wires them into EuclideanMetric::batch_distance() and upgrades SquaredEuclideanMetric from BatchSize=1 to BatchSize=12. Several correctness issues from prior review rounds have been addressed (masked-tail formula, swapped variable names, UB in shift expression), though a few regressions introduced alongside the new feature — removal of HammingMetric::batch_distance(), removal of DT_BINARY32/DT_BINARY64 from SquaredEuclideanMetric, and two disabled tests — remain unresolved.

New issue — wrong prefetch target in fallback: compute_one_to_many_squared_euclidean_fallback calls ailego_prefetch(&prefetch_ptrs[j]), which prefetches the stack address of the pointer array element rather than the vector data. Every SIMD path uses ailego_prefetch(prefetch_ptrs[i] + dim) for the actual data address; the fallback should follow the same pattern.
Dead code in distance_batch_math.h: sum4() and sum_top_bottom_avx() are defined and included in two translation units but are never called; the implementations use HorizontalAdd_FP32_V256/HorizontalAdd_FP32_V512 from matrix_utility.i instead.
Prior unresolved regressions (Hamming batch_distance() removal, binary-type cases removed from SquaredEuclideanMetric, tests silently disabled, off-by-one in euclidean_distance_batch_impl_fp16_avx512.cc) were flagged in earlier review rounds and are still present.

Confidence Score: 2/5

Not safe to merge — the new fallback path has an incorrect prefetch, and previously flagged regressions (null batch_distance() for Hamming, disabled tests masking the regression) remain unaddressed.
Several correctness issues flagged in earlier review rounds (Hamming batch_distance removal, DT_BINARY32/64 removal, off-by-one in AVX512 FP16 partial block) are still present. A new bug introduced in this version — prefetching the wrong address in the scalar fallback — makes the fallback's prefetch hint ineffective. Two tests covering the affected code paths are silently disabled with no explanation.
src/ailego/math_batch/euclidean_distance_batch.h (wrong prefetch), src/core/metric/hamming_metric.cc and src/core/metric/euclidean_metric.cc (removed binary-type support), tests/core/algorithm/hnsw/hnsw_streamer_test.cc (disabled tests).

Important Files Changed

Filename	Overview
src/ailego/math_batch/distance_batch_math.h	New helper file providing `sum4()` and `sum_top_bottom_avx()` but neither function is actually called by any file that includes this header — dead code.
src/ailego/math_batch/euclidean_distance_batch.h	New core dispatch header defining `SquaredEuclideanDistanceBatch` and `EuclideanDistanceBatch` templates; contains a bug where `ailego_prefetch(&prefetch_ptrs[j])` prefetches the array's stack address instead of the pointed-to data.
src/ailego/math_batch/euclidean_distance_batch_impl_fp16_avx512.cc	AVX512F FP16 kernel; 16-element partial block formula is now correct; however the 8-element partial-block guard still uses strict `<` (off-by-one, previously flagged).
src/core/metric/euclidean_metric.cc	Adds `EuclideanMetric::batch_distance()` with BatchSize=12 and bumps `SquaredEuclideanMetric` from BatchSize=1 to 12; removes `DT_BINARY32`/`DT_BINARY64` cases from `SquaredEuclideanMetric` (previously flagged regression).
src/core/metric/hamming_metric.cc	Removes `HammingMetric::batch_distance()` override entirely; callers will now silently receive `nullptr` from the base class (previously flagged regression).
tests/core/algorithm/hnsw/hnsw_streamer_test.cc	`TestBinaryConverter` and `TestBasicRefiner` silently disabled with `#if 0` and no explanation; masks the regression caused by removing Hamming batch-distance support (previously flagged).

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["BaseDistance::ComputeBatch()"] --> B{DistanceType?}
    B -- EuclideanDistanceMatrix --> C["EuclideanDistanceBatch::ComputeBatch()"]
    B -- SquaredEuclideanDistanceMatrix --> D["SquaredEuclideanDistanceBatch::ComputeBatch()"]
    B -- Other --> E["_ComputeBatch() (existing)"]

    C --> D
    C --> F["sqrt(results[i]) ∀ i"]

    D --> G{num_vecs loop\nbatches of BatchSize}
    G --> H["SquaredEuclideanDistanceBatchImpl\n::compute_one_to_many()"]
    G --> I["Tail: BatchSize=1 per vector"]

    H --> J{CPU dispatch}
    J -- AVX512FP16 --> K["avx512fp16_fp16 kernel"]
    J -- AVX512F --> L["avx512f_fp16 / fp32 kernel"]
    J -- AVX2 --> M["avx2_fp16 / fp32 kernel"]
    J -- Fallback --> N["compute_one_to_many\n_squared_euclidean_fallback()\n⚠️ wrong prefetch address"]

Comments Outside Diff (1)

tests/core/algorithm/hnsw/hnsw_streamer_test.cc, line 3500 (link)

Tests disabled with #if 0

TestBinaryConverter and TestBasicRefiner are both silenced with #if 0 in this PR. Permanently disabling tests with preprocessor guards makes failures invisible and can hide regressions. If these tests are failing due to the binary-type removals in hamming_metric.cc / euclidean_metric.cc, the root cause should be fixed rather than the tests suppressed. If the tests are being temporarily skipped for another reason, using DISABLED_ as a Google Test prefix is the conventional way to do this without removing coverage entirely.

_{Reviews (7): Last reviewed commit: "fix: remove unnecessary file" | Re-trigger Greptile}

Greptile also left 1 inline comment on this PR.

greptile-apps · 2026-03-01T14:55:39Z

Greptile Summary

This PR adds Euclidean one-to-many distance computation with SIMD optimizations (AVX2, AVX512F, AVX512FP16) for float32, float16, and int8 data types. However, multiple critical bugs exist in the new implementation files that will cause incorrect distance calculations:

euclidean_distance_batch_impl.h: Wrong FMA operation in AVX512F masked tail (line 81) computes dot product instead of squared difference; AVX2 switch statement (lines 128-148) missing dimension offsets
euclidean_distance_batch_impl_fp16.h: AVX512F implementation (line 139) computes dot product instead of squared distance
euclidean_distance_batch_impl_int8.h: Incorrect pointer arithmetic (line 74) and completely broken switch statement (lines 93-136) with wrong macro usage and array indices

These bugs were already identified in previous review threads and remain unfixed. Additionally:

Removed incorrect Hamming distance cases from SquaredEuclideanMetric::batch_distance()
Added batch_distance() method to EuclideanMetric
Minor formatting improvements across several files

Confidence Score: 0/5

This PR is NOT safe to merge - contains multiple critical bugs that will produce incorrect results
Score of 0 reflects multiple critical algorithmic errors in the core distance computation logic that will cause incorrect calculations across all three implementation files (fp32, fp16, int8). These bugs were previously identified but remain unfixed, and will result in wrong distance values being returned to callers.
All three implementation files require immediate attention: euclidean_distance_batch_impl.h, euclidean_distance_batch_impl_fp16.h, and euclidean_distance_batch_impl_int8.h

Important Files Changed

Filename	Overview
src/ailego/math_batch/euclidean_distance_batch_impl.h	New file with critical bugs in AVX512F masked tail handling (wrong FMA operation) and AVX2 switch statement (missing dim offsets)
src/ailego/math_batch/euclidean_distance_batch_impl_fp16.h	New file with critical bug in AVX512F implementation (dot product instead of squared distance at line 139)
src/ailego/math_batch/euclidean_distance_batch_impl_int8.h	New file with critical bugs in pointer arithmetic (line 74) and switch statement (lines 91-136: wrong macro usage with pointers and incorrect array indices)
src/ailego/math_batch/euclidean_distance_batch.h	New file defining batch distance computation structure, looks correct but depends on buggy implementation files
src/ailego/math_batch/distance_batch.h	Added Euclidean and SquaredEuclidean distance batch handlers using constexpr type checks
src/core/metric/euclidean_metric.cc	Added batch_distance method to EuclideanMetric and removed incorrect Hamming distance cases from SquaredEuclideanMetric

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[BaseDistance::ComputeBatch] --> B{Distance Type?}
    B -->|Euclidean| C[EuclideanDistanceBatch::ComputeBatch]
    B -->|SquaredEuclidean| D[SquaredEuclideanDistanceBatch::ComputeBatch]
    B -->|Other| E[Default _ComputeBatch]
    
    D --> F{Data Type?}
    F -->|float| G[SquaredEuclideanDistanceBatchImpl float]
    F -->|int8_t| H[SquaredEuclideanDistanceBatchImpl int8_t]
    F -->|Float16| I[SquaredEuclideanDistanceBatchImpl Float16]
    F -->|Other| J[Fallback Implementation]
    
    G --> K{CPU Features?}
    K -->|AVX512F| L[compute_one_to_many_squared_euclidean_avx512f_fp32]
    K -->|AVX2| M[compute_one_to_many_squared_euclidean_avx2_fp32]
    K -->|None| J
    
    H --> N{CPU Features?}
    N -->|AVX2| O[compute_one_to_many_squared_euclidean_avx2_int8]
    N -->|None| J
    
    I --> P{CPU Features?}
    P -->|AVX512FP16| Q[compute_one_to_many_squared_euclidean_avx512fp16_fp16]
    P -->|AVX512F| R[compute_one_to_many_squared_euclidean_avx512f_fp16]
    P -->|AVX2| S[compute_one_to_many_squared_euclidean_avx2_fp16]
    P -->|None| J
    
    C --> T[Call SquaredEuclideanDistanceBatch]
    T --> U[Apply sqrt to results]
    
    style L fill:#f99,stroke:#333,stroke-width:2px
    style M fill:#f99,stroke:#333,stroke-width:2px
    style O fill:#f99,stroke:#333,stroke-width:2px
    style Q fill:#f99,stroke:#333,stroke-width:2px
    style R fill:#f99,stroke:#333,stroke-width:2px
    style S fill:#f99,stroke:#333,stroke-width:2px

_{Last reviewed commit: b766fb1}

greptile-apps

_{13 files reviewed, 5 comments}

_{Edit Code Review Agent Settings | Greptile}

src/ailego/math_batch/euclidean_distance_batch_impl.h

src/ailego/math_batch/euclidean_distance_batch_impl_fp32_avx2.cc

src/ailego/math_batch/euclidean_distance_batch_impl_fp16.h

src/ailego/math_batch/euclidean_distance_batch_impl_int8.h

richyreachy · 2026-03-04T02:53:13Z

@greptile

richyreachy · 2026-03-20T14:44:54Z

@greptile

src/core/metric/hamming_metric.cc

richyreachy · 2026-03-23T04:36:38Z

@greptile

src/ailego/math_batch/euclidean_distance_batch_impl_fp16_avx2.cc

src/core/metric/euclidean_metric.cc

richyreachy · 2026-03-23T06:05:12Z

@greptile

src/ailego/math_batch/euclidean_distance_batch_impl_fp16_avx512.cc

src/ailego/math_batch/euclidean_distance_batch_impl_fp32_avx512.cc

src/ailego/math_batch/inner_product_distance_batch_impl_fp16.h

richyreachy · 2026-03-23T07:13:07Z

@greptile

tests/core/algorithm/hnsw/hnsw_streamer_test.cc

src/ailego/math_batch/euclidean_distance_batch_impl_fp32_avx512.cc

richyreachy · 2026-03-23T07:31:34Z

@greptile

richyreachy · 2026-03-23T07:48:33Z

@greptile

src/ailego/math_batch/euclidean_distance_batch_impl_fp16_avx512fp16.cc

…2fp16.cc Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

richyreachy · 2026-03-23T08:16:41Z

@greptile

richyreachy · 2026-03-23T08:19:47Z

@greptile

greptile-apps · 2026-03-23T08:25:23Z

src/ailego/math_batch/euclidean_distance_batch_impl_fp16_avx512.cc

+        _mm512_castps512_ps256(accs[i]),
+        _mm256_castpd_ps(_mm512_extractf64x4_pd(_mm512_castps_pd(accs[i]), 1)));
+  }
+


Off-by-one in 8-element partial-block boundary condition

The condition dim + 8 < dimensionality uses strict less-than, which means the 8-element AVX2 SIMD block is skipped when exactly 8 FP16 elements remain after the 16-element partial-block. In that case all 8 elements fall through to the scalar loop, silently degrading throughput for any dimension that satisfies dimensionality % 16 == 8 after the dim + 16 <= dimensionality pass (e.g., dim=24, 40, 56 …).

This is the same off-by-one pattern already present in inner_product_distance_batch_impl_fp16.h (AVX512F and AVX2 partial blocks). The fix is to use <=:

Suggested change

if (dim + 8 <= dimensionality) {

src/ailego/math_batch/inner_product_distance_batch_impl_fp16.h

richyreachy · 2026-03-23T08:37:43Z

@greptile

greptile-apps · 2026-03-23T09:12:23Z

src/ailego/math_batch/euclidean_distance_batch.h

+  }
+}
+


Prefetch of wrong address in fallback path

ailego_prefetch(&prefetch_ptrs[j]) prefetches the address of the j-th element within the stack-allocated prefetch_ptrs array itself — not the vector data the pointer refers to. This makes the prefetch completely ineffective. All the SIMD implementations use ailego_prefetch(prefetch_ptrs[i] + dim) to prefetch the actual data.

Additionally, there is no nullptr guard here, while every SIMD path checks if (prefetch_ptrs[0]) before issuing prefetches. Passing a null pointer to ailego_prefetch is typically a no-op, but the intent is clearly wrong.

Suggested change

}

}

ailego_prefetch(prefetch_ptrs[j]);

richyreachy added 4 commits February 11, 2026 16:11

add euclidean one2many implementation

09236a6

add euclidean one2many implementation

6370754

add euclidean one2many implementation

200cc06

format codes

9c07ff9

richyreachy requested a review from iaojnh March 1, 2026 14:47

greptile-apps bot reviewed Mar 1, 2026

View reviewed changes

richyreachy added 3 commits March 2, 2026 16:22

format codes

9545fba

fix macro missing

601de2a

fix testcase

b766fb1

feihongxu0824 assigned iaojnh Mar 11, 2026

richyreachy added 7 commits March 20, 2026 15:36

merge: merge from main

4cb66f1

refactor: euclidean

e20e4e6

fix: update code

30470e8

fix: clang format

05e3dcf

fix: clang format

4cfc424

fix: fix format

e5590fe

fix: fix index

93ee2a0

fix: fix format

91e9813

greptile-apps bot reviewed Mar 20, 2026

View reviewed changes

src/core/metric/hamming_metric.cc Show resolved Hide resolved

richyreachy added 3 commits March 20, 2026 23:42

fix: fix format

407ee5c

fix: fix format

55eada0

Merge branch 'main' into feat/euclidean_one2many

19b094b