Conversation
Greptile SummaryThis PR adds Euclidean one-to-many distance computation with SIMD optimizations (AVX2, AVX512F, AVX512FP16) for float32, float16, and int8 data types. However, multiple critical bugs exist in the new implementation files that will cause incorrect distance calculations:
These bugs were already identified in previous review threads and remain unfixed. Additionally:
Confidence Score: 0/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[BaseDistance::ComputeBatch] --> B{Distance Type?}
B -->|Euclidean| C[EuclideanDistanceBatch::ComputeBatch]
B -->|SquaredEuclidean| D[SquaredEuclideanDistanceBatch::ComputeBatch]
B -->|Other| E[Default _ComputeBatch]
D --> F{Data Type?}
F -->|float| G[SquaredEuclideanDistanceBatchImpl float]
F -->|int8_t| H[SquaredEuclideanDistanceBatchImpl int8_t]
F -->|Float16| I[SquaredEuclideanDistanceBatchImpl Float16]
F -->|Other| J[Fallback Implementation]
G --> K{CPU Features?}
K -->|AVX512F| L[compute_one_to_many_squared_euclidean_avx512f_fp32]
K -->|AVX2| M[compute_one_to_many_squared_euclidean_avx2_fp32]
K -->|None| J
H --> N{CPU Features?}
N -->|AVX2| O[compute_one_to_many_squared_euclidean_avx2_int8]
N -->|None| J
I --> P{CPU Features?}
P -->|AVX512FP16| Q[compute_one_to_many_squared_euclidean_avx512fp16_fp16]
P -->|AVX512F| R[compute_one_to_many_squared_euclidean_avx512f_fp16]
P -->|AVX2| S[compute_one_to_many_squared_euclidean_avx2_fp16]
P -->|None| J
C --> T[Call SquaredEuclideanDistanceBatch]
T --> U[Apply sqrt to results]
style L fill:#f99,stroke:#333,stroke-width:2px
style M fill:#f99,stroke:#333,stroke-width:2px
style O fill:#f99,stroke:#333,stroke-width:2px
style Q fill:#f99,stroke:#333,stroke-width:2px
style R fill:#f99,stroke:#333,stroke-width:2px
style S fill:#f99,stroke:#333,stroke-width:2px
Last reviewed commit: b766fb1 |
src/ailego/math_batch/euclidean_distance_batch_impl_fp32_avx2.cc
Outdated
Show resolved
Hide resolved
|
@greptile |
|
@greptile |
|
@greptile |
|
@greptile |
src/ailego/math_batch/euclidean_distance_batch_impl_fp16_avx512.cc
Outdated
Show resolved
Hide resolved
src/ailego/math_batch/euclidean_distance_batch_impl_fp32_avx512.cc
Outdated
Show resolved
Hide resolved
|
@greptile |
src/ailego/math_batch/euclidean_distance_batch_impl_fp32_avx512.cc
Outdated
Show resolved
Hide resolved
src/ailego/math_batch/euclidean_distance_batch_impl_fp32_avx512.cc
Outdated
Show resolved
Hide resolved
|
@greptile |
1 similar comment
|
@greptile |
…2fp16.cc Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
|
@greptile |
|
@greptile |
| _mm512_castps512_ps256(accs[i]), | ||
| _mm256_castpd_ps(_mm512_extractf64x4_pd(_mm512_castps_pd(accs[i]), 1))); | ||
| } | ||
|
|
There was a problem hiding this comment.
Off-by-one in 8-element partial-block boundary condition
The condition dim + 8 < dimensionality uses strict less-than, which means the 8-element AVX2 SIMD block is skipped when exactly 8 FP16 elements remain after the 16-element partial-block. In that case all 8 elements fall through to the scalar loop, silently degrading throughput for any dimension that satisfies dimensionality % 16 == 8 after the dim + 16 <= dimensionality pass (e.g., dim=24, 40, 56 …).
This is the same off-by-one pattern already present in inner_product_distance_batch_impl_fp16.h (AVX512F and AVX2 partial blocks). The fix is to use <=:
| if (dim + 8 <= dimensionality) { |
|
@greptile |
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Prefetch of wrong address in fallback path
ailego_prefetch(&prefetch_ptrs[j]) prefetches the address of the j-th element within the stack-allocated prefetch_ptrs array itself — not the vector data the pointer refers to. This makes the prefetch completely ineffective. All the SIMD implementations use ailego_prefetch(prefetch_ptrs[i] + dim) to prefetch the actual data.
Additionally, there is no nullptr guard here, while every SIMD path checks if (prefetch_ptrs[0]) before issuing prefetches. Passing a null pointer to ailego_prefetch is typically a no-op, but the intent is clearly wrong.
| } | |
| } | |
| ailego_prefetch(prefetch_ptrs[j]); |
add euclidean one2many
Greptile Summary
This PR introduces Euclidean one-to-many batch distance kernels (
EuclideanDistanceBatchandSquaredEuclideanDistanceBatch) for FP32 and FP16 across AVX2, AVX512F, and AVX512FP16 ISA tiers, wires them intoEuclideanMetric::batch_distance()and upgradesSquaredEuclideanMetricfromBatchSize=1toBatchSize=12. Several correctness issues from prior review rounds have been addressed (masked-tail formula, swapped variable names, UB in shift expression), though a few regressions introduced alongside the new feature — removal ofHammingMetric::batch_distance(), removal ofDT_BINARY32/DT_BINARY64fromSquaredEuclideanMetric, and two disabled tests — remain unresolved.compute_one_to_many_squared_euclidean_fallbackcallsailego_prefetch(&prefetch_ptrs[j]), which prefetches the stack address of the pointer array element rather than the vector data. Every SIMD path usesailego_prefetch(prefetch_ptrs[i] + dim)for the actual data address; the fallback should follow the same pattern.distance_batch_math.h:sum4()andsum_top_bottom_avx()are defined and included in two translation units but are never called; the implementations useHorizontalAdd_FP32_V256/HorizontalAdd_FP32_V512frommatrix_utility.iinstead.batch_distance()removal, binary-type cases removed fromSquaredEuclideanMetric, tests silently disabled, off-by-one ineuclidean_distance_batch_impl_fp16_avx512.cc) were flagged in earlier review rounds and are still present.Confidence Score: 2/5
batch_distance()for Hamming, disabled tests masking the regression) remain unaddressed.src/ailego/math_batch/euclidean_distance_batch.h(wrong prefetch),src/core/metric/hamming_metric.ccandsrc/core/metric/euclidean_metric.cc(removed binary-type support),tests/core/algorithm/hnsw/hnsw_streamer_test.cc(disabled tests).Important Files Changed
sum4()andsum_top_bottom_avx()but neither function is actually called by any file that includes this header — dead code.SquaredEuclideanDistanceBatchandEuclideanDistanceBatchtemplates; contains a bug whereailego_prefetch(&prefetch_ptrs[j])prefetches the array's stack address instead of the pointed-to data.<(off-by-one, previously flagged).EuclideanMetric::batch_distance()with BatchSize=12 and bumpsSquaredEuclideanMetricfrom BatchSize=1 to 12; removesDT_BINARY32/DT_BINARY64cases fromSquaredEuclideanMetric(previously flagged regression).HammingMetric::batch_distance()override entirely; callers will now silently receivenullptrfrom the base class (previously flagged regression).TestBinaryConverterandTestBasicRefinersilently disabled with#if 0and no explanation; masks the regression caused by removing Hamming batch-distance support (previously flagged).Flowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD A["BaseDistance::ComputeBatch()"] --> B{DistanceType?} B -- EuclideanDistanceMatrix --> C["EuclideanDistanceBatch::ComputeBatch()"] B -- SquaredEuclideanDistanceMatrix --> D["SquaredEuclideanDistanceBatch::ComputeBatch()"] B -- Other --> E["_ComputeBatch() (existing)"] C --> D C --> F["sqrt(results[i]) ∀ i"] D --> G{num_vecs loop\nbatches of BatchSize} G --> H["SquaredEuclideanDistanceBatchImpl\n::compute_one_to_many()"] G --> I["Tail: BatchSize=1 per vector"] H --> J{CPU dispatch} J -- AVX512FP16 --> K["avx512fp16_fp16 kernel"] J -- AVX512F --> L["avx512f_fp16 / fp32 kernel"] J -- AVX2 --> M["avx2_fp16 / fp32 kernel"] J -- Fallback --> N["compute_one_to_many\n_squared_euclidean_fallback()\n⚠️ wrong prefetch address"]Comments Outside Diff (1)
tests/core/algorithm/hnsw/hnsw_streamer_test.cc, line 3500 (link)#if 0TestBinaryConverterandTestBasicRefinerare both silenced with#if 0in this PR. Permanently disabling tests with preprocessor guards makes failures invisible and can hide regressions. If these tests are failing due to the binary-type removals inhamming_metric.cc/euclidean_metric.cc, the root cause should be fixed rather than the tests suppressed. If the tests are being temporarily skipped for another reason, usingDISABLED_as a Google Test prefix is the conventional way to do this without removing coverage entirely.Reviews (7): Last reviewed commit: "fix: remove unnecessary file" | Re-trigger Greptile