Skip to content

Tighten subset ordering contract coverage#179

Open
Fieldnote-Echo wants to merge 3 commits into
codex/manifest-cache-invariant-matrixfrom
codex/determinism-fuzz-contract
Open

Tighten subset ordering contract coverage#179
Fieldnote-Echo wants to merge 3 commits into
codex/manifest-cache-invariant-matrixfrom
codex/determinism-fuzz-contract

Conversation

@Fieldnote-Echo
Copy link
Copy Markdown
Owner

@Fieldnote-Echo Fieldnote-Echo commented Jun 4, 2026

Summary

  • make RankQuant and two-stage fuzz targets assert the full score-descending/doc-ID-ascending ordering contract
  • move the RankQuant SIMD centre offset into TopK insertion/eviction so boundary retention uses the same visible score key as final output
  • add the k=1 offset/rounding-collapse eviction regression
  • strengthen two-stage regression coverage so full-candidate subset rerank preserves full RankQuant IDs and ordered scores
  • document duplicate-candidate behavior consistently across Rust, Python, C ABI/header, Go, and rank-mode docs
  • add a Python RankQuant duplicate-candidate regression and remove stale sentinel-padding wording from the Python subset docstring

Review notes

Review remediation found the earlier finalize-only offset approach was insufficient for TopK boundary eviction. The branch now applies the query-constant offset before TopK retention decisions and keeps final sorting on the already-visible score key.

Subset/two-stage duplicate-candidate assertions intentionally allow equal row IDs where duplicate candidates are documented; full-index fuzz assertions require strict doc-id increase on score ties.

Validation

  • cargo fmt --check
  • cbindgen ordvec-ffi --config ordvec-ffi/cbindgen.toml --output ordvec-ffi/include/ordvec.h --verify
  • cargo test -p ordvec topk_score_offset_is_part_of_eviction_key
  • cargo test -p ordvec --test determinism_contract
  • cargo test -p ordvec --test index two_stage
  • cargo test -p ordvec
  • cargo test -p ordvec-ffi
  • cargo clippy --manifest-path fuzz/Cargo.toml --all-targets -- -D warnings
  • env GOCACHE=/tmp/ordvec-go-build go test -count=1 ./... from ordvec-go
  • VIRTUAL_ENV=/home/ndspence/GitHub/ordvec/ordvec-python/.venv PATH=/home/ndspence/GitHub/ordvec/ordvec-python/.venv/bin:$PATH pytest ordvec-python/tests/test_rank_quant.py -q
  • git diff --check

@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 4, 2026

Codecov Report

❌ Patch coverage is 92.30769% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/quant.rs 80.00% 2 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request documents and tests the behavior of duplicate candidate IDs in subset searches, clarifying that duplicate candidates are scored independently and can produce duplicate hits. It updates documentation across C, Go, Python, and Rust interfaces, and adds assertions in tests and fuzz targets to verify score-descending and document-ID-ascending ordering. A review comment suggests strengthening the tie-breaker assertion in the full-index search fuzz target to require strictly ascending document IDs, as full searches should not return duplicate IDs.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread fuzz/fuzz_targets/search_rankquant.rs Outdated
Copy link
Copy Markdown
Owner Author

CI follow-up after the first run:

  • The signbitmap_rankquant_twostage fuzz smoke found a real visible-ordering issue: the SIMD RankQuant asymmetric path was adding the dropped centre offset after TopK finalization, so two distinct internal scores could round-collapse into equal visible scores after the offset and violate the public (score desc, row_id asc) order.
  • Fixed by applying the query-constant offset inside TopK finalization before the final visible-score sort.
  • Added regressions for the low-level offset/tie behavior and the fuzz-shaped two-stage case.

Local validation after the fix:

cargo test -p ordvec topk_offset_sort_uses_visible_score_ties
cargo test -p ordvec two_stage
cargo test -p ordvec --test determinism_contract
cargo clippy --manifest-path fuzz/Cargo.toml --all-targets -- -D warnings
cargo test -p ordvec
git diff --check

The AVX512 failure on the same run still appears to be the Intel SDE download 403, not a code regression.

@Fieldnote-Echo Fieldnote-Echo marked this pull request as ready for review June 4, 2026 14:41
@qodo-code-review
Copy link
Copy Markdown

Review Summary by Qodo

Fix visible RankQuant tie ordering after score offsets and strengthen contract coverage

🐞 Bug fix ✨ Enhancement

Grey Divider

Walkthroughs

Description
• Fix visible RankQuant tie ordering after score offsets applied
• Strengthen subset ordering contract coverage with full score-desc/doc-ID-asc assertions
• Document duplicate-candidate behavior consistently across all language bindings
• Add regression test for centre-offset visible-score tie ordering
Diagram
flowchart LR
  A["Score offset applied<br/>before finalize"] -->|"finalize_with_score_offset_into"| B["Visible-score ties<br/>ordered by row ID"]
  B -->|"full contract"| C["Score-desc/<br/>doc-ID-asc ordering"]
  D["Fuzz targets &<br/>regression tests"] -->|"assert full ordering"| C
  E["Documentation<br/>across bindings"] -->|"clarify duplicate<br/>candidate behavior"| F["Consistent contract<br/>specification"]

Loading

Grey Divider

File Changes

1. fuzz/fuzz_targets/search_rankquant.rs ✨ Enhancement +15/-11

Tighten RankQuant fuzz target ordering assertions

fuzz/fuzz_targets/search_rankquant.rs


2. fuzz/fuzz_targets/signbitmap_rankquant_twostage.rs ✨ Enhancement +16/-2

Add score-desc/doc-ID-asc ordering check to two-stage fuzz

fuzz/fuzz_targets/signbitmap_rankquant_twostage.rs


3. src/quant.rs 🐞 Bug fix +14/-24

Apply score offset before finalize sort for correct tie ordering

src/quant.rs


View more (10)
4. src/util.rs ✨ Enhancement +31/-1

Add finalize_with_score_offset_into for visible-score tie ordering

src/util.rs


5. tests/index/two_stage.rs 🧪 Tests +45/-8

Add regression test for centre-offset visible-score ties

tests/index/two_stage.rs


6. ordvec-python/src/lib.rs 📝 Documentation +8/-3

Document duplicate-candidate behavior in Python docstring

ordvec-python/src/lib.rs


7. ordvec-python/tests/test_rank_quant.py 🧪 Tests +17/-2

Add duplicate-candidate regression test and fix docstring

ordvec-python/tests/test_rank_quant.py


8. ordvec-ffi/src/lib.rs 📝 Documentation +7/-0

Document duplicate-candidate behavior in C ABI comments

ordvec-ffi/src/lib.rs


9. ordvec-ffi/include/ordvec.h 📝 Documentation +9/-0

Document duplicate-candidate behavior in C header

ordvec-ffi/include/ordvec.h


10. ordvec-go/doc.go 📝 Documentation +4/-0

Document duplicate-candidate behavior in Go package docs

ordvec-go/doc.go


11. ordvec-go/ordvec.go 📝 Documentation +4/-0

Document duplicate-candidate behavior in Go SearchOptions

ordvec-go/ordvec.go


12. ordvec-go/README.md 📝 Documentation +4/-0

Document duplicate-candidate behavior in Go README

ordvec-go/README.md


13. docs/RANK_MODES.md 📝 Documentation +3/-0

Document duplicate-candidate behavior in rank modes docs

docs/RANK_MODES.md


Grey Divider

Qodo Logo

@qodo-code-review
Copy link
Copy Markdown

qodo-code-review Bot commented Jun 4, 2026

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (0)

Grey Divider


Remediation recommended

1. Offset not used for TopK 🐞 Bug ≡ Correctness
Description
TopK::finalize_with_score_offset_into applies score_offset only during the final sort/output,
while TopK::maybe_insert/eviction decisions are still made on pre-offset scores. This means the
returned *set* of top-k results is not guaranteed to be the top-k under the documented
(visible_score desc, tie_key asc) order when the offset changes float rounding (collapsing
distinct scores into equal visible scores).
Code

src/util.rs[R480-495]

+    /// Drain into `out_scores` / `out_indices`, applying a query-constant score
+    /// offset before the final `(score desc, tie_key asc)` ordering.
+    ///
+    /// SIMD RankQuant asymmetric kernels drop a query-constant centre term from
+    /// the hot loop and re-apply it at finalize time. Adding that offset can
+    /// collapse two distinct finite `f32` scores into one visible output score,
+    /// so the public tie order must be computed after the offset is applied.
+    pub(crate) fn finalize_with_score_offset_into(
+        &self,
+        out_scores: &mut [f32],
+        out_indices: &mut [i64],
+        score_offset: f32,
+    ) {
        debug_assert_eq!(out_scores.len(), out_indices.len());
        for s in out_scores.iter_mut() {
            *s = f32::NEG_INFINITY;
Evidence
The code sorts/output on offset-applied scores, but selection/eviction is still based on raw scores;
this breaks the guarantee that returned results are the top-k under the same ordering key used for
output.

src/util.rs[435-447]
src/util.rs[480-531]
src/quant.rs[400-410]
src/quant.rs[651-660]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`TopK::finalize_with_score_offset_into` changes the ordering key at finalize time by sorting on `(s + score_offset, tie_key, slot)`.

However, `TopK::maybe_insert` and `recompute_worst` still decide what to keep/evict using the *pre-offset* score. When `score_offset` causes rounding-collapse (two distinct `s` values map to the same visible `s + score_offset`), the set of retained entries can differ from the true top-k under the published visible-score order.

### Issue Context
This path is used by RankQuant’s asymmetric SIMD “centre-drop” kernels, where the score shift is applied at the end of the scan.

### Fix Focus Areas
- src/util.rs[421-470]
- src/util.rs[472-532]
- src/quant.rs[341-410]
- src/quant.rs[584-673]

### Implementation direction
- Ensure the *same* score key used for final ordering is also used for keep/evict decisions.
 - Option A (simplest): when `centre_drop_used`, add the query-constant `offset` to each candidate score **before** calling `TopK::maybe_insert` (i.e., store visible scores in `TopK` so selection and ordering match).
 - Option B: extend `TopK` to carry an optional `score_offset` and apply it consistently in comparisons (`better` check and `recompute_worst`) as well as finalize.
- Add a regression test demonstrating boundary behavior (e.g., `k=1`, two scores that become equal after offset, and different tie keys) so selection respects the visible-score tie policy, not just ordering of already-kept items.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Advisory comments

2. Order asserts ignore total_cmp 🐞 Bug ⚙ Maintainability
Description
New test/fuzz helpers assert score order using < and ==, but production result ordering is
defined using f32::total_cmp (a different total order for some finite edge cases like -0.0 vs
0.0). This can let contract tests drift from actual ordering semantics or become brittle on edge
float values.
Code

tests/index/two_stage.rs[R22-31]

+fn assert_score_then_id_order(scores: &[f32], ids: &[i64]) {
+    for slot in 1..scores.len() {
+        let prev = (scores[slot - 1], ids[slot - 1]);
+        let cur = (scores[slot], ids[slot]);
+        assert!(
+            cur.0 < prev.0 || (cur.0 == prev.0 && cur.1 >= prev.1),
+            "results violate score-desc/doc-id-asc order at slots {} and {slot}",
+            slot - 1,
+        );
+    }
Evidence
The assertions use numeric </==, while the production sort uses total_cmp; aligning them
avoids contract drift on float edge cases.

tests/index/two_stage.rs[22-31]
fuzz/fuzz_targets/search_rankquant.rs[112-121]
fuzz/fuzz_targets/signbitmap_rankquant_twostage.rs[36-45]
src/util.rs[514-524]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Several new “score-desc/doc-id-asc” assertion helpers use `cur_score < prev_score || (cur_score == prev_score && ...)`.

The implementation’s ordering uses `f32::total_cmp` for the score key, which is not identical to `<`/`==` for all finite float values (notably `-0.0` vs `0.0`). To keep contract checks aligned with the library’s actual ordering definition, the asserts should mirror the same comparator.

### Issue Context
`TopK::finalize_with_score_offset_into` sorts by `b.0.total_cmp(&a.0)` and then tie key / slot.

### Fix Focus Areas
- tests/index/two_stage.rs[22-32]
- fuzz/fuzz_targets/search_rankquant.rs[112-121]
- fuzz/fuzz_targets/signbitmap_rankquant_twostage.rs[36-46]
- src/util.rs[514-524]

### Implementation direction
- Replace `<`/`==` logic with `total_cmp`-based logic, e.g.:
 - Ensure `prev_score.total_cmp(&cur_score) != Ordering::Less` (descending)
 - If `prev_score.total_cmp(&cur_score) == Ordering::Equal`, assert `prev_id <= cur_id` (doc-id ascending)
- This keeps tests/fuzz targets checking the exact same ordering contract as the production sort.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

Copy link
Copy Markdown
Owner Author

Remediated the TopK offset boundary issue in e2ffc7f.

What changed:

  • Moved the RankQuant SIMD centre offset into TopK insertion/eviction via set_score_offset, so the retained top-k set is selected using the same visible score key that final output exposes.
  • Removed the finalize-only offset path.
  • Added the k=1 boundary regression where f32 rounding collapses visible scores and the lower global row key must win before eviction.
  • Tightened the full-index fuzz ordering assertion to require strict doc-id increase on score ties; subset/two-stage duplicate-candidate assertions remain permissive where duplicates are part of the documented contract.

Validation:

  • cargo test -p ordvec topk_score_offset_is_part_of_eviction_key
  • cargo test -p ordvec --test index two_stage
  • cargo test -p ordvec --test determinism_contract
  • cargo test -p ordvec
  • cargo clippy --manifest-path fuzz/Cargo.toml --all-targets -- -D warnings
  • git diff --check

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3ec4897bae

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/util.rs Outdated
@Fieldnote-Echo Fieldnote-Echo force-pushed the codex/manifest-cache-invariant-matrix branch from 3064191 to cf7ae3e Compare June 4, 2026 15:51
@Fieldnote-Echo Fieldnote-Echo force-pushed the codex/determinism-fuzz-contract branch from e2ffc7f to e34e012 Compare June 4, 2026 15:51
@Fieldnote-Echo Fieldnote-Echo force-pushed the codex/manifest-cache-invariant-matrix branch from cf7ae3e to f92d771 Compare June 4, 2026 16:44
Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
@Fieldnote-Echo Fieldnote-Echo force-pushed the codex/determinism-fuzz-contract branch from e34e012 to b0e91a9 Compare June 4, 2026 16:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant