Skip to content

[codex] feat: add dense SignBitmap scoring#96

Merged
project-navi-bot merged 3 commits into
mainfrom
codex/sign-bitmap-score-all
May 28, 2026
Merged

[codex] feat: add dense SignBitmap scoring#96
project-navi-bot merged 3 commits into
mainfrom
codex/sign-bitmap-score-all

Conversation

@Fieldnote-Echo
Copy link
Copy Markdown
Owner

Summary

  • add SignBitmap::score_all(&[f32]) -> Vec<u32> returning dense sign-agreement counts aligned by doc id
  • add SignBitmap::score_all_batched(&[f32]) -> Vec<Vec<u32>> for query batches without top-k selection or sorting
  • expose Python SignBitmap.score_all(query) and SignBitmap.score_all_batched(queries) as dense uint32 NumPy arrays
  • cover Rust reference parity, batching parity, and Python shape/dtype/reference behavior

Closes #79.

Validation

  • cargo fmt --all --check
  • cargo test -p ordvec sign_bitmap::tests::score_all
  • cargo clippy --all-targets --all-features -- -D warnings
  • cargo test --all-features
  • maturin develop
  • ./.venv/bin/python -m pytest tests/test_sign_bitmap.py -q
  • ./.venv/bin/python -m pytest tests -q

Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
@codecov
Copy link
Copy Markdown

codecov Bot commented May 28, 2026

Codecov Report

❌ Patch coverage is 96.10390% with 6 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
ordvec-python/src/lib.rs 82.35% 6 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces dense full-corpus sign-agreement scoring methods (score_all and score_all_batched) to the SignBitmap struct, exposing them to Python via PyO3. The feedback focuses on a significant performance optimization: refactoring score_all_batched to return a flat Vec<u32> instead of a nested Vec<Vec<u32>>. This change avoids redundant heap allocations and allows the Python FFI layer to construct the 2D NumPy array efficiently. Corresponding updates to the Python bindings and unit tests are also provided to support this optimization.

Comment thread src/sign_bitmap.rs Outdated
Comment thread ordvec-python/src/lib.rs
Comment thread src/sign_bitmap.rs
Comment thread src/sign_bitmap.rs
@Fieldnote-Echo Fieldnote-Echo marked this pull request as ready for review May 28, 2026 15:52
@Fieldnote-Echo Fieldnote-Echo requested a review from Copilot May 28, 2026 15:52
@qodo-code-review
Copy link
Copy Markdown

Review Summary by Qodo

Add dense SignBitmap scoring methods for full-corpus evaluation

✨ Enhancement

Grey Divider

Walkthroughs

Description
• Add score_all() method for dense full-corpus sign-agreement scoring
• Add score_all_batched() for batched dense scoring without top-k selection
• Expose both methods to Python with proper NumPy array handling
• Add comprehensive tests for correctness and edge cases
Diagram
flowchart LR
  A["Query Vector"] -->|"score_all()"| B["Dense Scores"]
  C["Query Batch"] -->|"score_all_batched()"| D["Dense Score Matrix"]
  B -->|"NumPy uint32"| E["Python API"]
  D -->|"NumPy uint32"| E
  F["Hamming Distance"] -->|"Convert to Agreement"| B
  F -->|"Convert to Agreement"| D

Loading

Grey Divider

File Changes

1. src/sign_bitmap.rs ✨ Enhancement +120/-0

Core dense scoring implementation with batching

• Implement score_all(&[f32]) -> Vec computing sign-agreement counts for all indexed documents
• Implement score_all_batched(&[f32]) -> Vec> for batch queries without sorting
• Add three test cases: correctness validation, batching parity, and empty shape handling
• Use parallel iteration for Hamming distance to agreement conversion

src/sign_bitmap.rs


2. ordvec-python/src/lib.rs ✨ Enhancement +52/-0

Python bindings for dense scoring methods

• Add Python binding score_all() returning 1-D uint32 NumPy array aligned by document id
• Add Python binding score_all_batched() returning 2-D uint32 NumPy array with shape (batch,
 len(index))
• Validate C-contiguity of input arrays and handle overflow checks
• Use py.detach() to release GIL during heavy Rust scanning

ordvec-python/src/lib.rs


3. ordvec-python/tests/test_sign_bitmap.py 🧪 Tests +44/-0

Python tests for dense scoring functionality

• Add sign_agreement_reference() helper for validating scoring correctness
• Add test for score_all() shape, dtype, and reference value correctness
• Add test for score_all_batched() shape and single-query parity
• Add test for empty index and empty batch edge cases

ordvec-python/tests/test_sign_bitmap.py


View more (1)
4. ordvec-python/python/ordvec/__init__.py 📝 Documentation +7/-7

Update threading documentation for dense scoring

• Update threading documentation to mention dense scoring methods alongside candidate generators
• Clarify that dense scoring methods also release GIL during Rust scanning

ordvec-python/python/ordvec/init.py


Grey Divider

Qodo Logo

@qodo-code-review
Copy link
Copy Markdown

qodo-code-review Bot commented May 28, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

Grey Divider


Remediation recommended

1. Batched scoring copies data ✓ Resolved 🐞 Bug ➹ Performance
Description
SignBitmap::score_all_batched allocates a flat scores buffer and then copies every row into a
fresh Vec<u32> via row.to_vec(), and the Python binding then copies those rows again to
re-flatten into a contiguous (batch, n) buffer. For large corpora/batches this multiplies peak
memory and can cause OOMs or severe slowdowns despite the final NumPy output being only batch*n*4
bytes.
Code

src/sign_bitmap.rs[R263-268]

Evidence
The Rust core explicitly copies each contiguous row out of the scores buffer into a new Vec<u32>
per query (row.to_vec()), and the Python binding then copies all those rows again into a
contiguous flat Vec<u32> to build a 2D NumPy array.

src/sign_bitmap.rs[257-268]
ordvec-python/src/lib.rs[1227-1251]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`score_all_batched` produces a contiguous `Vec<u32>` (`scores`), but then immediately duplicates all score data by converting it into `Vec<Vec<u32>>` with `scores.chunks(n).map(|row| row.to_vec()).collect()`. The Python binding then duplicates the data again by flattening `Vec<Vec<u32>>` back into a contiguous buffer for NumPy.

This amplifies memory use and adds extra full-size memcpy passes for the exact workloads this API targets (full-corpus scoring for many queries).

### Issue Context
- Core currently computes into `scores: Vec<u32>` and then clones each row.
- Python binding requires a contiguous `(batch, n)` array and flattens the jagged vector.

### Fix Focus Areas
- src/sign_bitmap.rs[234-268]
- ordvec-python/src/lib.rs[1222-1252]

### Suggested fix
1. In `ordvec_core::SignBitmap`, add a new API that returns a **single flat** row-major buffer, e.g.:
  - `pub fn score_all_batched_flat(&self, queries: &[f32]) -> (usize /*batch*/, usize /*n*/, Vec<u32>)`
  - or `pub fn score_all_batched_flat(&self, queries: &[f32]) -> Vec<u32>` plus `batch`/`n` derivable by the caller.
  This method should keep the existing internal `scores: Vec<u32>` and return it directly (no `row.to_vec()` loop).
2. Re-implement the existing `score_all_batched(&self) -> Vec<Vec<u32>>` as a thin wrapper over the flat method (keeping the ergonomic Rust API), doing the chunk-to-Vec<Vec> conversion only for callers that explicitly want it.
3. Update the Python binding `score_all_batched` to call the flat method and construct the NumPy `Array2` directly from the returned flat `Vec<u32>` via `Array2::from_shape_vec((batch, n), flat)` (eliminating the `for row in &result { flat.extend_from_slice(row) }` copy).
4. Keep the existing overflow guards, but update them as needed for the new return shape.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds full-corpus (“dense”) sign-agreement scoring APIs to SignBitmap in Rust and exposes them in the Python bindings to support QPP-style distribution statistics (issue #79).

Changes:

  • Add SignBitmap::score_all and SignBitmap::score_all_batched to return dense per-doc sign-agreement counts (no top-k, no sorting).
  • Expose SignBitmap.score_all(...) / score_all_batched(...) in Python as uint32 NumPy arrays and extend the Python docstring to mention dense scoring methods release the GIL.
  • Add Rust + Python tests to validate shape/dtype and reference parity for single and batched scoring, including empty-shape behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
src/sign_bitmap.rs Implements dense per-document scoring APIs and adds Rust unit tests for parity and empty cases.
ordvec-python/src/lib.rs Adds PyO3 bindings for dense scoring methods and returns results as NumPy uint32 arrays.
ordvec-python/tests/test_sign_bitmap.py Adds Python tests for shape/dtype and correctness against a NumPy reference implementation.
ordvec-python/python/ordvec/__init__.py Updates module docstring to include dense scoring methods in the GIL-release/threading contract.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/sign_bitmap.rs
Comment thread ordvec-python/src/lib.rs Outdated
Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
Copy link
Copy Markdown
Owner Author

Bot review disposition after follow-up fix:

  • Gemini/Copilot/Qodo copy-amplification finding: fixed in aafc591 with SignBitmap::score_all_batched_flat, preserving the nested Rust wrapper while giving Python and other performance-sensitive callers a flat row-major buffer.
  • Python binding now constructs the uint32 (batch, n) NumPy array directly from the flat Rust buffer.
  • Test update findings: covered by a new flat parity test plus expanded empty-shape coverage for flat and nested APIs.
  • Codecov patch report: added focused Rust coverage for the new flat API and empty-shape edge cases; existing Python dense-score tests continue to cover the binding output shape/dtype/parity.

Validation run locally:

  • cargo fmt --all --check
  • cargo test --all-features score_all
  • maturin develop
  • uv run pytest tests/test_sign_bitmap.py -k score_all

@project-navi-bot project-navi-bot merged commit 9a2c4be into main May 28, 2026
32 checks passed
@project-navi-bot project-navi-bot deleted the codex/sign-bitmap-score-all branch May 28, 2026 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SignBitmapIndex: expose full-corpus scores (score_all) for QPP statistics

3 participants