perf(fts): prune low-scoring conjunction candidates by BubbleCal · Pull Request #7386 · lance-format/lance

BubbleCal · 2026-06-22T06:04:22Z

Performance Improvement

What is the performance issue or bottleneck?

For FTS conjunction searches, once the top-k threshold is established, the AND path can still fully validate and score aligned candidate documents even when a cheap upper bound proves they cannot enter the heap. The search then pays for full BM25 scoring, phrase checks, and frequency collection on candidates that are already below the competitive threshold.

How does this PR improve performance?

This adds an AND-only score-first candidate prune in Wand::search. After all conjunction postings are aligned, the scorer first computes the exact contribution of one lead posting, then adds the remaining postings' current block-max scores as a safe upper bound. If that upper bound cannot beat the threshold, the candidate is skipped before phrase validation, full scoring, and term-frequency collection.
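The bound described above can be sketched roughly as follows. This is a minimal illustration rather than the code in wand.rs: `Posting`, `upper_bound`, and `can_skip_candidate` are hypothetical names, and the sketch assumes a candidate must strictly exceed the current threshold to enter the heap.

```rust
/// Hypothetical view of one aligned posting in a conjunction.
/// This is an illustrative type, not the one used in Lance.
#[derive(Debug, Clone, Copy)]
pub struct Posting {
    /// Exact BM25 contribution of this term for the current document.
    pub exact_score: f32,
    /// Largest contribution this term can make within its current block.
    pub block_max_score: f32,
}

/// Upper bound on the candidate's total score: the lead posting's exact
/// contribution plus the block-max score of every remaining posting.
/// The bound is safe as long as each block-max is >= the exact score
/// of any document inside that block.
pub fn upper_bound(lead: &Posting, rest: &[Posting]) -> f32 {
    lead.exact_score + rest.iter().map(|p| p.block_max_score).sum::<f32>()
}

/// Returns true when the candidate can be skipped before phrase validation,
/// full scoring, and term-frequency collection.
/// Assumption: a score equal to the threshold does not enter the heap.
pub fn can_skip_candidate(lead: &Posting, rest: &[Posting], threshold: f32) -> bool {
    upper_bound(lead, rest) <= threshold
}
```

Only one exact score is computed before the decision; the remaining terms contribute values that are already available per block, which is what keeps the check cheap.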

The change is intentionally narrow:

OR and flat-search paths are unchanged.
Missing-term and fuzzy AND semantics are unchanged.
The bound uses existing block-max scores, so exact top-k behavior is preserved for wand_factor == 1.0.
Phrase queries are pruned only when the BM25 upper bound is already non-competitive.

Benchmark or measurement results

No end-to-end benchmark was run for this draft. The new regression coverage includes a counting scorer case that verifies low-scoring AND candidates avoid full scoring, plus a top-k correctness case that keeps a later high-scoring candidate.
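The two regression cases mentioned above could be modeled along these lines. This is a self-contained toy, not the test in the PR: the `Scorer` trait, `CountingScorer`, `IdScorer`, and `top1` are all hypothetical, and the search is reduced to top-1 over precomputed `(doc_id, upper_bound)` pairs.

```rust
use std::cell::Cell;

/// Hypothetical scorer interface used only for this illustration.
pub trait Scorer {
    fn score(&self, doc_id: u64) -> f32;
}

/// Wraps another scorer and counts how often full scoring is invoked.
pub struct CountingScorer<S> {
    inner: S,
    calls: Cell<usize>,
}

impl<S: Scorer> CountingScorer<S> {
    pub fn new(inner: S) -> Self {
        Self { inner, calls: Cell::new(0) }
    }

    /// Number of full-scoring calls made so far.
    pub fn calls(&self) -> usize {
        self.calls.get()
    }
}

impl<S: Scorer> Scorer for CountingScorer<S> {
    fn score(&self, doc_id: u64) -> f32 {
        self.calls.set(self.calls.get() + 1);
        self.inner.score(doc_id)
    }
}

/// Toy scorer whose score is simply the document id.
pub struct IdScorer;

impl Scorer for IdScorer {
    fn score(&self, doc_id: u64) -> f32 {
        doc_id as f32
    }
}

/// Toy top-1 search: a candidate is fully scored only if its cheap
/// upper bound can still beat the best score found so far.
pub fn top1<S: Scorer>(scorer: &S, candidates: &[(u64, f32)]) -> Option<(u64, f32)> {
    let mut best: Option<(u64, f32)> = None;
    for &(doc_id, bound) in candidates {
        if let Some((_, best_score)) = best {
            if bound <= best_score {
                continue; // pruned before full scoring
            }
        }
        let score = scorer.score(doc_id);
        if best.map_or(true, |(_, s)| score > s) {
            best = Some((doc_id, score));
        }
    }
    best
}
```

With candidates `[(5, 5.0), (1, 1.0), (2, 2.0), (9, 9.0)]`, the two middle candidates are skipped without a scoring call, while the later high-scoring candidate is still scored and replaces the earlier best.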

Validation

cargo fmt --all --check
git diff --check
CARGO_TARGET_DIR=/tmp/lance-target-fts-and-prune-main cargo test -p lance-index scalar::inverted::wand::tests -- --nocapture
CARGO_TARGET_DIR=/tmp/lance-target-fts-and-prune-clippy cargo clippy --all --tests --benches -- -D warnings

codecov · 2026-06-22T06:44:23Z

Codecov Report

❌ Patch coverage is 97.63033% with 5 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
rust/lance-index/src/scalar/inverted/wand.rs	97.36%	4 Missing and 1 partial ⚠️

Xuanwo

I did not find a blocking correctness, security, or concurrency issue in the AND prune itself.

The remaining test gap is that the new prune branch is only directly covered for the compressed posting path. Plain postings, phrase AND, fuzzy AND, mask/tombstone filtering, and exec/analyze-plan metric exposure are still not directly covered, so the unchanged-semantics claim is mostly carried by existing tests rather than targeted coverage for this branch.

Xuanwo · 2026-06-22T16:20:49Z

+            .saturating_sub(pruned_before_return_start);
+        metrics.record_and_candidates_seen(and_candidates_seen);
+        metrics.record_and_candidates_pruned_before_return(and_candidates_pruned_before_return);
+        metrics.record_and_candidates_pruned_before_score(and_candidates_pruned_before_return);


These two counters are recorded from the same value, while and_candidates_seen only counts candidates that returned from next(). Once exposed as plan metrics, these names can lead users to compute prune rates with the wrong denominator.
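To make the denominator concern concrete, here is a small sketch. It assumes, as the comment implies but the diff does not fully show, that pruned candidates are never returned from next() and are therefore not included in the seen count. The `AndPruneCounters` struct and its methods are hypothetical and exist only for this illustration.

```rust
/// Hypothetical snapshot of the AND prune counters.
/// Field names echo the diff, but this struct is illustrative only.
pub struct AndPruneCounters {
    /// Candidates that were returned from `next()`.
    pub seen: u64,
    /// Candidates skipped before being returned.
    /// Assumption: these are NOT included in `seen`.
    pub pruned_before_return: u64,
}

impl AndPruneCounters {
    /// Prune rate over all aligned candidates (returned plus pruned).
    /// Returns None when no candidate was observed or the sum overflows.
    pub fn prune_rate(&self) -> Option<f64> {
        let total = self.seen.checked_add(self.pruned_before_return)?;
        if total == 0 {
            return None;
        }
        Some(self.pruned_before_return as f64 / total as f64)
    }

    /// The misleading ratio a reader might compute from the metric names alone.
    /// It can exceed 1.0 because `seen` excludes pruned candidates.
    pub fn naive_ratio(&self) -> Option<f64> {
        if self.seen == 0 {
            return None;
        }
        Some(self.pruned_before_return as f64 / self.seen as f64)
    }
}
```

For 25 returned and 75 pruned candidates, the first method yields 0.75 while the naive ratio yields 3.0, which is not a meaningful rate.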

perf(fts): prune low-scoring conjunction candidates

8d738ec

github-actions Bot added the A-index (Vector index, linalg, tokenizer) and performance labels Jun 22, 2026

fix(python): avoid numpy shape mutation warnings

a70dd2f

github-actions Bot added the A-python (Python bindings) label Jun 22, 2026

BubbleCal added 3 commits June 22, 2026 16:55

Merge remote-tracking branch 'origin/main' into yang/fts-and-candidate-prune

f1679b6

perf(fts): add and candidate prune metrics

067107f

perf(fts): prune and candidates before return

037dbd5

BubbleCal marked this pull request as ready for review June 22, 2026 13:57

chore: merge main into fts candidate prune

1b190b7

Xuanwo reviewed Jun 22, 2026

fix(index): remove ambiguous AND prune metric

6a4dfc6

perf(fts): prune low-scoring conjunction candidates #7386

BubbleCal wants to merge 7 commits into main from yang/fts-and-candidate-prune

2 participants
