perf(index): improve FTS search metadata caching#7398

Open
Xuanwo wants to merge 1 commit into main from xuanwo/perf-fts-search-qps

Xuanwo (Collaborator) commented on Jun 22, 2026

Summary

This PR improves FTS search throughput by avoiding repeated metadata reads on hot search paths:

  • caches immutable corpus-level BM25 stats on the loaded InvertedIndex
  • caches per-token posting metadata (max_score, posting length) in the existing partition-prefixed Lance cache
  • leaves the existing token-set residency behavior unchanged and does not cache posting list bodies

The main target is global QPS under concurrent full-text search, especially when the index is stored on object storage.
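To illustrate the per-token metadata caching described above, here is a minimal sketch using only the standard library. It is not the code in this PR and does not use the Lance cache API: `PostingMeta`, `PostingMetaCache`, and `get_or_load` are hypothetical names, and the `(partition, token)` key only stands in for the partition-prefixed cache key.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Per-token posting metadata. It is small enough to keep resident,
/// unlike the posting list body, which is deliberately not cached.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PostingMeta {
    pub max_score: f32,
    pub posting_len: u32,
}

/// Hypothetical cache keyed by (partition id, token id).
#[derive(Default)]
pub struct PostingMetaCache {
    entries: RwLock<HashMap<(u64, u32), PostingMeta>>,
}

impl PostingMetaCache {
    /// Returns the cached metadata, or calls `load` (standing in for an
    /// object-store read) and remembers the result for later lookups.
    /// Two threads that miss at the same time may both call `load`;
    /// the first value inserted wins.
    pub fn get_or_load<F>(&self, partition: u64, token: u32, load: F) -> PostingMeta
    where
        F: FnOnce() -> PostingMeta,
    {
        let key = (partition, token);
        // Fast path: shared read lock only.
        if let Some(meta) = self.entries.read().unwrap().get(&key) {
            return *meta;
        }
        // Slow path: load without holding any lock, then publish.
        let loaded = load();
        let mut guard = self.entries.write().unwrap();
        let stored = *guard.entry(key).or_insert(loaded);
        stored
    }
}
```

On a hot search path, only the first lookup for a given token pays for the metadata read; later queries for the same token are served from memory.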

S3 Performance

Benchmark shape for both datasets:

  • query set: the, data, learning, world, machine learning, artificial intelligence, 中国, 人工智能
  • limit=10, projected columns: _rowid, _score
  • warmup: query set x3
  • each concurrency point runs for 20s
  • baseline and patched results returned identical row ids and 6-decimal scores for the query set
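For readers who want to reproduce a measurement of this shape, the sketch below shows one way to run a single concurrency point and report QPS and p95. It is an assumption about the harness, not the script used for these numbers: `run_point` and `percentile` are hypothetical helpers, the load is closed-loop (each worker issues queries back to back), and p95 uses the nearest-rank method.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Nearest-rank percentile over a latency sample (sorts in place).
pub fn percentile(samples: &mut [Duration], pct: f64) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    samples.sort_unstable();
    let rank = (pct * samples.len() as f64 / 100.0).ceil() as usize;
    Some(samples[rank.clamp(1, samples.len()) - 1])
}

/// Runs `concurrency` workers that call `query` back to back until
/// `run_for` has elapsed. Returns (QPS, p95 latency).
pub fn run_point<F>(concurrency: usize, run_for: Duration, query: F) -> (f64, Option<Duration>)
where
    F: Fn(usize) + Send + Sync + 'static,
{
    let query = Arc::new(query);
    let stop = Arc::new(AtomicBool::new(false));
    let started = Instant::now();

    let workers: Vec<_> = (0..concurrency)
        .map(|worker| {
            let (query, stop) = (Arc::clone(&query), Arc::clone(&stop));
            thread::spawn(move || {
                let mut latencies = Vec::new();
                while !stop.load(Ordering::Relaxed) {
                    let t0 = Instant::now();
                    (*query)(worker);
                    latencies.push(t0.elapsed());
                }
                latencies
            })
        })
        .collect();

    thread::sleep(run_for);
    stop.store(true, Ordering::Relaxed);

    // Collect every worker's latencies once in-flight queries finish.
    let mut all: Vec<Duration> = workers
        .into_iter()
        .flat_map(|w| w.join().unwrap())
        .collect();
    let elapsed = started.elapsed().as_secs_f64();
    (all.len() as f64 / elapsed, percentile(&mut all, 95.0))
}
```

With the shape above, each point would be a call such as `run_point(16, Duration::from_secs(20), ...)`, where the closure picks the next query from the query set and runs the search.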

1M S3 Dataset

Dataset: s3://xuanwo-fts-bench-use1/datasets/mmlb_1m_all_columns_no_image_en_zh_icu_bench_icu-1m-perf-opt-20260619T143109Z.lance

| concurrency | baseline QPS | patched QPS | QPS delta | baseline p95 | patched p95 |
|---|---|---|---|---|---|
| 1 | 7.73 | 13.43 | +73.7% | 202.45ms | 108.64ms |
| 2 | 15.51 | 24.49 | +57.9% | 210.17ms | 122.68ms |
| 4 | 34.45 | 53.01 | +53.9% | 184.70ms | 125.08ms |
| 8 | 71.74 | 96.25 | +34.2% | 171.57ms | 129.44ms |
| 16 | 120.33 | 199.30 | +65.6% | 226.07ms | 125.03ms |
| 32 | 214.90 | 242.96 | +13.1% | 279.15ms | 283.01ms |

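The improvement is stable at 1-16 concurrency, with QPS gains between +34.2% and +73.7% and lower p95 at every point. The 32-concurrency point is saturated/noisy: QPS is still +13.1% higher, but p95 is essentially flat (279.15ms vs 283.01ms).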
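The QPS delta column is consistent with a plain relative change against the baseline (an inference from the numbers, since the formula is not stated in the PR). Checking it against the concurrency-1 row of the 1M table:

$$
\Delta_{\text{QPS}} = \frac{\text{QPS}_{\text{patched}} - \text{QPS}_{\text{baseline}}}{\text{QPS}_{\text{baseline}}} \times 100\%
= \frac{13.43 - 7.73}{7.73} \times 100\% \approx +73.7\%
$$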

10M S3 Dataset

Dataset: s3://xuanwo-fts-bench-use1/datasets/mmlb_10m_full_content_icu_s3_search_20260623T000000Z.lance

  • 10,000,000 rows, 10 fragments
  • 19 S3 objects, 69,994,744,162 bytes total
  • FTS index size: 7.76 GiB

| concurrency | baseline QPS | patched QPS | QPS delta | baseline p95 | patched p95 |
|---|---|---|---|---|---|
| 1 | 7.40 | 11.55 | +56.1% | 290.69ms | 152.26ms |
| 2 | 14.45 | 22.90 | +58.5% | 236.91ms | 132.04ms |
| 4 | 35.20 | 44.90 | +27.6% | 175.41ms | 145.16ms |
| 8 | 64.10 | 86.55 | +35.0% | 198.30ms | 142.27ms |
| 16 | 132.55 | 160.40 | +21.0% | 185.01ms | 163.47ms |
| 32 | 211.95 | 235.70 | +11.2% | 283.94ms | 305.04ms |

The 10M S3 result confirms the object-store improvement at larger index scale. The 32-concurrency point remains saturated/noisy and has a p95 regression despite higher QPS.

Validation

  • cargo fmt --all
  • git diff --check
  • cargo test -p lance-index scalar::inverted::index::tests::
  • cargo clippy --all --tests --benches -- -D warnings

github-actions bot added the A-index (Vector index, linalg, tokenizer) and performance labels on Jun 22, 2026
Xuanwo marked this pull request as ready for review on June 22, 2026, 17:53
codecov bot commented on Jun 22, 2026

Codecov Report

❌ Patch coverage is 93.15068% with 5 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance-index/src/scalar/inverted/index.rs | 93.15% | 1 Missing and 4 partials ⚠️ |

