Deferred from the r5 pre-release audit (findings #3 and #4) as correctness-neutral, low-value / higher-effort perf work.
rank_transform_into (rank.rs) allocates a Vec<u16> argsort/order buffer per document. In the parallel Rank::add path, each Rayon thread allocates and frees its own buffer for every doc. Short-lived LIFO allocations amortise well, but a thread-local or reused scratch buffer would remove the per-doc allocation under high-throughput ingestion.
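A minimal sketch of the thread-local scratch option. It assumes rank_transform_into takes a &[f32] document and writes u16 ordinal ranks (ties broken by index) into a caller-provided &mut [u16]. The real signature and tie handling in rank.rs may differ, and rank_transform_tls and ORDER_SCRATCH are hypothetical names. Each Rayon worker thread would get its own ORDER_SCRATCH, so the argsort buffer is allocated once per thread and only regrows when a longer document arrives.

```rust
use std::cell::RefCell;

thread_local! {
    // Hypothetical per-thread argsort scratch: each thread (e.g. each Rayon
    // worker) gets its own Vec, allocated on first use and reused afterwards.
    static ORDER_SCRATCH: RefCell<Vec<u16>> = RefCell::new(Vec::new());
}

/// Hypothetical stand-in for rank_transform_into (assumed signature): writes
/// the ordinal rank of each value (0 = smallest) into `out`. After a thread's
/// first call it allocates only if a longer document forces the Vec to grow.
pub fn rank_transform_tls(values: &[f32], out: &mut [u16]) {
    assert_eq!(values.len(), out.len(), "output must match input length");
    assert!(values.len() <= usize::from(u16::MAX) + 1, "indices must fit in u16");
    ORDER_SCRATCH.with(|cell| {
        let mut order = cell.borrow_mut();
        order.clear(); // drops old contents, keeps capacity
        order.extend((0..values.len()).map(|i| i as u16));
        // sort_unstable_by sorts in place without allocating; ties are broken
        // by index so equal values get deterministic ranks (an assumption).
        order.sort_unstable_by(|&a, &b| {
            values[a as usize]
                .total_cmp(&values[b as usize])
                .then(a.cmp(&b))
        });
        for (rank, &idx) in order.iter().enumerate() {
            out[idx as usize] = rank as u16;
        }
    });
}
```

The trade-off is that each thread keeps its largest buffer alive until it exits, which is small for u16 indices. The unstable sort is deliberate: the stable slice::sort_by may allocate an auxiliary buffer, which would reintroduce a per-doc allocation.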
l2_normalise (util.rs) allocates a Vec<f32> per query on the asymmetric search path, so under high QPS it becomes a frequent small allocation.
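A sketch of the normalise-in-place variant, plus a helper that reuses a caller-owned buffer for queries that arrive borrowed and must not be mutated. Both names (l2_normalise_in_place, l2_normalise_into) are hypothetical, and leaving an all-zero vector unchanged is an assumption; it should match whatever l2_normalise does for a zero norm today.

```rust
/// Hypothetical in-place variant: scales `v` to unit L2 norm with no
/// allocation. Assumption: a zero-norm vector is left unchanged.
pub fn l2_normalise_in_place(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        let inv = 1.0 / norm;
        for x in v.iter_mut() {
            *x *= inv;
        }
    }
}

/// Hypothetical helper for borrowed queries: copies `src` into a reused
/// `dst` (capacity kept across calls) and normalises the copy in place.
pub fn l2_normalise_into(src: &[f32], dst: &mut Vec<f32>) {
    dst.clear();
    dst.extend_from_slice(src);
    l2_normalise_in_place(dst);
}
```

When the search path already owns the query Vec, the in-place form removes the allocation outright; the _into form only allocates when a query is longer than any seen before by that buffer.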
Neither is a correctness issue. Candidate approaches: thread-local scratch (thread_local!), a caller-supplied scratch buffer, or a normalise-in-place variant. Measure against the bench before/after.
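A sketch of the caller-supplied scratch option applied to the rank transform. The signature, the ordinal tie handling, and the names rank_transform_with_scratch and rank_all are hypothetical; the point is that the caller owns the Vec<u16> and passes the same one for every document, so clear() keeps its capacity between docs.

```rust
/// Hypothetical caller-supplied-scratch variant: `order` is owned by the
/// caller and reused across documents; its capacity survives each clear().
pub fn rank_transform_with_scratch(values: &[f32], out: &mut [u16], order: &mut Vec<u16>) {
    assert_eq!(values.len(), out.len(), "output must match input length");
    assert!(values.len() <= usize::from(u16::MAX) + 1, "indices must fit in u16");
    order.clear();
    order.extend((0..values.len()).map(|i| i as u16));
    // In-place unstable sort (no allocation); ties broken by index (assumed).
    order.sort_unstable_by(|&a, &b| {
        values[a as usize]
            .total_cmp(&values[b as usize])
            .then(a.cmp(&b))
    });
    for (rank, &idx) in order.iter().enumerate() {
        out[idx as usize] = rank as u16;
    }
}

/// Hypothetical sequential ingestion loop: one scratch buffer for all docs.
pub fn rank_all(docs: &[Vec<f32>], outs: &mut [Vec<u16>]) {
    let mut order: Vec<u16> = Vec::new();
    for (doc, out) in docs.iter().zip(outs.iter_mut()) {
        out.resize(doc.len(), 0);
        rank_transform_with_scratch(doc, out, &mut order);
    }
}
```

In the parallel Rank::add path, Rayon's for_each_init or map_init could create one such buffer per job rather than per document. Rayon does not guarantee that init runs only once per thread, so this amortises the allocation rather than eliminating it; the thread-local approach avoids that caveat at the cost of hidden per-thread state.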