
perf: reduce per-operation allocations on hot paths (rank_transform_into, l2_normalise) #38

@Fieldnote-Echo

Description


Deferred from the r5 pre-release audit (findings #3 and #4) as correctness-neutral, low-value / higher-effort perf work.

  • rank_transform_into (rank.rs) allocates a Vec<u16> argsort/order buffer per document. In the parallel Rank::add path, each Rayon thread allocates and frees its own buffer for every document. Short-lived LIFO allocations amortise well, but a thread-local or otherwise reused scratch buffer would remove the per-document allocation entirely under high-throughput ingestion.
  • l2_normalise (util.rs) allocates a Vec<f32> per query on the asymmetric search path, which makes it a frequent small allocation under high QPS.
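A minimal sketch of the thread-local scratch approach for the rank transform. The real signature of rank_transform_into in rank.rs is not shown in this issue, so the one below (`values: &[f32]`, `out: &mut [u16]`, rank 0 = smallest) and the `ORDER_SCRATCH` name are hypothetical. Each thread, including each Rayon worker, gets its own buffer; `clear()` keeps the capacity, so once a thread has seen its longest document it stops allocating.

```rust
use std::cell::RefCell;

thread_local! {
    // Hypothetical per-thread argsort/order buffer, reused across documents.
    // Each OS thread (e.g. each Rayon worker) owns an independent copy.
    static ORDER_SCRATCH: RefCell<Vec<u16>> = RefCell::new(Vec::new());
}

/// Hypothetical signature: writes the rank of each element of `values`
/// into `out` (0 = smallest). Ties receive an arbitrary but distinct order
/// because `sort_unstable_by` is used.
pub fn rank_transform_into(values: &[f32], out: &mut [u16]) {
    assert_eq!(values.len(), out.len());
    // Every index must fit in a u16.
    assert!(values.len() <= u16::MAX as usize);

    ORDER_SCRATCH.with(|cell| {
        let mut order = cell.borrow_mut();
        order.clear(); // drops contents, keeps capacity: no realloc in steady state
        order.extend(0..values.len() as u16);

        // Argsort: after this, order[r] is the index of the r-th smallest value.
        order.sort_unstable_by(|&a, &b| values[a as usize].total_cmp(&values[b as usize]));

        // Invert the permutation to get each element's rank.
        for (rank, &idx) in order.iter().enumerate() {
            out[idx as usize] = rank as u16;
        }
    });
}
```

One caveat of `thread_local!` + `RefCell`: the function must not re-enter itself inside the `with` closure, or the second `borrow_mut` panics. A caller-supplied `&mut Vec<u16>` avoids that hazard at the cost of threading the buffer through Rank::add.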

Neither is a correctness issue. Candidate approaches: thread-local scratch (thread_local!), a caller-supplied scratch buffer, or a normalise-in-place variant. Measure each change against the bench, before and after.
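For l2_normalise, a sketch of the other two candidate approaches, assuming a plain `&[f32]` input. Both function names are hypothetical, not the existing util.rs API. The in-place variant allocates nothing; the caller-supplied variant lets the search loop keep one `Vec<f32>` alive across queries and reuse its capacity.

```rust
/// Hypothetical in-place variant: scales `v` to unit L2 norm.
/// An all-zero vector is left untouched to avoid dividing by zero.
pub fn l2_normalise_in_place(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        let inv = 1.0 / norm;
        for x in v.iter_mut() {
            *x *= inv;
        }
    }
}

/// Hypothetical caller-supplied-buffer variant: writes a normalised copy of
/// `src` into `dst`. Reuses `dst`'s capacity, so it only allocates when a
/// query is longer than any previous one.
pub fn l2_normalise_into(src: &[f32], dst: &mut Vec<f32>) {
    dst.clear();
    dst.extend_from_slice(src);
    l2_normalise_in_place(dst);
}
```

The in-place form fits when the caller already owns a mutable query vector it no longer needs unnormalised; the `_into` form fits when the original query must be preserved, e.g. a per-thread `let mut buf = Vec::new();` hoisted outside the query loop.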

Metadata


Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
