Reduce per-call allocations in GraphSearcher.searchOneLayer and FusedPQDecoder #15

Merged
eolivelli merged 1 commit into main from reduce-allocations-search-and-fusedpq
May 22, 2026

@eolivelli
Owner

Summary

Allocation profiling of OnDiskGraphIndexCompactor workloads (async-profiler alloc mode, 30 s) showed two hotspots accounting for ~87% of all allocations on the compaction ForkJoin workers. This PR removes both.

| Site | Before | What it allocated |
|---|---|---|
| GraphSearcher.searchOneLayer (lines 483 + 488) | 61.9% of allocs | NeighborProcessor lambda + visited::add method reference, allocated per loop iteration via DirectMethodHandle.allocateInstance |
| FusedPQDecoder.newDecoder → <init> → VectorUtil.sub | 24.8% of allocs | new ArrayVectorFloat(new float[dim]) for the centered query on every decoder construction (one per search) |

GraphSearcher.searchOneLayer

  • Hoist scoreTracker to an instance field so the NeighborProcessor lambda needs to capture only this.
  • Make the NeighborProcessor a permanent instance field, allocated once in the constructor.
  • Make the IntMarker for visited a final field bound once at construction (this.visitedAdder = visited::add).
  • Call site now uses the two fields instead of building fresh callbacks per iteration.
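
The hoisting described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the actual GraphSearcher code: the class HoistedCallbacksSketch, the NeighborVisitor interface, and the markVisited and searchOnce methods are hypothetical stand-ins, and a BitSet replaces the real visited set.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

// Hypothetical stand-in for a searcher; it is NOT the real GraphSearcher.
final class HoistedCallbacksSketch {
    /** Hypothetical callback type standing in for a per-neighbor processor. */
    interface NeighborVisitor {
        void visit(int node, float score);
    }

    private final BitSet visited = new BitSet();

    // Per-search state lives in fields, so the lambda below captures only `this`.
    private float bestScore;
    private int bestNode;

    // Both callbacks are created once per searcher instance,
    // instead of being re-created on every loop iteration.
    private final IntPredicate visitedAdder = this::markVisited;
    private final NeighborVisitor visitor = (node, score) -> {
        if (score > bestScore) {
            bestScore = score;
            bestNode = node;
        }
    };

    /** Marks a node as visited; returns true only the first time it is seen. */
    private boolean markVisited(int node) {
        boolean isNew = !visited.get(node);
        visited.set(node);
        return isNew;
    }

    /** Returns the best-scoring candidate, ignoring repeated nodes, or -1 if there are none. */
    int searchOnce(int[] candidates, float[] scores) {
        visited.clear();
        bestScore = Float.NEGATIVE_INFINITY;
        bestNode = -1;
        for (int i = 0; i < candidates.length; i++) {
            // The loop body only reads the two prebuilt fields.
            if (visitedAdder.test(candidates[i])) {
                visitor.visit(candidates[i], scores[i]);
            }
        }
        return bestNode;
    }
}
```

The trade-off in this sketch is that the searcher becomes stateful between calls, so a single instance must not be shared across threads while a search is running.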

FusedPQDecoder / FusedPQ

  • Add a ThreadLocal<VectorFloat<?>> reusableCenteredQuery in FusedPQ, mirroring the existing reusableResults / reusableNeighborCodes / pqCodeScratch pattern.
  • Plumb the buffer through FusedPQ.approximateScoreFunctionFor → FusedPQDecoder.newDecoder and the three subclass constructors.
  • Replace VectorUtil.sub(query, center) (allocates) with a new VectorUtil.subInto(dest, lhs, rhs) helper that writes the difference into the thread-local scratch.
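
To illustrate the difference between an allocating subtraction and a write-into-destination helper, here is a minimal sketch on plain float[] arrays. The real helper operates on VectorFloat<?> and its exact signature is not shown in this PR description, so the class name SubIntoSketch and both method bodies are assumptions made for illustration only.

```java
// Hypothetical illustration on float[]; the real VectorUtil works on VectorFloat<?>.
final class SubIntoSketch {
    /** Allocating variant: returns a fresh array on every call. */
    static float[] sub(float[] lhs, float[] rhs) {
        float[] out = new float[lhs.length];
        subInto(out, lhs, rhs);
        return out;
    }

    /** Writes lhs[i] - rhs[i] into dest[i] and allocates nothing. */
    static void subInto(float[] dest, float[] lhs, float[] rhs) {
        if (dest.length != lhs.length || lhs.length != rhs.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        for (int i = 0; i < lhs.length; i++) {
            dest[i] = lhs[i] - rhs[i];
        }
    }
}
```

With the second form the caller owns the destination buffer and decides how long it lives, which is what makes reuse across calls possible.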

The centered-query buffer is only read after the decoder constructor fills it (it just feeds VectorUtil.calculatePartialSums), so a thread-local lifetime is safe. A per-thread buffer rather than a single shared one is required because FusedPQDecoder is already constructed concurrently from multiple ForkJoin workers.
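
The thread-local scratch pattern can be sketched as below, assuming plain float[] buffers. The class CenteredQueryScratchSketch and its centeredQuery method are hypothetical; only the field name reusableCenteredQuery is taken from the description above, and the real field holds a VectorFloat<?> rather than an array.

```java
// Hypothetical sketch of a per-thread scratch buffer; not the real FusedPQ class.
final class CenteredQueryScratchSketch {
    private final float[] center;
    private final ThreadLocal<float[]> reusableCenteredQuery;

    CenteredQueryScratchSketch(float[] center) {
        this.center = center.clone();
        final int dim = center.length;
        // Each thread lazily gets its own buffer the first time it asks for one.
        this.reusableCenteredQuery = ThreadLocal.withInitial(() -> new float[dim]);
    }

    /**
     * Fills the calling thread's scratch buffer with query - center and returns it.
     * The contents are only valid until the next call on the same thread.
     */
    float[] centeredQuery(float[] query) {
        float[] scratch = reusableCenteredQuery.get();
        for (int i = 0; i < center.length; i++) {
            scratch[i] = query[i] - center[i];
        }
        return scratch;
    }
}
```

In this sketch two threads never see each other's buffer, but a second call on the same thread overwrites the first result, so the pattern only fits consumers that finish reading the buffer before the next call.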

Test plan

  • ./mvnw -pl jvector-base,jvector-twenty,jvector-native -am compile — all modules compile.
  • ./mvnw -pl jvector-tests -am test (275 tests pass, 0 failures, 0 errors, 2 skipped). Includes TestOnDiskGraphIndexCompactor, TestFusedGraphIndex, TestProductQuantization, TestCompressedVectors, TestReconstructionError, TestOnDiskGraphIndex, Test2DThreshold, TestConcurrentReadWriteDeletes, etc.
  • Re-profile the HerdDB indexing workload with asprof -e alloc and confirm the two allocation sites drop to near zero.

🤖 Generated with Claude Code

…PQDecoder

Allocation profiling of OnDiskGraphIndexCompactor workloads showed two
hotspots accounting for ~87% of all allocations on the compaction
ForkJoin workers.

GraphSearcher.searchOneLayer (61.9% of allocs):
  The NeighborProcessor lambda and the visited::add method reference
  were being allocated on every iteration of the search loop via
  DirectMethodHandle.allocateInstance. Hoist both into reusable instance
  fields and promote scoreTracker to a field so the processor lambda
  captures only `this`, allowing it to be created once per searcher.

FusedPQDecoder.newDecoder (24.8% of allocs):
  The base/Cosine decoder constructors called VectorUtil.sub(query,
  center) which allocates a new ArrayVectorFloat for the centered
  query on every decoder construction (one per GraphSearcher.search
  call). Add a thread-local centered-query scratch buffer in FusedPQ
  (mirroring the existing reusableResults / reusableNeighborCodes
  pattern) and a VectorUtil.subInto(dest, lhs, rhs) helper that writes
  the difference into a caller-provided destination without allocating.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@eolivelli eolivelli merged commit 0cd4b66 into main May 22, 2026
4 of 10 checks passed
@eolivelli eolivelli deleted the reduce-allocations-search-and-fusedpq branch May 22, 2026 10:41