perf: reduce allocations in hot query execution paths #154

Merged: kuseman merged 1 commit into master from performance_fixes on Jun 3, 2026
@kuseman kuseman (Owner) commented Jun 2, 2026
  • TupleVector.validate: use a List<String> path stack instead of eager string concatenation on each recursive schema descent; the path is only joined when an exception is thrown
  • TemporaryTable.IndexTupleVector: cache selected columns to avoid recreating a SelectedValueVector (plus an int[] copy) on every getColumn call
  • ExecutionContext.copy: share the stateless ExpressionFactory instance instead of allocating a new one per NestedLoop outer-row iteration
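
The path-stack idea in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the project's code: `PathStackValidation`, `Column`, and the `valid` flag are hypothetical stand-ins for the real schema types, and the error message format is assumed. The point it shows is that the recursive descent only pushes and pops names, and the path string is built on the failure branch alone.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for recursive schema validation; not the TupleVector API. */
final class PathStackValidation {
    /** Assumed schema shape: a named column with optional nested columns. */
    record Column(String name, boolean valid, List<Column> children) {}

    static void validate(List<Column> columns) {
        validate(columns, new ArrayList<>());
    }

    private static void validate(List<Column> columns, List<String> path) {
        for (Column column : columns) {
            path.add(column.name()); // push the name; no string is concatenated here
            if (!column.valid()) {
                // The path is joined only when validation actually fails
                throw new IllegalArgumentException(
                        "Invalid column at " + String.join("/", path));
            }
            validate(column.children(), path);
            path.remove(path.size() - 1); // pop on the way back up
        }
    }
}
```

On the success path this allocates one list per top-level `validate` call, whereas passing a concatenated `parentPath + "/" + name` string down the recursion allocates a new string per visited column.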

@kuseman kuseman force-pushed the performance_fixes branch 4 times, most recently from d92cd8b to c7f8a44, June 3, 2026 10:55
  Performance fixes (allocation hot spots):
  - TupleVector.validate: replace eager string concatenation on each
    recursive schema descent with a List<String> path stack; string is
    only joined when an exception is thrown (eliminated 248 GB/s of
    byte[] allocation in production JFR)
  - HashMatch, NestedLoop: return cached schema field from getSchema()
    instead of calling joinSchema() on every probe/iteration; both
    operators stored the schema in the constructor but the public
    override recomputed it each time
  - TemporaryTable.IndexTupleVector: cache selected columns to avoid
    recreating SelectedValueVector (+ int[] copy) on every getColumn call
  - ExecutionContext.copy: share the stateless ExpressionFactory instance
    instead of allocating a new one per NestedLoop outer-row iteration

BREAKING CHANGE:
  Memory leak fix:
  - IDatasink.execute signature changed from TupleIterator to
    Supplier<TupleIterator> so sinks can re-execute the upstream plan
    on demand (a cache hit skips execution; a cache refresh calls
    input.get() for a fresh iterator each time, matching the behaviour
    of older versions)
  - InsertInto: removed LazyTupleIterator; passes () -> input.execute(context)
    as the supplier, forwards estimatedBatchCount/estimatedRowCount,
    guards against sinks that forget to close the iterator
  - SelectIntoTempTableSink: materialise result through TupleVectorBuilder
    instead of relying on PlanUtils.concat's single-batch fast-path which
    returned the raw TableScan$1$1 anonymous TupleVector; that vector held
    a strong reference via this$1 -> TableScan$1 -> val$context ->
    ExecutionContext -> QuerySession -> temporaryTables, retaining the
    entire execution context chain for the lifetime of the cached entry
  - AInMemoryCache: document that expired entries reload asynchronously
    for alwaysLoadAsync=false; the Supplier<TupleIterator> API change
    ensures async reload always has a re-executable supplier
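
As a rough illustration of why a `Supplier` helps here, the sketch below shows a caching sink that runs the upstream plan only on a miss or an explicit refresh. Everything in it is assumed for the example: `SupplierSinkSketch`, `Sink`, and `CachingSink` are hypothetical names, `Iterator<String>` stands in for `TupleIterator`, and the real `IDatasink.execute` takes other parameters not shown.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch; Iterator<String> stands in for the project's TupleIterator. */
final class SupplierSinkSketch {
    /** Assumed, simplified sink contract: the sink decides if and when upstream runs. */
    interface Sink {
        void execute(Supplier<Iterator<String>> input);
    }

    static final class CachingSink implements Sink {
        private List<String> cached; // null until the first load

        @Override
        public void execute(Supplier<Iterator<String>> input) {
            if (cached == null) {
                refresh(input); // cache miss: run the upstream plan once
            }
            // cache hit: input.get() is never called, so upstream does not execute
        }

        /** Reloads the cache from a fresh iterator, e.g. after an entry expires. */
        void refresh(Supplier<Iterator<String>> input) {
            List<String> rows = new ArrayList<>();
            input.get().forEachRemaining(rows::add);
            cached = List.copyOf(rows); // immutable snapshot of the row values only
        }

        List<String> rows() {
            return cached;
        }
    }
}
```

With a plain iterator argument the upstream plan would already have been started before the sink could check its cache, and a consumed iterator could not be replayed for a later refresh; the supplier defers both decisions to the sink.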
@kuseman kuseman force-pushed the performance_fixes branch from c7f8a44 to a92db8c, June 3, 2026 11:23
@kuseman kuseman merged commit e30362b into master Jun 3, 2026
1 check passed
@kuseman kuseman deleted the performance_fixes branch June 3, 2026 11:38