refactor: try reduce aggregate hash index cost on hot path #19072

dqhl76 · 2025-12-05T08:50:32Z

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

refactor: try reduce aggregate hash index cost on hot path

While profiling TPC-H 1000 Q18 with perf, I found find_or_insert on the hot path; this does not show up on smaller-scale datasets.

With pure linear probing (+1 each time), occupied slots tend to cluster. Once you hit such a cluster, the probe has to walk over a long run of consecutive occupied entries. If, instead, the next probe position is derived from the hash (i.e. more “random”), you break up these clusters. That may well hurt potential SIMD or prefetch optimisations, but it also shortens long probe chains on average.

This idea is inspired by an optimization PR in DuckDB. Thanks!
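To make the clustering argument concrete, here is a minimal sketch comparing the two probing strategies. It is not the code in this PR: the function names, the use of the upper 32 hash bits, and the `| 1` trick are assumptions made for illustration. The sketch assumes a power-of-two capacity and at least one empty slot.

```rust
/// Hypothetical helper (not Databend's code): derive a probe step from the
/// upper hash bits. Forcing the step to be odd makes it coprime with a
/// power-of-two capacity, so the probe sequence still visits every slot.
fn hash_step(hash: u64) -> usize {
    ((hash >> 32) as usize) | 1
}

/// Count the probes needed to reach an empty slot.
/// `occupied.len()` must be a power of two and contain at least one `false`,
/// otherwise the loop never terminates.
fn probes_until_empty(occupied: &[bool], hash: u64, use_hash_step: bool) -> usize {
    let mask = occupied.len() - 1;
    // The home slot comes from the low hash bits in both strategies.
    let mut slot = (hash as usize) & mask;
    // Linear probing always steps by 1; the alternative steps by a
    // hash-derived odd stride.
    let step = if use_hash_step { hash_step(hash) } else { 1 };
    let mut probes = 1;
    while occupied[slot] {
        slot = slot.wrapping_add(step) & mask;
        probes += 1;
    }
    probes
}
```

With 16 slots where slots 0..8 form a cluster, and a hash whose home slot is 0 and whose upper bits yield a step of 5, linear probing needs 9 probes (slots 0 through 8), while the hash-derived step needs 3 (slots 0, 5, 10). The stepped version touches non-adjacent memory, which is the SIMD and prefetch trade-off mentioned above.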

Tests

Unit Test
Logic Test
Benchmark Test
No Test - Explain why

Type of change

Bug Fix (non-breaking change which fixes an issue)
New Feature (non-breaking change which adds functionality)
Breaking Change (fix or feature that could cause existing functionality not to work as expected)
Documentation Update
Refactoring
Performance Improvement
Other (please describe):


github-actions · 2025-12-05T10:41:45Z

Docker Image for PR

tag: pr-19072-e3377f5-1764931157

note: this image tag is only available for internal use.

github-actions · 2025-12-05T11:56:43Z

Docker Image for PR

tag: pr-19072-e3377f5-1764935655

note: this image tag is only available for internal use.

github-actions · 2025-12-05T13:09:29Z

ClickBench Report

hits: https://benchmark.databend.com/clickbench/pr/19072/19961084728/hits.html
tpch100: https://benchmark.databend.com/clickbench/pr/19072/19961084728/tpch100.html
tpch1000: https://benchmark.databend.com/clickbench/pr/19072/19961084728/tpch1000.html

forsaken628 · 2025-12-06T06:58:13Z

Sequential probing has a greater likelihood of being optimized for SIMD instructions. Or maybe the compiler isn't that smart yet?

dqhl76 · 2025-12-06T07:16:17Z

> Sequential probing has a greater likelihood of being optimized for SIMD instructions. Or maybe the compiler isn't that smart yet?

I saw an improvement on Q18 in TPC-H 1000 (120.03s -> 111.57s). I will capture a flame graph with perf to confirm this later.

Here is my guess:

With pure linear probing (+1 each time), occupied slots tend to cluster. Once you hit such a cluster, the probe has to walk over a long run of consecutive occupied entries. If, instead, the next probe position is derived from the hash (i.e. more “random”), you break up these clusters. That may well hurt potential SIMD or prefetch optimisations, but it also shortens long probe chains on average.

(BTW, this idea is inspired by an optimisation PR in DuckDB. A related approach from SwissTable is to keep linear probing but increase the step size when you encounter several consecutive occupied slots. I haven’t tested that variant here yet.)
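For reference, the growing-stride variant can be sketched at slot level as below. This is a simplified, hypothetical illustration rather than hashbrown's actual code: hashbrown advances by whole groups of control bytes, whereas this sketch advances one slot at a time, and the `visits_all_slots` helper exists only for the demonstration. The offsets from the home slot are the triangular numbers 0, 1, 3, 6, ..., which reach every slot exactly once when the capacity is a power of two.

```rust
/// Hypothetical slot-level probe sequence with a growing stride
/// (triangular probing). Field names mirror the idea, not a real API.
struct ProbeSeq {
    pos: usize,
    stride: usize,
}

impl ProbeSeq {
    fn new(hash: u64, mask: usize) -> Self {
        ProbeSeq {
            pos: (hash as usize) & mask,
            stride: 0,
        }
    }

    /// Advance to the next slot: the stride grows by one on every step,
    /// so the jumps are 1, 2, 3, ... slots.
    fn move_next(&mut self, mask: usize) {
        self.stride += 1;
        self.pos = (self.pos + self.stride) & mask;
    }
}

/// Check that `capacity` probes starting from `hash` touch every slot.
fn visits_all_slots(capacity: usize, hash: u64) -> bool {
    assert!(capacity.is_power_of_two());
    let mask = capacity - 1;
    let mut seen = vec![false; capacity];
    let mut seq = ProbeSeq::new(hash, mask);
    for _ in 0..capacity {
        seen[seq.pos] = true;
        seq.move_next(mask);
    }
    seen.iter().all(|&s| s)
}
```

The first few probes stay close to the home slot, which keeps some locality, and later probes jump progressively further, which helps escape a cluster.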

forsaken628 · 2025-12-06T07:17:57Z

https://github.com/rust-lang/hashbrown/blob/master/src/raw/mod.rs#L2068

dqhl76 · 2025-12-06T07:19:19Z

https://github.com/rust-lang/hashbrown/blob/master/src/raw/mod.rs#L66-L79

forsaken628 · 2025-12-06T07:52:37Z

Later, we can replace the current Entry with SwissTable's Group.

improve

3bdbdc6

github-actions bot added the pr-refactor this PR changes the code base without new features or bugfix label Dec 5, 2025

dqhl76 marked this pull request as draft December 5, 2025 08:50

dqhl76 added the ci-cloud Build docker image for cloud test label Dec 5, 2025

dqhl76 added ci-benchmark-cloud Benchmark: run only cloud tests for tpch/hits and removed ci-cloud Build docker image for cloud test labels Dec 5, 2025

dqhl76 added 3 commits December 7, 2025 13:25

test: add a test to ensure cover all slots

1330904

chore: clean

770eb53

Merge branch 'main' into improve-5

2b317f3

dqhl76 marked this pull request as ready for review December 7, 2025 06:36

dqhl76 requested review from forsaken628 and zhang2014 December 7, 2025 06:36
