Commit 752818f

Merge branch 'main' into chore/upgrade-rust-2024-group-5
2 parents da88968 + 55a38d4 commit 752818f

56 files changed: +1468 -1387 lines changed

.github/workflows/audit.yml

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ jobs:
     steps:
       - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
       - name: Install cargo-audit
-        uses: taiki-e/install-action@493d7f216ecab2af0602481ce809ab2c72836fa1 # v2.62.62
+        uses: taiki-e/install-action@50708e9ba8d7b6587a2cb575ddaa9a62e927bc06 # v2.62.63
         with:
           tool: cargo-audit
       - name: Run audit check

.github/workflows/rust.yml

Lines changed: 2 additions & 2 deletions
@@ -421,7 +421,7 @@ jobs:
           sudo apt-get update -qq
           sudo apt-get install -y -qq clang
       - name: Setup wasm-pack
-        uses: taiki-e/install-action@493d7f216ecab2af0602481ce809ab2c72836fa1 # v2.62.62
+        uses: taiki-e/install-action@50708e9ba8d7b6587a2cb575ddaa9a62e927bc06 # v2.62.63
         with:
           tool: wasm-pack
       - name: Run tests with headless mode
@@ -724,7 +724,7 @@ jobs:
       - name: Setup Rust toolchain
         uses: ./.github/actions/setup-builder
       - name: Install cargo-msrv
-        uses: taiki-e/install-action@493d7f216ecab2af0602481ce809ab2c72836fa1 # v2.62.62
+        uses: taiki-e/install-action@50708e9ba8d7b6587a2cb575ddaa9a62e927bc06 # v2.62.63
         with:
           tool: cargo-msrv
 

Cargo.lock

Lines changed: 46 additions & 38 deletions
Generated file; diff not rendered by default.

benchmarks/README.md

Lines changed: 38 additions & 0 deletions
@@ -832,3 +832,41 @@ Getting results...
 cancelling thread
 done dropping runtime in 83.531417ms
 ```
+
+## Sorted Data Benchmarks
+
+### Data Sorted ClickBench
+
+Benchmark for queries on pre-sorted data to test sort order optimization.
+This benchmark uses a subset of the ClickBench dataset (hits.parquet, ~14GB) that has been pre-sorted by the EventTime column. The queries are designed to test DataFusion's performance when the data is already sorted, as is common in time-series workloads.
+
+The benchmark includes queries that:
+- Scan pre-sorted data with ORDER BY clauses that match the sort order
+- Test reverse scans on sorted data
+- Verify the performance result
+
+#### Generating Sorted Data
+
+The sorted dataset is automatically generated from the ClickBench partitioned dataset. You can configure the memory used during the sorting process with the `DATAFUSION_MEMORY_GB` environment variable. The default memory limit is 12GB.
+```bash
+./bench.sh data data_sorted_clickbench
+```
+
+To create the sorted dataset, for example with 16GB of memory, run:
+
+```bash
+DATAFUSION_MEMORY_GB=16 ./bench.sh data data_sorted_clickbench
+```
+
+This command will:
+1. Download the ClickBench partitioned dataset if not present
+2. Sort hits.parquet by EventTime in ascending order
+3. Save the sorted file as hits_sorted.parquet
+
+#### Running the Benchmark
+
+```bash
+./bench.sh run data_sorted_clickbench
+```
+
+This runs queries against the pre-sorted dataset with the `--sorted-by EventTime` flag, which informs DataFusion that the data is pre-sorted, allowing it to optimize away redundant sort operations.
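
The README text above says that `DATAFUSION_MEMORY_GB` is optional and that the limit defaults to 12GB. A minimal sketch of that override-with-fallback behaviour is shown below. The function names `resolve_memory_gb` and `print_sorted_clickbench_commands` are hypothetical and do not come from `bench.sh`; the sketch only prints the two commands documented in the README and does not run them or reproduce how the script itself reads the variable.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; not taken from bench.sh.

# Assumption: the memory limit (in GB) comes from DATAFUSION_MEMORY_GB,
# falling back to the documented default of 12 when unset or empty.
resolve_memory_gb() {
    echo "${DATAFUSION_MEMORY_GB:-12}"
}

# Print (without executing) the data-generation and run commands
# documented in the README, using the resolved memory limit.
print_sorted_clickbench_commands() {
    local memory_gb
    memory_gb="$(resolve_memory_gb)"
    echo "DATAFUSION_MEMORY_GB=${memory_gb} ./bench.sh data data_sorted_clickbench"
    echo "./bench.sh run data_sorted_clickbench"
}
```

Calling `print_sorted_clickbench_commands` with the variable unset prints the data-generation command with a 12GB limit; exporting `DATAFUSION_MEMORY_GB=16` first changes only that value.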
