perf: remove implicit ListViewArray rebuild during take and filter operations#8048
perf: remove implicit ListViewArray rebuild during take and filter operations#8048mhk197 wants to merge 13 commits into
ListViewArray rebuild during take and filter operations#8048Conversation
ListViewArray rebuild during take and filter operationsListViewArray rebuild during take and filter operations
c6a1940 to
95ae305
Compare
Polar Signals Profiling ResultsLatest Run
Powered by Polar Signals Cloud |
Merging this PR will improve performance by 19.67%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ⚡ | Simulation | chunked_varbinview_opt_canonical_into[(1000, 10)] |
225.1 µs | 188.1 µs | +19.67% |
Tip
Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.
Comparing mk/explicit-listview-compaction (cd7c97e) with develop (f852d72)
Benchmarks: PolarSignals ProfilingVortex (geomean): 1.052x ➖ datafusion / vortex-file-compressed (1.052x ➖, 0↑ 2↓)
|
File Sizes: PolarSignals ProfilingFile Size Changes (1 files changed, -0.0% overall, 0↑ 1↓)
Totals:
|
Benchmarks: FineWeb NVMeVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (0.927x ➖, 2↑ 0↓)
datafusion / vortex-compact (0.928x ➖, 3↑ 0↓)
datafusion / parquet (0.926x ➖, 2↑ 0↓)
duckdb / vortex-file-compressed (0.925x ➖, 3↑ 0↓)
duckdb / vortex-compact (0.972x ➖, 1↑ 0↓)
duckdb / parquet (0.936x ➖, 1↑ 0↓)
Full attributed analysis
|
File Sizes: FineWeb NVMeFile Size Changes (2 files changed, -0.0% overall, 0↑ 2↓)
Totals:
|
Benchmarks: TPC-H SF=1 on NVMEVerdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (1.004x ➖, 0↑ 0↓)
datafusion / vortex-compact (0.999x ➖, 0↑ 0↓)
datafusion / parquet (1.002x ➖, 1↑ 1↓)
datafusion / arrow (1.003x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (1.005x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.003x ➖, 0↑ 0↓)
duckdb / parquet (1.013x ➖, 1↑ 4↓)
duckdb / duckdb (1.001x ➖, 0↑ 0↓)
Full attributed analysis
|
File Sizes: TPC-H SF=1 on NVMEFile Size Changes (18 files changed, -0.0% overall, 0↑ 18↓)
Totals:
|
Benchmarks: TPC-DS SF=1 on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (0.989x ➖, 2↑ 0↓)
datafusion / vortex-compact (0.992x ➖, 2↑ 1↓)
datafusion / parquet (1.008x ➖, 0↑ 3↓)
duckdb / vortex-file-compressed (0.998x ➖, 1↑ 1↓)
duckdb / vortex-compact (0.995x ➖, 1↑ 0↓)
duckdb / parquet (0.996x ➖, 0↑ 1↓)
duckdb / duckdb (1.000x ➖, 2↑ 0↓)
Full attributed analysis
|
File Sizes: TPC-DS SF=1 on NVMEFile Size Changes (48 files changed, -0.0% overall, 0↑ 48↓)
Totals:
|
Benchmarks: FineWeb S3Verdict: No clear signal (low confidence) datafusion / vortex-file-compressed (1.084x ➖, 0↑ 1↓)
datafusion / vortex-compact (0.985x ➖, 0↑ 0↓)
datafusion / parquet (1.009x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.007x ➖, 0↑ 1↓)
duckdb / vortex-compact (0.990x ➖, 0↑ 0↓)
duckdb / parquet (0.988x ➖, 0↑ 0↓)
Full attributed analysis
|
Benchmarks: Statistical and Population GeneticsVerdict: No clear signal (low confidence) duckdb / vortex-file-compressed (0.995x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.010x ➖, 0↑ 1↓)
duckdb / parquet (0.990x ➖, 0↑ 0↓)
Full attributed analysis
|
File Sizes: Statistical and Population GeneticsFile Size Changes (2 files changed, -0.0% overall, 0↑ 2↓)
Totals:
|
Benchmarks: Random AccessVortex (geomean): 0.847x ✅ unknown / unknown (0.882x ✅, 19↑ 0↓)
|
Benchmarks: TPC-H SF=10 on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (1.047x ➖, 0↑ 7↓)
datafusion / vortex-compact (1.056x ➖, 0↑ 8↓)
datafusion / parquet (0.994x ➖, 0↑ 3↓)
datafusion / arrow (0.979x ➖, 1↑ 3↓)
duckdb / vortex-file-compressed (1.001x ➖, 0↑ 3↓)
duckdb / vortex-compact (0.975x ➖, 0↑ 0↓)
duckdb / parquet (0.994x ➖, 0↑ 0↓)
duckdb / duckdb (0.975x ➖, 0↑ 0↓)
Full attributed analysis
|
File Sizes: TPC-H SF=10 on NVMEFile Size Changes (48 files changed, -0.0% overall, 0↑ 48↓)
Totals:
|
Benchmarks: Clickbench on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (1.003x ➖, 0↑ 0↓)
datafusion / parquet (0.990x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.000x ➖, 1↑ 2↓)
duckdb / parquet (0.998x ➖, 0↑ 0↓)
duckdb / duckdb (0.984x ➖, 0↑ 0↓)
Full attributed analysis
|
File Sizes: Clickbench on NVMEFile Size Changes (201 files changed, -0.0% overall, 0↑ 201↓)
Totals:
|
Benchmarks: TPC-H SF=1 on S3Verdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (0.995x ➖, 2↑ 4↓)
datafusion / vortex-compact (0.878x ➖, 3↑ 0↓)
datafusion / parquet (0.995x ➖, 0↑ 2↓)
duckdb / vortex-file-compressed (1.031x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.075x ➖, 0↑ 0↓)
duckdb / parquet (1.106x ➖, 0↑ 1↓)
Full attributed analysis
|
Benchmarks: CompressionVortex (geomean): 1.000x ➖ unknown / unknown (0.999x ➖, 7↑ 3↓)
|
Benchmarks: TPC-H SF=10 on S3Verdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (1.012x ➖, 0↑ 2↓)
datafusion / vortex-compact (0.935x ➖, 1↑ 0↓)
datafusion / parquet (1.004x ➖, 1↑ 1↓)
duckdb / vortex-file-compressed (0.975x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.976x ➖, 0↑ 0↓)
duckdb / parquet (1.025x ➖, 0↑ 0↓)
Full attributed analysis
|
f8b4f2e to
933bdc1
Compare
ad614a7 to
7490746
Compare
Signed-off-by: Matthew Katz <katz@spiraldb.com>
Signed-off-by: Matthew Katz <katz@spiraldb.com>
Signed-off-by: Matthew Katz <katz@spiraldb.com>
Signed-off-by: Matthew Katz <katz@spiraldb.com>
Signed-off-by: Matthew Katz <katz@spiraldb.com>
Signed-off-by: Matthew Katz <katz@spiraldb.com>
Signed-off-by: Matthew Katz <katz@spiraldb.com>
Signed-off-by: Matthew Katz <katz@spiraldb.com>
Signed-off-by: Matthew Katz <katz@spiraldb.com>
Signed-off-by: Matthew Katz <katz@spiraldb.com>
57908ab to
cd7c97e
Compare
joseph-isaacs
left a comment
There was a problem hiding this comment.
looks. good overall
did you do this for duckdb too? that is important. since that might be passed sparse element lists.
also a single definitive comment explaining what you did.
| .as_::<u64>() | ||
| .ok_or_else(|| vortex_err!("could not cast sum of sizes to u64"))?; | ||
|
|
||
| let estimate = (sizes_sum as f32 / n_elts as f32).clamp(0.0, 1.0); |
There was a problem hiding this comment.
how can this go above or below 0,1 maybe a panic here.
or debug assert
There was a problem hiding this comment.
I would prefer a debug assert here
There was a problem hiding this comment.
Elements: [0,1]
ListView: [[0,1], [1]]
----
estimate: 3/2
Can't go below 0 though ofc, so I can use explicit > 1 check and change instead of clamp
| } | ||
| } | ||
|
|
||
| pub fn should_rebuild(&self, exact: bool, ctx: &mut ExecutionCtx) -> VortexResult<bool> { |
There was a problem hiding this comment.
Like Joe said, this seems like a footgun. I think that the caller should just decide, and the REBUILD_DENSITY_THRESHOLD can just be DEFAULT_REBUILD_DENSITY_THRESHOLD or similar
Summary
Removes implicit rebuilds from
ListViewArraytakeandfiltercompute kernels, adds density-introspection methods to support deciding when to rebuild explicitly at materialization boundaries, and defers rebuilding until array export.Motivation. The previous code rebuilt the elements buffer eagerly on every
takeorfilterwhose row-fraction dropped belowREBUILD_DENSITY_THRESHOLD. In an execution tree liketake → take → ..., an eager mid-pipeline rebuild costs an allocation and a full copy of referenced ranges that the next operator may immediately sparsify away.The row-fraction heuristic was also inaccurate. It doesn't account for per-row size variance, unreferenced elements, and duplicate references.
Changes:
TakeReduce::takeandTakeExecute::takefilter_listviewcompute_referenced_elements_mask,compute_density, andestimate_densityListViewArray::should_rebuild(exact: bool): Currently unused but intended to gate an explicit compaction boundary in follow-up.API Changes
take/filterover a sparse or low-densityListViewArrayno longer allocates a compacted elements buffer. Downstream consumers must callrebuildandshould_rebuildexplicitly themselves.Testing
vortex-array/src/arrays/listview/tests/density.rswith multiple tests: