0.9.0
Pre-release
DataFusion Comet 0.9.0 Changelog
This release consists of 139 commits from 24 contributors. See credits at the end of this changelog for more information.
Fixed bugs:
- fix: typo for `instr` in fuzz testing #1686 (mbutrovich)
- fix: Bucketed scan fallback for native_datafusion Parquet scan #1720 (mbutrovich)
- fix: Skip row index Spark SQL tests for native_datafusion Parquet scan #1724 (mbutrovich)
- fix: Check acquired memory when CometMemoryPool grows #1732 (wForget)
- fix: Fix data race in memory profiling #1727 (andygrove)
- fix: Enable some DPP Spark SQL tests #1734 (andygrove)
- fix: support literal null list and map #1742 (kazuyukitanimura)
- fix: get_struct field is incorrect when struct in array #1687 (comphead)
- fix: cast map types correctly in schema adapter #1771 (parthchandra)
- fix: correct schema type checking in native_iceberg_compat #1755 (parthchandra)
- fix: default values for native_datafusion scan #1756 (mbutrovich)
- fix: [native_scans] Support `CASE_SENSITIVE` when reading Parquet #1782 (andygrove)
- fix: cargo install tpchgen-cli in benchmark doc #1797 (zhuqi-lucas)
- fix: support `map_keys` #1788 (comphead)
- fix: fall back on nested types for default values #1799 (mbutrovich)
- fix: Re-enable Spark 4 tests on Linux #1806 (andygrove)
- fix: fallback to Spark scan if encryption is enabled (native_datafusion/native_iceberg_compat) #1785 (parthchandra)
- fix: native_iceberg_compat: move checking parquet types above fetching batch #1809 (mbutrovich)
- fix: translate missing or corrupt file exceptions, fall back if asked to ignore #1765 (mbutrovich)
- fix: Fix Spark SQL AQE exchange reuse test failures #1811 (coderfender)
- fix: Enable more Spark SQL tests #1834 (andygrove)
- fix: support `map_values` #1835 (comphead)
- fix: Handle case where num_cols == 0 in native execution #1840 (andygrove)
- fix: Fix shuffle writing rows containing null struct fields #1845 (Kontinuation)
- fix: Fall back to Spark for `RANGE BETWEEN` window expressions #1848 (andygrove)
- fix: Remove COMET_SHUFFLE_FALLBACK_TO_COLUMNAR hack #1865 (andygrove)
- fix: support read Struct by user schema #1860 (comphead)
- fix: map parquet field_id correctly (native_iceberg_compat) #1815 (parthchandra)
- fix: cast_struct_to_struct aligns to Spark behavior #1879 (mbutrovich)
- fix: correctly handle schemas with nested array of struct (native_iceberg_compat) #1883 (parthchandra)
- fix: set RangePartitioning for native shuffle default to false #1907 (mbutrovich)
- fix: conflict between #1905 and #1892. #1919 (mbutrovich)
- fix: Add overflow check to evaluate of sum decimal accumulator #1922 (leung-ming)
- fix: Fix overflow handling when casting float to decimal #1914 (leung-ming)
- fix: Ignore a test case fails on Miri #1951 (leung-ming)
Performance related:
- perf: Add memory profiling #1702 (andygrove)
- perf: Add performance tracing capability #1706 (andygrove)
- perf: Add `COMET_RESPECT_PARQUET_FILTER_PUSHDOWN` config #1936 (andygrove)
Implemented enhancements:
- feat: add jemalloc as optional custom allocator #1679 (mbutrovich)
- feat: support `array_repeat` #1680 (comphead)
- feat: More warning info for users #1667 (hsiang-c)
- feat: decode() expression when using 'utf-8' encoding #1697 (mbutrovich)
- feat: regexp_replace() expression with no starting offset #1700 (mbutrovich)
- feat: Improve performance tracing feature #1730 (andygrove)
- feat: Set/cancel with job tag and make max broadcast table size configurable #1693 (wForget)
- feat: Add support for `expm1` expression from `datafusion-spark` crate #1711 (andygrove)
- feat: Add config option for showing all Comet plan transformations #1780 (andygrove)
- feat: Support Type widening: byte → short/int/long, short → int/long #1770 (huaxingao)
- feat: Translate Hadoop S3A configurations to object_store configurations #1817 (Kontinuation)
- feat: Upgrade to official DataFusion 48.0.0 release #1877 (andygrove)
- feat: Add experimental auto mode for `COMET_PARQUET_SCAN_IMPL` #1747 (andygrove)
- feat: support RangePartitioning with native shuffle #1862 (mbutrovich)
- feat: Add support for signum expression #1889 (andygrove)
- feat: Add support to lookup map by key #1898 (comphead)
- feat: support array_max #1892 (drexler-sky)
- feat: pass ignore_nulls flag to first and last #1866 (rluvaton)
- feat: Implement ToPrettyString #1921 (andygrove)
- feat: Support hadoop s3a config in native_iceberg_compat #1925 (parthchandra)
- feat: rand expression support #1199 (akupchinskiy)
- feat: supports array_distinct #1923 (drexler-sky)
- feat: `auto` scan mode should check for supported file location #1930 (andygrove)
- feat: Encapsulate Parquet objects #1920 (huaxingao)
- feat: Change default value of `COMET_NATIVE_SCAN_IMPL` to `auto` #1933 (andygrove)
- feat: Supports array_union #1945 (drexler-sky)
Documentation updates:
- docs: Add changelog for 0.8.0 #1675 (andygrove)
- docs: Add instructions on running TPC-H on macOS #1647 (andygrove)
- docs: Add documentation for accelerating Iceberg Parquet scans with Comet #1683 (andygrove)
- docs: Add note on setting `core.abbrev` when generating diffs #1735 (andygrove)
- docs: Remove outdated param in macos bench guide #1748 (ding-young)
- docs: Add instructions for running individual Spark SQL tests from sbt #1752 (coderfender)
- docs: Add documentation for native_datafusion Parquet scanner's S3 support #1832 (Kontinuation)
- docs: Add docs stating that Comet does not support reading decimals encoded in Parquet BINARY format #1895 (andygrove)
Other:
- chore: Start 0.9.0 development #1676 (andygrove)
- chore: Update viable crates #1677 (EmilyMatt)
- chore: match Maven plugin versions with Spark 3.5 #1668 (hsiang-c)
- chore: Remove fallback reason "because the children were not native" #1672 (andygrove)
- chore: Rename `scalarExprToProto` to `scalarFunctionExprToProto` #1688 (comphead)
- chore: fix build errors #1690 (comphead)
- chore: Make Aggregate transformation more compact #1670 (EmilyMatt)
- chore: update dev/release/rat_exclude_files.txt #1689 (hsiang-c)
- chore: Move Comet rules into their own files #1695 (andygrove)
- chore: Remove fast encoding option #1703 (andygrove)
- chore: fix CI job name #1712 (hsiang-c)
- minor: Warn if memory pool is dropped with bytes still reserved #1721 (andygrove)
- chore: Correct memory acquired size in unified memory pool #1738 (zuston)
- chore: allow large errors for Clippy #1743 (comphead)
- chore: Refactor DataTypeSupport #1741 (andygrove)
- chore: More refactoring of type checking logic #1744 (andygrove)
- chore: Enable more complex type tests #1753 (andygrove)
- chore: Add `scanImpl` attribute to `CometScanExec` #1746 (andygrove)
- chore: Prepare for DataFusion 48.0.0 #1710 (andygrove)
- Docs: Setup Comet on IntelliJ #1760 (coderfender)
- chore: Reenable nested types for CometFuzzTestSuite with int96 #1761 (mbutrovich)
- chore: Enable partial Spark SQL tests for `native_iceberg_compat` scan #1762 (andygrove)
- chore: [native_iceberg_compat / native_datafusion] Ignore Spark SQL Parquet encryption tests #1763 (andygrove)
- build: Ignore array_repeat test to fix CI issues #1774 (andygrove)
- chore: Upload crash logs if Java tests fail #1779 (andygrove)
- chore: Drop support for Java 8 #1777 (andygrove)
- chore: Bump arrow to 18.3.0 #1773 (Kontinuation)
- build: Stop running Comet's Spark 4 tests on Linux for PR builds #1802 (andygrove)
- Chore: Moved strings expressions to separate file #1792 (kazantsev-maksim)
- chore: Speed up "PR Builds" CI workflows #1807 (andygrove)
- chore: [native scans] Ignore Spark SQL test for string predicate pushdown #1768 (andygrove)
- chore: Bump DataFusion to git rev 2c2f225 #1814 (andygrove)
- Feat: support bit_count function #1602 (kazantsev-maksim)
- Chore: implement bit_not as ScalarUDFImpl #1825 (kazantsev-maksim)
- build: Specify -Dsbt.log.noformat=true in sbt CI runs #1822 (andygrove)
- chore: Use unique artifact names in Java test run #1818 (andygrove)
- minor: Refactor PhysicalPlanner::default() to avoid duplicate code #1821 (andygrove)
- Chore: implement bit_count as ScalarUDFImpl #1826 (kazantsev-maksim)
- chore: IgnoreCometNativeScan on a few more Spark SQL tests #1837 (mbutrovich)
- chore: Enable tests in RemoveRedundantProjectsSuite.scala related to issue #242 #1838 (rishvin)
- minor: Replace many instances of `checkSparkAnswer` with `checkSparkAnswerAndOperator` #1851 (andygrove)
- chore: Update documentation and ignore Spark SQL tests for known issue with count distinct on NaN in aggregate #1847 (andygrove)
- chore: Ignore Spark SQL WholeStageCodegenSuite tests #1859 (andygrove)
- chore: Upgrade to DataFusion 48.0.0-rc3 #1863 (andygrove)
- upgraded spark 3.5.5 to 3.5.6 #1861 (YanivKunda)
- build: Disable some rounding tests when miri is enabled #1873 (andygrove)
- chore: Enable Spark SQL tests for `native_iceberg_compat` #1876 (andygrove)
- chore: Enable more Spark SQL tests #1869 (andygrove)
- chore: refactor planner read schema tests #1886 (comphead)
- chore: Implement date_trunc as ScalarUDFImpl #1880 (leung-ming)
- Chore: implement datetime funcs as ScalarUDFImpl #1874 (trompa)
- minor: Improve testing of math scalar functions #1896 (andygrove)
- minor: Avoid rewriting join to unsupported join #1888 (andygrove)
- chore: Enable `native_iceberg_compat` Spark SQL tests (for real, this time) #1910 (andygrove)
- chore: rename makeParquetFileAllTypes to makeParquetFileAllPrimitiveTypes #1905 (parthchandra)
- chore: add a test case to read from an arbitrarily complex type schema #1911 (parthchandra)
- test: Trigger Spark 3.4.3 SQL tests for iceberg-compat #1912 (kazuyukitanimura)
- build: Fix conflict between #1910 and #1912 #1924 (andygrove)
- minor: fix kube/Dockerfile build failed #1918 (zhangxffff)
- chore: Improve reporting of fallback reasons for CollectLimit #1694 (andygrove)
- chore: move udf registration to better place #1899 (rluvaton)
- chore: Comet + Iceberg (1.8.1) CI #1715 (hsiang-c)
- chore: Introduce `exprHandlers` map in QueryPlanSerde #1903 (andygrove)
- chore: Enable Spark SQL tests for auto scan mode #1885 (andygrove)
- Feat: support bit_get function #1713 (kazantsev-maksim)
- chore: Clippy fixes for Rust 1.88 #1939 (andygrove)
- Minor: Add unit tests for `ceil/floor` functions #1728 (tlm365)
Credits
Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.
62 Andy Grove
16 Matt Butrovich
10 Oleks V
8 Parth Chandra
5 Kazantsev Maksim
5 hsiang-c
4 Kristin Cowalcijk
4 Leung Ming
3 B Vadlamani
3 drexler-sky
2 Emily Matheys
2 Huaxin Gao
2 KAZUYUKI TANIMURA
2 Raz Luvaton
2 Zhen Wang
1 Artem Kupchinskiy
1 Junfan Zhang
1 Qi Zhu
1 Rishab Joshi
1 Tai Le Manh
1 Yaniv Kunda
1 Zhang Xiaofeng
1 ding-young
1 trompa
Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.