Commit 727955d
Optimize set operations with O(n+m) merge-walk algorithm

This replaces the previous sort-based and binary-search-based
implementations of setUnion, setInter, and setDiff with O(n+m) merge-walk
algorithms that exploit the fact that set inputs are already sorted.
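
For readers unfamiliar with the technique, here is a minimal merge-walk union sketch. It is not the code from this commit: it assumes plain sorted, duplicate-free Int arrays, and the names MergeWalkUnion and union are hypothetical.

```scala
import scala.collection.mutable.ArrayBuffer

object MergeWalkUnion {
  // Union of two sorted, duplicate-free arrays in O(n + m).
  // Illustrative only: the real set operations work on Jsonnet values.
  def union(a: Array[Int], b: Array[Int]): Array[Int] = {
    // Pre-sized: the result can never hold more than n + m elements.
    val out = new ArrayBuffer[Int](a.length + b.length)
    var i = 0
    var j = 0
    while (i < a.length && j < b.length) {
      if (a(i) < b(j)) { out += a(i); i += 1 }
      else if (a(i) > b(j)) { out += b(j); j += 1 }
      else { out += a(i); i += 1; j += 1 } // equal: emit once, advance both
    }
    // At most one of the two inputs still has elements left.
    while (i < a.length) { out += a(i); i += 1 }
    while (j < b.length) { out += b(j); j += 1 }
    out.toArray
  }
}
```

Each index only moves forward, so the loop bodies run at most n + m times in total.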

Changes:
- setUnion: Changed from concat + sort + dedup to merge-walk
  (was O((n+m) log(n+m)), now O(n+m))
- setInter: Changed from iterate + binary search to merge-walk
  (was O(min(n,m) * log(max(n,m))), now O(n+m))
- setDiff: Changed from iterate + binary search to merge-walk
  (was O(n * log m), now O(n+m))
- Added applyKeyF helper function for cleaner key function handling
- Added validateSet calls to setUnion for consistency with other ops
- Pre-sized ArrayBuffer allocations for better memory efficiency
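
The intersection and difference walks listed above can be sketched in the same way. This is an illustration under stated assumptions, not the committed code: MergeWalkOps, inter, diff, and the keyOf parameter are hypothetical names, and keyOf only stands in for whatever applyKeyF does in the real implementation. Both inputs are assumed to be sorted by key with no duplicate keys.

```scala
import scala.collection.mutable.ArrayBuffer

object MergeWalkOps {
  // Elements of `a` whose key also occurs in `b`, in O(n + m).
  def inter[A, K](a: IndexedSeq[A], b: IndexedSeq[A])(keyOf: A => K)(implicit ord: Ordering[K]): Vector[A] = {
    // Pre-sized: the intersection is no larger than the smaller input.
    val out = new ArrayBuffer[A](math.min(a.length, b.length))
    var i = 0
    var j = 0
    while (i < a.length && j < b.length) {
      val c = ord.compare(keyOf(a(i)), keyOf(b(j)))
      if (c < 0) i += 1                      // key only in a: skip it
      else if (c > 0) j += 1                 // key only in b: skip it
      else { out += a(i); i += 1; j += 1 }   // key in both: keep it
    }
    out.toVector
  }

  // Elements of `a` whose key does not occur in `b`, in O(n + m).
  def diff[A, K](a: IndexedSeq[A], b: IndexedSeq[A])(keyOf: A => K)(implicit ord: Ordering[K]): Vector[A] = {
    // Pre-sized: the difference is no larger than a.
    val out = new ArrayBuffer[A](a.length)
    var i = 0
    var j = 0
    while (i < a.length && j < b.length) {
      val c = ord.compare(keyOf(a(i)), keyOf(b(j)))
      if (c < 0) { out += a(i); i += 1 }     // key only in a: keep it
      else if (c > 0) j += 1                 // key only in b: skip it
      else { i += 1; j += 1 }                // key in both: drop it
    }
    // Everything left in a has a key larger than any key in b.
    while (i < a.length) { out += a(i); i += 1 }
    out.toVector
  }
}
```

For example, with strings keyed by length, `MergeWalkOps.inter(Vector("a", "bb", "ccc"), Vector("xx", "yyyy"))(_.length)` keeps only "bb", and the corresponding `diff` call keeps "a" and "ccc".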

The setMember function still uses binary search, which is optimal for
single-element lookups.
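
As a point of comparison, a single-element lookup on a sorted array needs only O(log n) comparisons. The sketch below is a generic binary search over Int arrays, not the setMember implementation itself; SortedMember and member are hypothetical names.

```scala
object SortedMember {
  // Binary search on a sorted array: O(log n) comparisons per lookup.
  def member(sorted: Array[Int], x: Int): Boolean = {
    var lo = 0
    var hi = sorted.length - 1
    var found = false
    while (!found && lo <= hi) {
      val mid = lo + (hi - lo) / 2 // written this way to avoid Int overflow
      if (sorted(mid) < x) lo = mid + 1
      else if (sorted(mid) > x) hi = mid - 1
      else found = true
    }
    found
  }
}
```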
These optimizations address the same performance issue as upstream
PR databricks#574, but avoid the O(n²) bug in that PR's uniqArr implementation
(which calls ArrayBuilder.result() on each iteration).
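
To illustrate the pitfall described above, here is one way to deduplicate a sorted array without calling result() inside the loop. It is a hypothetical sketch, not the code from either PR: SortedDedup and dedup are made-up names, and it assumes Int elements. The previous element is tracked in a local variable, so result() is called exactly once at the end.

```scala
import scala.collection.mutable.ArrayBuilder

object SortedDedup {
  // Removes adjacent duplicates from a sorted array in O(n).
  def dedup(sorted: Array[Int]): Array[Int] = {
    val out = new ArrayBuilder.ofInt
    out.sizeHint(sorted.length) // upper bound on the output size
    var i = 0
    var last = 0
    var hasLast = false
    while (i < sorted.length) {
      val x = sorted(i)
      // Compare against a local copy of the last emitted element
      // instead of materializing the builder to inspect it.
      if (!hasLast || x != last) { out += x; last = x; hasLast = true }
      i += 1
    }
    out.result() // single call, after the loop
  }
}
```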

All changes respect the throwErrorForInvalidSets setting.

1 parent d6bc48b, commit 727955d
1 file changed, +107 -16 lines changed

[Diff body not captured. Hunks: original lines 1707-1721 (new 1707-1759), 1727-1739 (new 1765-1791), 1746-1759 (new 1798-1842), 1972-1977 (new 2055-2068).]