Description
In #5090 I got pretty good performance out of the concatenate_uniquely function for Strings, but it still doesn't beat the existing pathway:
benchmark 'AK_strings_concat_unique': 8 tests

| Name (time in s) | Min | Max | Mean | StdDev | Median | IQR | Outliers | OPS | Rounds | Iterations |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| bench_strings_concat_unique[1-0.75] | 2.1740 (1.0) | 2.2838 (1.0) | 2.2146 (1.0) | 0.0473 (2.29) | 2.1960 (1.0) | 0.0762 (3.53) | 1;0 | 0.4516 (1.0) | 5 | 1 |
| bench_strings_concat_unique[1-0.5] | 2.3490 (1.08) | 2.4030 (1.05) | 2.3686 (1.07) | 0.0207 (1.0) | 2.3661 (1.08) | 0.0216 (1.0) | 1;0 | 0.4222 (0.93) | 5 | 1 |
| bench_strings_concat_unique[2-0.5] | 2.4689 (1.14) | 3.1955 (1.40) | 2.8140 (1.27) | 0.2964 (14.33) | 2.7309 (1.24) | 0.4795 (22.21) | 2;0 | 0.3554 (0.79) | 5 | 1 |
| bench_strings_concat_unique[1-0.25] | 2.5054 (1.15) | 2.5831 (1.13) | 2.5495 (1.15) | 0.0307 (1.48) | 2.5545 (1.16) | 0.0464 (2.15) | 2;0 | 0.3922 (0.87) | 5 | 1 |
| bench_strings_concat_unique[2-0.75] | 2.5337 (1.17) | 3.3380 (1.46) | 2.9655 (1.34) | 0.3200 (15.47) | 3.0040 (1.37) | 0.5063 (23.45) | 2;0 | 0.3372 (0.75) | 5 | 1 |
| bench_strings_concat_unique[2-0.25] | 2.5538 (1.17) | 2.9718 (1.30) | 2.7417 (1.24) | 0.1828 (8.84) | 2.6550 (1.21) | 0.3081 (14.27) | 2;0 | 0.3647 (0.81) | 5 | 1 |
| bench_strings_concat_unique[1-0.0] | 2.5973 (1.19) | 4.2031 (1.84) | 3.0300 (1.37) | 0.6656 (32.18) | 2.7663 (1.26) | 0.5717 (26.48) | 1;1 | 0.3300 (0.73) | 5 | 1 |
| bench_strings_concat_unique[2-0.0] | 2.8344 (1.30) | 3.2725 (1.43) | 2.9917 (1.35) | 0.1685 (8.15) | 2.9632 (1.35) | 0.1819 (8.42) | 1;0 | 0.3343 (0.74) | 5 | 1 |
This is on 100m strings (per locale, I think), but the result flips at smaller sizes: for 1m and 10m strings, concatenate_uniquely is faster. Some things worth investigating: why does unique win at larger numbers of strings? Is there anything that can be done in Repartition to improve performance? Some ideas to get you started:
- Repartition may do better on larger strings (i.e., strings longer than 16 bytes).
- Try using cptrs in `repartitionByLocaleStringArray` to eliminate the `for` loop over the locales.
- Try doing a local dedupe before `repartitionByLocaleStringArray` (possibly done the same way as the final deduplication), or allow the user to opt into one.
- Get rid of `myBytes[currByte..#numBytes] = valsA[firstStart..#numBytes];` (a bulk transfer of bytes from the SegString into a local innerArray). Even if the string sizes are uniform and the copy is entirely local, it can still be a large copy. Instead, try to hash and repartition based on the `segStrings[i].values.a` elements that are already on the current locale. Some strings will straddle locale boundaries, but `balancedSegStringRetrieval` may give you ideas on how to achieve this. This may introduce some overhead, but at large scale it could save a lot of time.
- Try a different sort: `dedupeLocalByHashSort` just uses the standard sort, while `unique` uses a different one. Try playing around with this and see what you get.
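As a concrete reference for the sort-based approach, here is a minimal Python sketch of a hash-sort local dedupe (illustrative only; Python's built-in `hash` and `sorted` stand in for whatever hash function and sort the Chapel code actually uses):

```python
def dedupe_by_hash_sort(strings):
    # Sort by (hash, value) so that equal strings end up adjacent,
    # then keep each element whose predecessor differs.
    ordered = sorted(strings, key=lambda s: (hash(s), s))
    out = []
    for s in ordered:
        if not out or out[-1] != s:  # duplicates are now adjacent
            out.append(s)
    return out
```

Run locally before repartitionByLocaleStringArray, something shaped like this would shrink the data shipped between locales roughly in proportion to the local duplicate rate (e.g. the 0.75 cases in the benchmark above), at the cost of an extra local sort.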
Once you're getting good performance here, take the strategies in Repartition and apply them to other functions!
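For anyone picking this up, the invariant the whole repartition strategy relies on can be shown in a short Python sketch (hypothetical names; the hash-modulo buckets stand in for locales): equal strings always hash to the same destination, so deduping each partition independently yields a globally unique result.

```python
from collections import defaultdict

def repartition_then_dedupe(strings, num_locales):
    # Route each string to a destination "locale" by hash; equal strings
    # always land in the same bucket, so no duplicate can span buckets.
    buckets = defaultdict(set)
    for s in strings:
        buckets[hash(s) % num_locales].add(s)
    # Per-bucket dedupe (the set here) is therefore a global dedupe.
    return [s for b in buckets.values() for s in b]
```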