Description
In #5090 I got pretty good performance out of the concatenate_uniquely function for Strings, but it still doesn't beat the existing pathway:
benchmark 'AK_strings_concat_unique': 8 tests

| Name (time in s) | Min | Max | Mean | StdDev | Median | IQR | Outliers | OPS | Rounds | Iterations |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| bench_strings_concat_unique[1-0.75] | 2.1740 (1.0) | 2.2838 (1.0) | 2.2146 (1.0) | 0.0473 (2.29) | 2.1960 (1.0) | 0.0762 (3.53) | 1;0 | 0.4516 (1.0) | 5 | 1 |
| bench_strings_concat_unique[1-0.5] | 2.3490 (1.08) | 2.4030 (1.05) | 2.3686 (1.07) | 0.0207 (1.0) | 2.3661 (1.08) | 0.0216 (1.0) | 1;0 | 0.4222 (0.93) | 5 | 1 |
| bench_strings_concat_unique[2-0.5] | 2.4689 (1.14) | 3.1955 (1.40) | 2.8140 (1.27) | 0.2964 (14.33) | 2.7309 (1.24) | 0.4795 (22.21) | 2;0 | 0.3554 (0.79) | 5 | 1 |
| bench_strings_concat_unique[1-0.25] | 2.5054 (1.15) | 2.5831 (1.13) | 2.5495 (1.15) | 0.0307 (1.48) | 2.5545 (1.16) | 0.0464 (2.15) | 2;0 | 0.3922 (0.87) | 5 | 1 |
| bench_strings_concat_unique[2-0.75] | 2.5337 (1.17) | 3.3380 (1.46) | 2.9655 (1.34) | 0.3200 (15.47) | 3.0040 (1.37) | 0.5063 (23.45) | 2;0 | 0.3372 (0.75) | 5 | 1 |
| bench_strings_concat_unique[2-0.25] | 2.5538 (1.17) | 2.9718 (1.30) | 2.7417 (1.24) | 0.1828 (8.84) | 2.6550 (1.21) | 0.3081 (14.27) | 2;0 | 0.3647 (0.81) | 5 | 1 |
| bench_strings_concat_unique[1-0.0] | 2.5973 (1.19) | 4.2031 (1.84) | 3.0300 (1.37) | 0.6656 (32.18) | 2.7663 (1.26) | 0.5717 (26.48) | 1;1 | 0.3300 (0.73) | 5 | 1 |
| bench_strings_concat_unique[2-0.0] | 2.8344 (1.30) | 3.2725 (1.43) | 2.9917 (1.35) | 0.1685 (8.15) | 2.9632 (1.35) | 0.1819 (8.42) | 1;0 | 0.3343 (0.74) | 5 | 1 |
This is on 100m strings (per locale, I think), but the result flips at smaller sizes: for 1m and 10m strings, concatenate_uniquely is faster. Some things worth investigating: why does unique win at larger numbers of strings? Is there anything that can be done in Repartition to improve performance? Some ideas to get you started:
- Repartition may do better on larger strings (i.e., strings longer than 16 bytes).
- Try using cptrs in `repartitionByLocaleStringArray` to eliminate the `for` loop over the locales.
- Try doing a local dedupe before `repartitionByLocaleStringArray` (possibly done the same way as the final deduplication), or allow the user to opt into one.
- Get rid of `myBytes[currByte..#numBytes] = valsA[firstStart..#numBytes];` (a bulk transfer of bytes from the SegString into a local innerArray). Even if the string sizes are uniform and the copy is entirely local, it can still be a large copy. Instead, try to hash and repartition based on the `segStrings[i].values.a` elements that are already on the current locale. Some strings will straddle locale boundaries, but `balancedSegStringRetrieval` may give you ideas on how to achieve this. This may introduce some overhead, but at large scale it could save a lot of time.
- Try a different sort: `dedupeLocalByHashSort` just uses the standard sort, while `unique` uses a different one. Try playing around with this and see what you get.
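As a concrete reference for the sort-based approach, here is a minimal Python sketch of a hash-sort local dedupe (illustrative only; Python's built-in `hash` and `sorted` stand in for whatever hash function and sort the Chapel code actually uses):

```python
def dedupe_by_hash_sort(strings):
    # Sort by (hash, value) so that equal strings end up adjacent,
    # then keep each element whose predecessor differs.
    ordered = sorted(strings, key=lambda s: (hash(s), s))
    out = []
    for s in ordered:
        if not out or out[-1] != s:  # duplicates are now adjacent
            out.append(s)
    return out
```

Run locally before repartitionByLocaleStringArray, something shaped like this would shrink the data shipped between locales roughly in proportion to the local duplicate rate (e.g. the 0.75 cases in the benchmark above), at the cost of an extra local sort.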
Once you're getting good performance here, take the strategies in Repartition and apply them to other functions!
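For anyone picking this up, the invariant the whole repartition strategy relies on can be shown in a short Python sketch (hypothetical names; the hash-modulo buckets stand in for locales): equal strings always hash to the same destination, so deduping each partition independently yields a globally unique result.

```python
from collections import defaultdict

def repartition_then_dedupe(strings, num_locales):
    # Route each string to a destination "locale" by hash; equal strings
    # always land in the same bucket, so no duplicate can span buckets.
    buckets = defaultdict(set)
    for s in strings:
        buckets[hash(s) % num_locales].add(s)
    # Per-bucket dedupe (the set here) is therefore a global dedupe.
    return [s for b in buckets.values() for s in b]
```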