Closes #31. Add \iter\unique which returns an iterator with unique va…#51
Closes #31. Add \iter\unique which returns an iterator with unique va…#51alexkart wants to merge 4 commits intonikic:masterfrom
Conversation
src/iter.php
Outdated
| $hash = $hashFunction($value); | ||
| } | ||
|
|
||
| if (\in_array($hash, $hashSet, $strict)) { |
There was a problem hiding this comment.
this could be way more efficient with a keyed lookup and isset or array_key_exists
There was a problem hiding this comment.
It would work with hashes, but what if we use raw values and values are arrays, objects or nulls, and what about types checking?
There was a problem hiding this comment.
Yes, this implementation has quadratic time complexity and as such is a no-go. As @staabm said, the correct way to do this is to use the value as a key. Objects and other values are exactly why there is the $hashFunction (more typically $keyFunction). This also obviates the need for the $strict parameter.
There was a problem hiding this comment.
Is it a good idea to use serialize() to get a key for the default implementation (if hash function is not provided) to keep the ability for strict comparison?
$hashSet[serialize($value)] = '';There was a problem hiding this comment.
serialize is probably not the best solution, it could be costy for objects with large object graph, and the objects could change during the iteration making it fail to deduplicate them.
For objects I would do something like $hashSet[spl_object_hash($value)] = $value (storing value in the array prevents the object from being reassigned making it unique). The downside is that the objects will stay in memory for the duration of the iteration.
There was a problem hiding this comment.
I meant you shouldnt serialize scalars, sry
There was a problem hiding this comment.
@staabm how will you check the types then? $arr[5] and $arr['5'] will be the same keys.
| * uniqueness of the element | ||
| * @return \Iterator | ||
| */ | ||
| function unique($iterable, callable $hashFunction = null) { |
There was a problem hiding this comment.
you can set here the default value for $hasFunction callable $hashFunction = 'serialize'
There was a problem hiding this comment.
Good point, thanks
| _assertIterable($iterable, 'First argument'); | ||
| $hashSet = []; | ||
| foreach ($iterable as $key => $value) { | ||
| if ($hashFunction === null) { |
There was a problem hiding this comment.
This test could be moved out of the foreach loop.
Something like this:
$hashFunction = $hashFunction ?? 'serialize';
| $hash = $hashFunction($value); | ||
| } | ||
|
|
||
| if (isset($hashSet[$hash])) { |
|
Tell me if I'm wrong but this implementation goes against all advantages of using iterator (memory safe). Because in order to check the unicity we build up a big array in memory.... I don't see an implementation that would be valid though.... |
|
This specific operation is indeed not very optimal because it needs to go through the whole iterator. |
Closes #31 . Add \iter\unique which returns an iterator with unique values.