@kevin-dp kevin-dp commented Jul 9, 2025

This PR replaces the existing distinct operator, which was built on top of the reduce operator, with a dedicated and more efficient implementation. The new distinct operator accepts an optional function (value: T) => any that determines what to deduplicate on. For example:

// Get distinct users
usersInput.pipe(
  distinct()
)

// Get distinct users per country
usersInput.pipe(
  distinct(user => user.country)
)

Note that the output has the same type as the input: the function only determines what to deduplicate on and is not used to transform the values. This is very useful for ts/db, where we want distinct results based on the selected columns but still want the entire selected rows to come out of the stream.
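
To make the behaviour described above concrete, here is a minimal standalone sketch over a plain array. It is not the d2ts implementation: `distinctBy` is a hypothetical helper, comparing extractor results via `JSON.stringify` is an assumption, and so is keeping the first value seen for each extractor result. It only illustrates that the extractor selects what to compare on while whole values pass through unchanged.

```typescript
type User = { name: string; country: string }

// Hypothetical helper for illustration only; not part of d2ts.
function distinctBy<T>(
  values: T[],
  by: (value: T) => unknown = (value) => value
): T[] {
  // Assumption: extractor results are compared by their JSON encoding.
  const seen = new Set<string>()
  const out: T[] = []
  for (const value of values) {
    const k = JSON.stringify(by(value))
    if (!seen.has(k)) {
      seen.add(k)
      out.push(value) // the whole value is emitted, not the extractor result
    }
  }
  return out
}

const users: User[] = [
  { name: "Ann", country: "BE" },
  { name: "Bob", country: "UK" },
  { name: "Cem", country: "BE" },
  { name: "Ann", country: "BE" }, // exact duplicate of the first entry
]

// Deduplicate on the entire value.
const distinctUsers = distinctBy(users)

// Deduplicate on country; the results are still full User objects.
const onePerCountry = distinctBy(users, (user) => user.country)
```

In this sketch `distinctUsers` holds three users and `onePerCountry` holds two, and both are still of type `User[]`.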

@samwillis
Contributor

I'll take a proper look later when I'm home, but it looks good. Does this dedupe per key? So to dedupe them all, do you key by a common key first?

@samwillis
Contributor

Not a problem if it doesn't; we can always add a version that does later on if we want to.

@kevin-dp
Contributor Author

kevin-dp commented Jul 9, 2025

I'll take a proper look later when I'm home, but it looks good. Does this dedupe per key? So to dedupe them all, do you key by a common key first?

It dedupes by the entire value unless you pass an extractor function, in which case it dedupes by the return value of that function. The extractor function pattern should be general enough to handle any case.

So, if you want to dedupe by key you would do distinct(([key, _value]) => key). If you want to dedupe by value (across keys) you can do distinct(([_key, value]) => value).
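
As a rough illustration of the two extractors above, the following standalone model applies them to keyed entries that carry multiplicities. None of these names are d2ts APIs. The sketch assumes differential-dataflow-style semantics, where multiplicities are summed per extractor result and anything with a positive total is emitted once. It also assumes the first value seen represents each group and that extractor results are compared by their JSON encoding. The PR does not spell out these details, so treat this as a simplified model only.

```typescript
// Standalone model; Entry, Keyed and distinctEntries are hypothetical names.
type Entry<T> = [T, number] // [value, multiplicity]
type Keyed = [string, number] // [key, value]

function distinctEntries<T>(
  entries: Entry<T>[],
  by: (value: T) => unknown = (value) => value
): Entry<T>[] {
  // Sum multiplicities per extractor result, remembering the first value seen.
  const totals = new Map<string, { value: T; count: number }>()
  for (const [value, multiplicity] of entries) {
    const k = JSON.stringify(by(value))
    const slot = totals.get(k)
    if (slot) {
      slot.count += multiplicity
    } else {
      totals.set(k, { value, count: multiplicity })
    }
  }
  // Emit each group with a positive total exactly once.
  const out: Entry<T>[] = []
  totals.forEach(({ value, count }) => {
    if (count > 0) out.push([value, 1])
  })
  return out
}

const input: Entry<Keyed>[] = [
  [["a", 1], 1],
  [["a", 2], 2],
  [["b", 1], 1],
  [["c", 3], 1],
  [["c", 3], -1], // retraction that cancels the previous entry
]

// One entry per key: "a" and "b" survive, "c" nets to zero.
const byKey = distinctEntries(input, ([key, _value]) => key)

// One entry per value across keys: values 1 and 2 survive, 3 nets to zero.
const byValue = distinctEntries(input, ([_key, value]) => value)
```

Both calls return two entries here, each with multiplicity 1, and each entry still carries the full [key, value] pair.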

@samwillis samwillis left a comment

Awesome! :shipit:

@samwillis samwillis merged commit d6c4ed9 into main Jul 9, 2025
1 check passed
@samwillis samwillis deleted the kevin/distinct-operator branch July 9, 2025 15:39
cursor bot pushed a commit to samwillis/d2ts that referenced this pull request Jul 13, 2025
* Replace the distinct operator built atop reduce with a dedicated, more efficient distinct operator

* Add additional unit tests for distinct operator

* Fix bug where distinct wasn't properly summing the multiplicities if an element occurs multiple times in the input stream

* Extend distinct with optional argument to determine what to deduplicate by.

* Formatting

* Small change to test

* changeset

---------

Co-authored-by: Sam Willis <sam.willis@gmail.com>