Merged
5 changes: 5 additions & 0 deletions .changeset/chilly-icons-design.md
@@ -0,0 +1,5 @@
---
'@electric-sql/d2mini': patch
---

Introduce topKWithFractionalIndexBTree and orderByWithFractionalIndexBTree operators. These variants use a B+ tree which is more efficient on big collections as its time complexity is logarithmic.
3 changes: 2 additions & 1 deletion packages/d2mini/package.json
@@ -50,6 +50,7 @@
},
"dependencies": {
"fractional-indexing": "^3.2.0",
"murmurhash-js": "^1.0.0"
"murmurhash-js": "^1.0.0",
"sorted-btree": "^1.8.1"
Contributor

5.8kb gzipped https://bundlephobia.com/package/sorted-btree@1.8.1

This is enough extra code weight (~24% increase to tanstack/db) that, depending on where the crossover point ends up, this could be opt-in, i.e. only used if you have 50k+ items in a collection.

Contributor Author

Yes, that's the idea. We want to do some initial benchmarking to find the crossover point between the array version and the tree version. We could then switch between them automatically based on the size of the collection.

Contributor

Ok perfect, yeah that'd be easy with an async import 🚀

Contributor Author

@KyleAMathews the problem with using an async import is that it propagates to our API. d2mini is sync, and I don't think we want to make it async. Not sure how to get around this.

Contributor

@kevin-dp we can treat it like a JIT optimization perhaps? If the first sync run is too slow/big, we load sorted-btree in the background and start using it when it's loaded.

I agree we shouldn't make this async.

Contributor

Something like:

  • monitor the size of shapes
  • once a single shape reaches a certain size threshold, we download/load the tree version of the operator
  • when starting a query, if one of the source collections is over a certain size and we have already loaded the tree version of the operator, we use it; otherwise we use the array version
  • an additional optimisation would be to restart an existing query with the other operator once it has loaded, but this seems less needed
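
The steps above can be sketched as follows. This is only an illustration under stated assumptions, not d2mini's implementation: `OperatorRegistry`, `SIZE_THRESHOLD`, `noteCollectionSize`, `pickVariant`, and `whenLoaded` are hypothetical names, the threshold value is a placeholder, and the loader is injected so the block runs without `sorted-btree` installed.

```typescript
// Hypothetical sketch: choose the array or tree variant by collection size.
type Variant = 'array' | 'btree'

// Placeholder threshold; the real crossover point would come from benchmarks.
const SIZE_THRESHOLD = 50000

class OperatorRegistry {
  private treeLoaded = false
  private loading: Promise<void> | null = null
  // Stands in for a dynamic import of the tree-based operator.
  private readonly loadTree: () => Promise<void>

  constructor(loadTree: () => Promise<void>) {
    this.loadTree = loadTree
  }

  // Call whenever a collection size is observed. Starts a background load
  // the first time any collection reaches the threshold.
  noteCollectionSize(size: number): void {
    if (size >= SIZE_THRESHOLD && !this.treeLoaded && this.loading === null) {
      this.loading = this.loadTree().then(() => {
        this.treeLoaded = true
      })
    }
  }

  // Synchronous choice made when a query starts: use the tree variant only
  // if a source collection is large AND the tree code is already loaded.
  pickVariant(sourceSizes: number[]): Variant {
    const hasLarge = sourceSizes.some((size) => size >= SIZE_THRESHOLD)
    return hasLarge && this.treeLoaded ? 'btree' : 'array'
  }

  // Lets callers wait for a pending background load, if any.
  whenLoaded(): Promise<void> {
    return this.loading ?? Promise.resolve()
  }
}
```

Because `pickVariant` never awaits anything, the query path stays synchronous; a query started before the load completes simply keeps the array variant.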

Contributor Author

@KyleAMathews yes, that's something we could do later if need be. For now, I introduced an async loadBTree function that must be called before using the tree variant of the operator. That way, we can keep the operator sync.
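
A minimal sketch of the pattern described in this comment: an async loader fills a module-level cache once, and the synchronous code path throws if it runs before loading has finished. The real `loadBTree` in the PR may look different; `FakeSortedMap`, `SortedMapCtor`, and `createTreeState` are hypothetical stand-ins, and the dynamic import of `sorted-btree` is simulated so the block is self-contained.

```typescript
// Hypothetical stand-in for the class that `sorted-btree` would provide.
class FakeSortedMap<K, V> {
  private items = new Map<K, V>()

  set(key: K, value: V): void {
    this.items.set(key, value)
  }

  get size(): number {
    return this.items.size
  }
}

type SortedMapCtor = typeof FakeSortedMap

// Module-level cache, filled once by the async loader.
let cachedCtor: SortedMapCtor | undefined

// Async step. Real code would presumably await a dynamic import here
// (an assumption); this sketch resolves a local class instead.
async function loadBTree(): Promise<void> {
  if (cachedCtor === undefined) {
    cachedCtor = await Promise.resolve(FakeSortedMap)
  }
}

// Sync step: callable from synchronous operator code once loading is done.
function createTreeState<K, V>(): FakeSortedMap<K, V> {
  if (cachedCtor === undefined) {
    throw new Error('B-tree not loaded; call loadBTree() first')
  }
  return new cachedCtor<K, V>()
}
```

The async boundary is paid once at startup by the caller, so the operator's own signature does not need to change.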

Contributor Author

> Something like:
>
> * monitor the size of shapes
> * once a single shape reaches a certain size threshold we download/load the tree version of the operator
> * when starting a query, if one of the source collections is over a certain size, and we have already loaded the tree version of the operator we use that, if not then we don't.
> * additional optimisation would be restart an existing query, with the other operator, once it has loaded, but this seems less needed

Just-in-time data structures in the wild 😃

}
}
7 changes: 7 additions & 0 deletions packages/d2mini/src/indexes.ts
@@ -35,6 +35,13 @@ export class Index<K, V> {
return [...valueMap.values()]
}

getMultiplicity(key: K, value: V): number {
const valueMap = this.#inner.get(key)
const valueHash = hash(value)
const [, multiplicity] = valueMap.get(valueHash)
return multiplicity
}

entries() {
return this.#inner.entries()
}
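
A minimal sketch of how a multiplicity lookup like the one added here could behave on a simplified index. It returns 0 for a missing key or value instead of destructuring an undefined entry, which is a design choice of this sketch and not necessarily what the PR does. `SimpleIndex`, `addValue`, and the `JSON.stringify`-based `hash` are hypothetical simplifications; the real `Index` class and its `hash` helper are only partly visible in this diff.

```typescript
// Stand-in hash for illustration only; d2mini uses its own `hash` helper.
const hash = (value: unknown): string => JSON.stringify(value)

class SimpleIndex<K, V> {
  // key -> (value hash -> [value, multiplicity])
  private inner = new Map<K, Map<string, [V, number]>>()

  // Adds `multiplicity` to the entry for (key, value), dropping it at zero.
  addValue(key: K, value: V, multiplicity: number): void {
    let valueMap = this.inner.get(key)
    if (valueMap === undefined) {
      valueMap = new Map<string, [V, number]>()
      this.inner.set(key, valueMap)
    }
    const valueHash = hash(value)
    const existing = valueMap.get(valueHash)
    const total = (existing ? existing[1] : 0) + multiplicity
    if (total === 0) {
      valueMap.delete(valueHash)
    } else {
      valueMap.set(valueHash, [value, total])
    }
  }

  // Returns the stored multiplicity, or 0 when the key or value is absent.
  getMultiplicity(key: K, value: V): number {
    const entry = this.inner.get(key)?.get(hash(value))
    return entry === undefined ? 0 : entry[1]
  }
}
```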
41 changes: 29 additions & 12 deletions packages/d2mini/src/operators/orderBy.ts
@@ -6,7 +6,7 @@ import { map } from './map.js'
import { innerJoin } from './join.js'
import { consolidate } from './consolidate.js'

interface OrderByOptions<Ve> {
export interface OrderByOptions<Ve> {
comparator?: (a: Ve, b: Ve) => number
limit?: number
offset?: number
@@ -128,19 +128,11 @@ export function orderByWithIndex<
}
}

/**
* Orders the elements and limits the number of results, with optional offset and
* annotates the value with a fractional index.
* This requires a keyed stream, and uses the `topKWithFractionalIndex` operator to order all the elements.
*
* @param valueExtractor - A function that extracts the value to order by from the element
* @param options - An optional object containing comparator, limit and offset properties
* @returns A piped operator that orders the elements and limits the number of results
*/
export function orderByWithFractionalIndex<
export function orderByWithFractionalIndexBase<
T extends KeyValue<unknown, unknown>,
Ve = unknown,
>(
topK: typeof topKWithFractionalIndex,
valueExtractor: (
value: T extends KeyValue<unknown, infer V> ? V : never,
) => Ve,
@@ -181,7 +173,7 @@ export function orderByWithFractionalIndex<
],
] as KeyValue<null, [K, Ve]>,
),
topKWithFractionalIndex((a, b) => comparator(a[1], b[1]), {
topK((a, b) => comparator(a[1], b[1]), {
limit,
offset,
}),
@@ -194,3 +186,28 @@ export function orderByWithFractionalIndex<
)
}
}

/**
* Orders the elements and limits the number of results, with optional offset and
* annotates the value with a fractional index.
* This requires a keyed stream, and uses the `topKWithFractionalIndex` operator to order all the elements.
*
* @param valueExtractor - A function that extracts the value to order by from the element
* @param options - An optional object containing comparator, limit and offset properties
* @returns A piped operator that orders the elements and limits the number of results
*/
export function orderByWithFractionalIndex<
T extends KeyValue<unknown, unknown>,
Ve = unknown,
>(
valueExtractor: (
value: T extends KeyValue<unknown, infer V> ? V : never,
) => Ve,
options?: OrderByOptions<Ve>,
) {
return orderByWithFractionalIndexBase(
topKWithFractionalIndex,
valueExtractor,
options,
)
}
19 changes: 19 additions & 0 deletions packages/d2mini/src/operators/orderByBTree.ts
@@ -0,0 +1,19 @@
import { KeyValue } from '../types.js'
import { OrderByOptions, orderByWithFractionalIndexBase } from './orderBy.js'
import { topKWithFractionalIndexBTree } from './topKWithFractionalIndexBTree.js'

export function orderByWithFractionalIndexBTree<
T extends KeyValue<unknown, unknown>,
Ve = unknown,
>(
valueExtractor: (
value: T extends KeyValue<unknown, infer V> ? V : never,
) => Ve,
options?: OrderByOptions<Ve>,
) {
return orderByWithFractionalIndexBase(
topKWithFractionalIndexBTree,
valueExtractor,
options,
)
}
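
The refactor in this diff passes the top-K implementation into a shared base function, so both variants reuse the same ordering logic. Below is a minimal sketch of that pattern on plain arrays rather than d2mini streams. All names (`TopK`, `topKArray`, `topKSortedInsert`, `orderByBase`, and the two wrappers) are hypothetical, and the binary-search insertion variant is only a stand-in for a tree-backed implementation.

```typescript
type Comparator<T> = (a: T, b: T) => number

interface Options {
  limit?: number
  offset?: number
}

// A top-K implementation: takes a comparator and options, returns a
// function that orders an array and applies offset/limit.
type TopK = <T>(compare: Comparator<T>, options?: Options) => (items: T[]) => T[]

// Variant 1: sort a copy of the whole array, then slice.
const topKArray: TopK = (compare, { limit = Infinity, offset = 0 } = {}) =>
  (items) => [...items].sort(compare).slice(offset, offset + limit)

// Variant 2: build the ordering by binary-search insertion.
const topKSortedInsert: TopK = (compare, { limit = Infinity, offset = 0 } = {}) =>
  (items) => {
    const sorted: typeof items = []
    for (const item of items) {
      let lo = 0
      let hi = sorted.length
      // Find the first position whose element compares greater than `item`.
      while (lo < hi) {
        const mid = (lo + hi) >>> 1
        if (compare(sorted[mid], item) <= 0) lo = mid + 1
        else hi = mid
      }
      sorted.splice(lo, 0, item)
    }
    return sorted.slice(offset, offset + limit)
  }

type OrderOptions<V> = Options & { comparator: Comparator<V> }

// Shared base: the ordering logic is written once and the top-K
// implementation is injected as the first argument.
function orderByBase<T, V>(
  topK: TopK,
  valueExtractor: (item: T) => V,
  options: OrderOptions<V>,
): (items: T[]) => T[] {
  return topK<T>(
    (a, b) => options.comparator(valueExtractor(a), valueExtractor(b)),
    options,
  )
}

// Thin wrappers that only pick the implementation.
const orderByArray = <T, V>(extract: (item: T) => V, options: OrderOptions<V>) =>
  orderByBase(topKArray, extract, options)

const orderBySortedInsert = <T, V>(extract: (item: T) => V, options: OrderOptions<V>) =>
  orderByBase(topKSortedInsert, extract, options)
```

Both wrappers should give the same result for the same input, which makes it straightforward to swap one for the other based on collection size.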