You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Our native multi-hop traversal materializes intermediate results in flat (fully enumerated) form. expand_batch flattens the CSR adjacency immediately — for each input row it emits one output row per neighbor and take()s every carried column once per neighbor:
// crates/lance-graph/src/lance_native_planner/csr_expand.rsfor&n in csr.neighbors(src.value(row)){
parent_idx.push(row_u32);// repeat the parent row
neighbors.push(n);// once per neighbor}// ...then take() duplicates every carried column once per neighbor
For a chain like (a)-[:KNOWS]->(b)-[:LIKES]->(c), the a columns get duplicated once per (b, c) path, so intermediate size grows as O(N^k) for k hops. This is the classic many-to-many blowup.
Background
KuzuDB (and the factorized-database line of work it builds on) avoids this with factorized intermediate results: keep intermediates in a compressed, Cartesian-product form so projections and aggregations push through without materializing the explosion. See What is KuzuDB.
Why this fits lance-graph well
CSR already produces the factorized group for free: csr.neighbors(src) returns a contiguous slice, which is exactly the shape of an Arrow ListArray (offset array + values), i.e. a nested child group. Today we take that naturally-nested output and flatten it. A factorized expand simply keeps the nesting — emit (parent values once) + (list of neighbors) instead of (parent values × neighbors).
It also belongs entirely in LanceNativePlanner. DataFusion's RecordBatch model is flat-relational, so factorization lives in the native operator tree, not the DataFusion fallback. This reinforces the native-planner direction in #159.
Scope and caveats
Survives only inside a native-operator chain. Handing a batch back to a DataFusion operator (cross-table join, sort, filter on a cross-hop expression) requires unfolding to flat. Value is proportional to how long a native chain we can keep.
Payoff needs consumers that push through nested form — aggregations (count, EXISTS, collect), LIMIT, degree-style queries, heavily-filtered deep traversals. For RETURN a, b, c with no aggregation the final output is the flat product anyway; factorization still saves intermediate memory/CPU but the last step enumerates.
The problem
Our native multi-hop traversal materializes intermediate results in flat (fully enumerated) form.
expand_batchflattens the CSR adjacency immediately — for each input row it emits one output row per neighbor andtake()s every carried column once per neighbor:For a chain like
(a)-[:KNOWS]->(b)-[:LIKES]->(c), theacolumns get duplicated once per(b, c)path, so intermediate size grows as O(N^k) for k hops. This is the classic many-to-many blowup.Background
KuzuDB (and the factorized-database line of work it builds on) avoids this with factorized intermediate results: keep intermediates in a compressed, Cartesian-product form so projections and aggregations push through without materializing the explosion. See What is KuzuDB.
Why this fits lance-graph well
CSR already produces the factorized group for free:
csr.neighbors(src)returns a contiguous slice, which is exactly the shape of an ArrowListArray(offset array + values), i.e. a nested child group. Today we take that naturally-nested output and flatten it. A factorized expand simply keeps the nesting — emit(parent values once) + (list of neighbors)instead of(parent values × neighbors).It also belongs entirely in
LanceNativePlanner. DataFusion'sRecordBatchmodel is flat-relational, so factorization lives in the native operator tree, not the DataFusion fallback. This reinforces the native-planner direction in #159.Scope and caveats
count,EXISTS,collect),LIMIT, degree-style queries, heavily-filtered deep traversals. ForRETURN a, b, cwith no aggregation the final output is the flat product anyway; factorization still saves intermediate memory/CPU but the last step enumerates.VariableLengthExpand) in feat: CSR adjacency index for native graph traversal (inspired by DuckPGQ + icebug-format) #159.Proposed approach
ListArray) instead of flatteningFlatten/Unfoldoperator at the native → DataFusion boundarycount/EXISTSthat consume the nested form without unfoldingLIMITshort-circuit over nested groupsVariableLengthExpand)Related