feat: native single-hop Expand via CSR adjacency index (Phase 2, #159) #162
Open
jja725 wants to merge 10 commits into
Design spec for issue lance-format#159 Phase 2: wire the Phase 1 CsrIndex into a native single-hop Expand via custom DataFusion ExecutionPlan (CsrExpandExec topology + LanceTakeExec materialization), dense-ROWID id model, with fallback to the DataFusion join path for unsupported shapes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
7-task TDD plan implementing the approved design: generalize CSR builder, CsrExpandNode/Exec, LanceTakeNode/Exec + RowMaterializer, CsrExtensionPlanner/CsrQueryPlanner, LanceNativePlanner lowering with fallback, and query.rs wiring of the LanceNative execution strategy with end-to-end parity tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…tch_with_columns Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
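The design-spec commit above splits the native path into a topology step (`CsrExpandExec`) and a materialization step (`LanceTakeExec`) over dense row ids. The following is a minimal, std-only sketch of that split, not the PR's code: `Csr`, `from_edges`, `expand`, and `take` are hypothetical names chosen for illustration, plain `Vec`s stand in for Arrow arrays, and edge endpoints are assumed to be dense row ids smaller than `num_nodes`.

```rust
/// Minimal CSR (compressed sparse row) adjacency over dense row ids.
/// Hypothetical stand-in for the PR's `CsrIndex`; not its actual API.
pub struct Csr {
    /// offsets[i]..offsets[i + 1] is the slice of `targets` for source row i.
    offsets: Vec<usize>,
    /// Destination row ids, grouped by source row.
    targets: Vec<u64>,
}

impl Csr {
    /// Build from (src_rowid, dst_rowid) edges.
    /// Assumption: every src_rowid is smaller than `num_nodes`.
    pub fn from_edges(num_nodes: usize, edges: &[(u64, u64)]) -> Self {
        // Count the out-degree of each source, then prefix-sum into offsets.
        let mut offsets = vec![0usize; num_nodes + 1];
        for &(src, _) in edges {
            offsets[src as usize + 1] += 1;
        }
        for i in 0..num_nodes {
            offsets[i + 1] += offsets[i];
        }
        // Scatter targets into place using a moving write cursor per source.
        let mut cursor = offsets.clone();
        let mut targets = vec![0u64; edges.len()];
        for &(src, dst) in edges {
            let slot = &mut cursor[src as usize];
            targets[*slot] = dst;
            *slot += 1;
        }
        Csr { offsets, targets }
    }

    /// Neighbor row ids of `src_rowid`; empty for ids outside the index.
    pub fn neighbors(&self, src_rowid: u64) -> &[u64] {
        let i = src_rowid as usize;
        if i + 1 >= self.offsets.len() {
            return &[];
        }
        &self.targets[self.offsets[i]..self.offsets[i + 1]]
    }
}

/// Topology-only expand: for each input row (given by its source row id),
/// emit one output row per neighbor as (input_row_index, dst_rowid).
pub fn expand(csr: &Csr, src_rowids: &[u64]) -> Vec<(usize, u64)> {
    let mut out = Vec::new();
    for (row, &src) in src_rowids.iter().enumerate() {
        for &dst in csr.neighbors(src) {
            out.push((row, dst));
        }
    }
    out
}

/// Materialize one property column for the given row ids (a plain gather).
/// Panics if a row id is out of bounds for `column`.
pub fn take<T: Clone>(column: &[T], rowids: &[u64]) -> Vec<T> {
    rowids.iter().map(|&r| column[r as usize].clone()).collect()
}
```

Keeping `expand` free of property access means the same topology step can feed any materializer; only `take` would need to change when the backing store changes.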
Summary
Phase 2 of #159: execute single-hop Cypher `Expand` natively using the Phase 1 `CsrIndex` instead of relationship-scan + two SQL joins. Implemented as custom DataFusion operators, with automatic fallback to the existing join path for anything not yet supported natively.

This is a DuckPGQ-style relational engine + CSR acceleration integration:

- `CsrExpandExec`: topology only. For each source row, look up neighbors in the CSR and emit one row per neighbor with the neighbor's row id appended.
- `LanceTakeExec`: materializes target node properties from those row ids via a `RowMaterializer` (in-memory `arrow::compute::take` now; a Lance-dataset `take` lands in Phase 4).
- `CsrExpandNode` / `LanceTakeNode`: logical extension nodes; `CsrExtensionPlanner` + `CsrQueryPlanner` build the CSR and materializer at physical-planning time.
- `LanceNativePlanner` overrides only `Expand` lowering and delegates everything else to `DataFusionPlanner`, so `ExecutionStrategy::LanceNative` is always correct: it uses CSR when it can and joins otherwise.

Design decisions
- […] `csr.neighbors(src_rowid) -> dst_rowids`), mirroring how every Lance index works (key → row ids → `take()` to materialize). Generalizes to Lance stable row ids in Phase 4.
- […] `take()` (schema parity with the join path) rather than analyzing which are referenced.

Full design and task breakdown: `docs/superpowers/specs/2026-06-22-csr-native-expand-operator-design.md` and `docs/superpowers/plans/2026-06-22-csr-native-expand-operator.md`.

Native vs. fallback
Served natively: exactly one single-hop `Expand`, single relationship type, Outgoing/Incoming, no inline relationship/target property filters, no bound relationship variable. Wrapping `Project`/`Filter`/`Sort`/`Limit`/`Offset`/`Distinct` run as normal DataFusion operators on the native stream.

Falls back to the DataFusion join path: variable-length / multi-hop, multiple relationship types, undirected, inline `{k:v}` filters, bound relationship variable, `Join`, `Unwind`.

Out of scope (later phases)
- `VariableLengthExpand`, BFS/DFS/shortest-path operators (Phase 3)
- `LanceDatasetMaterializer`, namespace native path (Phase 4)

Test plan
- […] `expand_batch`, `take_batch`, `InMemoryMaterializer`, the CSR builder column generalization, and the planner native/fallback decision.
- […] `tests/test_lance_native_expand.rs`) asserting `LanceNative` returns identical results to `DataFusion` for: single-hop `RETURN a.name, b.name`, with `WHERE b.age > 30`, incoming direction, and a variable-length query (fallback).
- `cargo test -p lance-graph` (15 binaries, 0 failures); `cargo clippy -p lance-graph --all-targets` clean.

Known follow-ups (not blocking)
🤖 Generated with Claude Code
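
The "Native vs. fallback" rules above amount to a single predicate over the query shape. The sketch below encodes those listed conditions in std-only Rust as an illustration; `ExpandShape`, `Direction`, `Lowering`, and `choose_lowering` are hypothetical names, not the types used by `LanceNativePlanner`, and the real planner may inspect the logical plan differently.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Direction {
    Outgoing,
    Incoming,
    Undirected,
}

/// Hypothetical summary of one Expand pattern in a query;
/// not the PR's logical-plan representation.
#[derive(Debug, Clone)]
pub struct ExpandShape {
    pub relationship_types: usize,
    pub direction: Direction,
    pub variable_length: bool,
    pub has_inline_property_filters: bool,
    pub binds_relationship_variable: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Lowering {
    NativeCsr,
    JoinFallback,
}

/// Pick the lowering for a query, following the conditions listed in the
/// PR description: any unsupported shape goes to the join path.
pub fn choose_lowering(expands: &[ExpandShape], has_join_or_unwind: bool) -> Lowering {
    // The native path needs exactly one Expand and no Join/Unwind.
    if has_join_or_unwind || expands.len() != 1 {
        return Lowering::JoinFallback;
    }
    let e = &expands[0];
    let supported = e.relationship_types == 1
        && e.direction != Direction::Undirected
        && !e.variable_length
        && !e.has_inline_property_filters
        && !e.binds_relationship_variable;
    if supported {
        Lowering::NativeCsr
    } else {
        Lowering::JoinFallback
    }
}
```

Because every rejected shape maps to `JoinFallback` rather than an error, a strategy built this way can stay correct while native coverage grows one condition at a time.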