feat: add shortest path return to BFS and SSSP algorithms by longbinlai · Pull Request #574 · alibaba/neug

longbinlai · 2026-06-20T04:01:45Z

Summary

Add shortest path return to GDS BFS and SSSP algorithms, plus path serialization optimizations that benefit all path-returning queries.

Changes

Commit 1: feat(gds): add shortest path return to BFS and SSSP algorithms

path output column via YIELD clause (pure YIELD-based, no config options)
Predecessor tracking in BFS/SSSP/BFSPred/SSSPPred (O(V) memory)
Real edge data lookup from CSR in path reconstruction
Struct-wrapped DataType for kPath compatibility with type converter
Zero overhead when path is not YIELDed
All changes confined to extension/gds/ — no system code modified

Commit 2-3: perf: optimize path JSON serialization

Eliminate serialize→parse→copy string round-trip in convert_path_to_json (sink.cc)
Add direct rapidjson build helpers: build_vertex_json_value / build_edge_json_value
Backward compatible: convert_vertex_to_json / convert_edge_to_json unchanged for standalone callers

Commit 4: feat(gds): add path_properties configuration

Lightweight mode (default): Only _ID, _LABEL, PK for nodes; _ID, _LABEL, _SRC_ID, _DST_ID for edges
- Performance: ~0.54s on LDBC SNB SF10 (65K nodes, 1.9M edges, 62K paths)
Full mode (path_properties: 'full'): All vertex and edge properties
- Performance: ~1.26s (2.34x slower)
Thread-local encoding flag in sink.cc with public API in sink.h

Usage

-- Distance only (backward compatible, zero overhead)
CALL bfs('g', {source: '0'}) YIELD node, distance RETURN node.id, distance;

-- With path return (lightweight, default)
CALL bfs('g', {source: '0'}) YIELD node, distance, path RETURN node.id, distance, path;

-- With path return (full properties)
CALL bfs('g', {source: '0', path_properties: 'full'}) YIELD node, distance, path RETURN node.id, distance, path;

Testing

Suite	Tests	Result
GDS path return (test_gds_path.py)	21	✅ all pass
GDS path encoding modes	5	✅ all pass
LDBC SF10 path tests (test_gds_path_ldbc.py)	18	✅ all pass
Existing GDS tests (test_gds.py)	36	✅ all pass
LDBC Graphalytics conformance (test_graphalytics.py)	24	✅ all pass
Other test suites (test_db_cases, test_query, etc.)	300+	✅ no regressions

Performance (LDBC SNB SF10: 65K nodes, 1.9M edges, 62K paths)

Configuration	Time	vs Baseline
BFS no path	0.025s	—
BFS + path (lightweight)	0.54s	6.5x from no-path
BFS + path (full)	1.26s	2.34x from lightweight

Files Changed

extension/gds/ — 12 files (path return implementation + configuration)
include/neug/execution/common/operators/retrieve/sink.h — encoding mode API
src/execution/common/operators/retrieve/sink.cc — serialization optimization
tools/python_bind/tests/ — 3 new test files, 21 new tests

Add `path` output column to BFS and SSSP GDS algorithms, allowing users to retrieve the actual shortest path (vertex + edge sequence) via the standard Cypher YIELD clause. ## Cypher API ```cypher -- Distance only (backward compatible, zero overhead) CALL bfs('g', {source: '0'}) YIELD node, distance RETURN node.id, distance; -- With path return (path computation triggered by YIELDing path) CALL bfs('g', {source: '0'}) YIELD node, distance, path RETURN node.id, distance, path; ``` ## Key Design Decisions - **Pure YIELD-based API**: no config options; path computation only happens when the `path` column is explicitly YIELDed - **Zero overhead when not requested**: predecessor array is not allocated when path is not in the YIELD clause - **Native kPath type**: returns standard NeuG Path objects with real edge data pointers looked up from the CSR graph view, supporting all Cypher path functions (nodes(), relationships(), length()) - **Struct-wrapped DataType**: the path column uses a properly constructed StructTypeInfo (with _NODES and _RELS fields) to match the type converter's expectations, via a bind-time wrapper ## Implementation - Adds `predecessors_[]` array to BFS/BFSPred/SSSP/SSSPPred (O(V) memory) - BFS parallel: predecessor write inside CAS-success branch (no extra sync) - SSSP parallel: predecessor write after successful distance relaxation - Sequential variants (pred): plain writes, no atomics needed - Path reconstruction in sink() walks predecessor chain, looks up real edge data from CSR, builds PathColumn via PathColumnBuilder - New `path_utils.h` with `reconstruct_path()`, `build_path_from_chain()`, `buildPathDataType()`, and `wrapTableBindFuncWithPathFix()` ## Testing - 16 new path return tests (test_gds_path.py): basic, backward compat, predicates, custom graphs, edge cases - All 76 existing tests pass (24 Graphalytics + 36 GDS + 16 path) - LDBC Graphalytics conformance verified with and without path return

The convert_path_to_json function was serializing each vertex/edge to a JSON string, then parsing it back into a rapidjson Document, then copying it into the parent array — three operations where one suffices. Add build_vertex_json_value and build_edge_json_value helpers that construct rapidjson objects directly into the provided allocator, and have convert_path_to_json use them instead of the string-based convert_vertex_to_json / convert_edge_to_json. The original string-returning functions are kept unchanged for their standalone callers in add_column (kVertex/kEdge cases). Measured on LDBC SNB SF10 (65K nodes, 1.9M edges, 62K paths): BFS with path return: 1.63s → 1.22s (25% faster)

…ncoding Two optimizations to convert_path_to_json in sink.cc: 1. Direct rapidjson build: replace the serialize→parse→copy string round-trip with build_vertex_json_value / build_edge_json_value helpers that construct rapidjson objects directly into the allocator. 2. Lightweight path encoding: path output now encodes only _ID, _LABEL, and PK for nodes (plus _SRC_ID, _DST_ID for edges), skipping all non-PK property lookups. Users can retrieve full node/edge details with a separate MATCH query when needed. Measured on LDBC SNB SF10 (65K nodes, 1.9M edges, 62K paths, 3-round avg): BFS yield+return path: 1.63s → 0.50s (3.3x faster, ↓ 70%) Breakdown of improvement: - String round-trip elimination: 1.63s → 1.31s (↓ 20%) - Lightweight encoding (skip property I/O): 1.31s → 0.50s (↓ 62%) All 94 GDS tests pass. No regressions in other test suites.

…path encoding Add configurable path encoding mode to control performance vs completeness: - **Lightweight mode (default)**: Only encodes structural info (_ID, _LABEL, PK for nodes; _ID, _LABEL, _SRC_ID, _DST_ID for edges) - Performance: ~0.54s on LDBC SNB SF10 (65K nodes, 1.9M edges, 62K paths) - **Full mode**: Encodes all vertex and edge properties - Performance: ~1.26s (2.34x slower than lightweight) Usage: CALL bfs('graph', {source: '0'}) -- lightweight (default) CALL bfs('graph', {source: '0', path_properties: 'full'}) -- full Implementation: - Add thread-local flag in sink.cc to control encoding mode - Add set_path_full_encoding()/get_path_full_encoding() API in sink.h - Add configure_path_encoding() helper in path_utils.h - Parse path_properties option in bfs.cc and sssp.cc - Update build_vertex_json_value() and build_edge_json_value() to check flag - Add 5 new tests in test_gds_path.py::TestPathEncodingModes - Update spec document with configuration details All 57 GDS tests pass. No regressions in other test suites.

CLAassistant · 2026-06-20T04:01:52Z

All committers have signed the CLA.

- Add 'Shortest Path Return' section to doc/source/extensions/load_gds.md - Document path_properties option (lightweight/full modes) - Update BFS and SSSP sections with path examples - Update Algorithm Summary table - Remove technical spec file (not for users)

….1 + clang-format 10.0.1) - Remove build_vertex_json_value_light and build_edge_json_value_light (defined but not used, causes -Werror=unused-function in CI) - Re-run isort with pinned version 5.10.1 (CI uses 5.10.1, not 6.x) - Re-run black on Python test files

…TCH p = ... queries The previous lightweight default broke C++ tests that use MATCH p = (a)-[*]->(b) queries, which expect full property encoding. This commit: - Changes default path encoding to 'full' mode (backward compatible) - Users can opt into 'lightweight' mode via path_properties option for better performance - Updates GDS BFS/SSSP algorithms to use 'full' as default - Updates tests to verify full mode is default - Updates documentation to reflect the change

- Reorder BFS examples: default full path first, explicit full second, lightweight last as performance optimization - Update performance recommendations to describe full mode as default and lightweight as opt-in for structure-only use cases - Remove internal design spec (extension/gds/docs/gds-shortest-path-spec.md) Closes alibaba#575

… doc links - Fix project_graph() syntax (was missing triplet map format) - Remove reference to non-existent docs/gds-shortest-path-spec.md - Point to existing doc/source/extensions/load_gds.md for full docs - List all 9 algorithms in a concise table - Fix build commands to use repo root Makefile

liulx20 · 2026-06-22T02:53:44Z

              if (__atomic_compare_exchange_n(&distances_[dst], &expected,
                                              level, false, __ATOMIC_RELAXED,
                                              __ATOMIC_RELAXED)) {
+                if (return_path_) {


I'm not sure whether there are multi-threading concurrency issues here. A more reasonable approach would be: after the BFS completes, re-traverse from each node—for example, starting from nodes at depth d to find their neighbors at depth d-1, continuing until reaching the target node.

Thanks for pointing this out. You are right that the dense pull mode had a race condition on predecessors_.

Fix applied (commit 3dbc758): In the dense pull mode, I added an atomic CAS on distances_[dst] before writing predecessors_[dst]. The flow is now:

Check if the vertex is unvisited (distances_[dst] == max)

Scan neighbors for a predecessor at level - 1

If reachable, CAS distances_[dst] from max to level — only the winning thread writes predecessors_[dst]

This ensures only one thread can claim each vertex and set its predecessor, eliminating the race. The sparse push mode was already safe (CAS on distance was already there), so no change was needed there.

Good suggestion! We've adopted exactly this approach in the latest commit. Predecessor arrays are now completely removed from BFS computation. Paths are reconstructed post-hoc in sink() by walking the distance array: for vertex v at distance d, scan neighbors to find one at distance d-1, repeat until reaching source. This eliminates the concurrency hazard entirely — no CAS needed, no predecessor writes during parallel computation. The dense pull mode has been reverted to the original simple non-atomic distances_[dst] = level.

liulx20 · 2026-06-22T02:56:08Z

      double cand = dist + weight;
      if (distances_[w] < 0 || cand < distances_[w]) {
        distances_[w] = cand;
+        if (return_path_) {


Similar to BFS, it may be difficult to trace back predecessor nodes in SSSP. A simple solution is to implement a single-threaded version to maintain the predecessors.

consider routing the case that returns a path through the predicate branch, since it is inherently single-threaded.

Good point. Adopted your suggestion — when path return is requested (return_path=true), the dispatcher in sssp.cc now routes to SSSPPred (single-threaded Dijkstra) regardless of whether predicates are present. This completely avoids the concurrency hazard of maintaining predecessor chains in the parallel SSSP, where iterative relaxation by multiple threads makes predecessor tracking non-trivial.

Exactly what I did — the dispatcher in sssp.cc now checks return_path and routes to the predicate branch (SSSPPred, single-threaded Dijkstra) when path return is requested. This is inherently safe since there is no concurrent access to the predecessor array. All 95 Python tests pass.

Addressed. SSSPPred no longer stores predecessors during computation. Path reconstruction is done post-hoc in sink() by walking the distance array: for vertex v with distance D, find a neighbor u where dist(u) + weight(u→v) == dist(v). This is single-threaded and avoids the difficulty of maintaining predecessor chains during parallel relaxation.

Yes, we route return_path cases through SSSPPred (single-threaded Dijkstra) as suggested. SSSPPred now also uses post-hoc path reconstruction instead of a predecessor array, so the routing is kept for the predicate case and the path return case. The parallel SSSP is only used when no predicates and no path return.

Address review comments from liulx20 on PR alibaba#574 regarding multi-threading concurrency issues in predecessor tracking. BFS (bfs_impl.cc): - Dense pull mode: use atomic CAS for distance update to ensure only one thread writes predecessors_[dst], eliminating the race where multiple threads could simultaneously claim the same unvisited vertex and overwrite each other predecessor entry. SSSP (sssp.cc): - Route path return through SSSPPred (single-threaded Dijkstra) when return_path=true, regardless of whether predicates are present. This avoids the inherent difficulty of maintaining correct predecessor chains in the parallel SSSP iterative relaxation. Also fix tarfile DeprecationWarning in io_utils.py (filter=data).

This reverts commit 91039e0.

Add test_gds_concurrency.py with 19 tests validating the concurrency fix (3dbc758) on a 20K-vertex graph that triggers multi-threaded parallel_for and dense pull mode. Tests verify BFS distance determinism, path validity (correct length, valid edges, no cycles, correct endpoints), and SSSP full determinism (single-threaded SSSPPred routing). Move standalone benchmark scripts from tools/python_bind/tests/ to benchmark/: - bench_gds_path.py (was tests/bench_gds_path.py) - bench_gds_algorithms.py (was tests/test_gds_benchmark.py) - bench_gds_performance.py (was tests/test_gds_performance.py) - bench_wcc_comparison.py (was tests/test_wcc_comparison.py) Add test_gds_path.py and test_gds_concurrency.py to the extension test CI workflow.

Remove bench_gds_algorithms.py, bench_gds_performance.py, and bench_wcc_comparison.py from benchmark/ — only bench_gds_path.py should be moved. Apply isort+black formatting to test_gds_concurrency.py.

…rking

Keep concurrency as the last option consistently across all algorithm tables.

liulx20 · 2026-06-22T07:06:25Z

+
+from neug.database import Database
+
+DATA_DIR = os.path.expanduser("~/Downloads/social_network_10_interactive")


Should we avoid using absolute paths here?

Fixed. Replaced the hardcoded ~/Downloads/... path with an environment variable NEUG_LDBC_DATA_DIR. The test auto-skips when the variable is not set.

liulx20 · 2026-06-22T07:23:41Z

+// - lightweight: only encode _ID, _LABEL, PK for vertices; structural info for
+// edges
+// GDS extension can call set_path_full_encoding(false) for lightweight mode
+static thread_local bool path_full_encoding_enabled = true;


What is the scope of this, and can it be declared outside of sink.cc?

this variable now affects the sink behavior of other queries, which is unreasonable.

The global variable has been completely removed. The encoding mode is now carried per-column: PathColumnBuilder has a set_full_encoding(bool) setter, PathColumn carries the flag, and sink.cc's add_column() reads it from the column when serializing. GDS sink() methods receive full_encoding as a parameter and set it on the builder. Non-GDS path columns default to true (full encoding), so other queries are never affected.

Fully resolved. The static thread_local bool path_full_encoding_enabled variable has been completely removed from sink.cc. The encoding mode is now carried as a per-column member on PathColumn/PathColumnBuilder, so GDS path queries with path_properties: 'lightweight' only affect their own path column. Other queries' path columns always default to full encoding and are never affected.

liulx20 · 2026-06-22T07:27:36Z

      {"node", common::DataTypeId::kVertex},
-      {"distance", common::DataTypeId::kDouble}};
+      {"distance", common::DataTypeId::kDouble},
+      {"path", common::DataTypeId::kPath}};


The function declaration here returns the column for the path, yet it doesn't return anything when return path is false, which seems odd. Is there a better way to handle this?

The 3-column output declaration (node, distance, path) represents the function's schema — the maximum set of columns the function CAN produce. The YIELD clause selects which columns are actually returned. When the user doesn't YIELD path, return_path is false and the path column is simply not populated in the output Context. This is standard Cypher CALL behavior — the output columns declare what's available, not what's always produced. We kept this design as changing it would require modifications to the function binding infrastructure.

liulx20 · 2026-06-22T07:37:09Z

  void compute();
-  void sink(execution::Context& ctx, int node_alias, int distance_alias);
+  void sink(execution::Context& ctx, int node_alias, int distance_alias,
+            int path_alias = -1);


Actually, -1 is a valid alias, so we probably shouldn't use -1 to indicate whether the path should be output.

Fixed. The path_alias >= 0 check has been removed. All sink() methods now use return_path_ as the sole indicator for whether to build the path column. When return_path_ is true, path_alias is always valid (set by the binder).

liulx20 · 2026-06-22T07:44:54Z

    SSSP sssp(graph, sssp_input.vertex_label, sssp_input.edge_label, source_vid,
              sssp_input.directed, sssp_input.edge_weight,
-              sssp_input.concurrency);
+              sssp_input.concurrency, sssp_input.return_path);


Since the else branch does not handle return_path, the logic related to return_path can be removed.

Done. The else branch no longer passes return_path or path_alias to the parallel SSSP. The return_path_ and predecessors_ members have been completely removed from the SSSP class, along with the dead if (return_path_) branches in compute() and sink(). The parallel SSSP constructor no longer takes a return_path parameter.

Address review comment from liulx20: use NEUG_LDBC_DATA_DIR environment variable instead of hardcoded ~/Downloads path. Test skips when the variable is not set.

…rays Address liulx20's review comments on PR alibaba#574: - Remove predecessor arrays from BFS, BFSPred, SSSP, SSSPPred. Paths are now reconstructed post-hoc in sink() by walking the distance array backward: for vertex v at distance d, find a neighbor u at distance d-1. This eliminates the concurrency hazard entirely (path reconstruction is single-threaded). - BFS dense pull mode: revert CAS to simple non-atomic write (CAS was only needed to protect predecessor writes, which are now gone). - SSSP parallel: remove return_path_ and predecessors_ entirely (parallel SSSP never produces paths; path return routes through SSSPPred). - Remove configure_path_encoding() wrapper; call set_path_full_encoding() directly in bfs.cc/sssp.cc and reset to true (default) after sink() to prevent the thread-local flag from leaking to other queries. - Use return_path_ instead of path_alias >= 0 as the path output indicator (-1 is a valid alias per liulx20's comment). - Remove PlainPredecessorAccessor; reconstruct_path() now accepts a callable for finding predecessors on-the-fly. Performance verified on Graph500-23 (4.6M vertices, 129M edges): BFS with path: 38.2s (original) → 37.1s (refactor) SSSP with path: 47.4s (original) → 46.0s (refactor) No regression; slight improvement from removed CAS overhead.

Address liulx20's comment: "this variable now affects the sink behavior of other queries, which is unreasonable." Remove the thread_local global variable path_full_encoding_enabled from sink.cc. Instead, the encoding mode is carried as a member of PathColumn/PathColumnBuilder: - PathColumnBuilder::set_full_encoding(bool) sets the mode - PathColumnBuilder::finish() propagates it to PathColumn - PathColumn::full_encoding() reads it - sink.cc's add_column() reads it from the column and passes to convert_path_to_json() GDS sink() methods receive full_encoding as a parameter and set it on the PathColumnBuilder. Non-GDS path columns default to true (full encoding), so other queries are never affected.

longbinlai added 4 commits June 20, 2026 09:40

longbinlai changed the title ~~feat(gds): add shortest path return to BFS and SSSP algorithms~~ feat: add shortest path return to BFS and SSSP algorithms Jun 20, 2026

longbinlai added 3 commits June 20, 2026 12:22

style: fix code formatting (clang-format + black + flake8)

f827f9a

docs: add GDS extension README with path return documentation

828429e

longbinlai force-pushed the feat/gds-shortest-path-return branch from 9b83082 to 0262596 Compare June 20, 2026 04:54

longbinlai added 2 commits June 20, 2026 13:40

fix format

91039e0

longbinlai force-pushed the feat/gds-shortest-path-return branch from f394613 to a59f3d6 Compare June 20, 2026 08:54

longbinlai added 2 commits June 20, 2026 17:59

style: reformat test_gds_path.py with black

1ce482c

longbinlai mentioned this pull request Jun 20, 2026

GDS BFS/SSSP path return: default to full encoding & documentation cleanup #575

Open

longbinlai force-pushed the feat/gds-shortest-path-return branch from f766964 to 4a1729e Compare June 21, 2026 03:54

longbinlai requested a review from liulx20 June 22, 2026 02:11

longbinlai added 2 commits June 22, 2026 10:26

Merge branch 'alibaba:main' into feat/gds-shortest-path-return

826451b

liulx20 reviewed Jun 22, 2026

View reviewed changes

longbinlai added 6 commits June 22, 2026 11:59

Revert "fix format"

6619ed7

This reverts commit 91039e0.

fix: revert extra benchmark moves and apply Python formatting

434552e

Remove bench_gds_algorithms.py, bench_gds_performance.py, and bench_wcc_comparison.py from benchmark/ — only bench_gds_path.py should be moved. Apply isort+black formatting to test_gds_concurrency.py.

chore: remove bench_gds_path.py — test datasets too small for benchma…

f454e75

…rking

docs: move path_properties before concurrency in BFS/SSSP option tables

9a4a8bc

Keep concurrency as the last option consistently across all algorithm tables.

liulx20 reviewed Jun 22, 2026

View reviewed changes

longbinlai added 3 commits June 22, 2026 15:58

fix: replace hardcoded path with env var in test_gds_path_ldbc.py

996feaf

Address review comment from liulx20: use NEUG_LDBC_DATA_DIR environment variable instead of hardcoded ~/Downloads path. Test skips when the variable is not set.


		from neug.database import Database

		DATA_DIR = os.path.expanduser("~/Downloads/social_network_10_interactive")

Conversation

longbinlai commented Jun 20, 2026

Summary

Changes

Commit 1: feat(gds): add shortest path return to BFS and SSSP algorithms

Commit 2-3: perf: optimize path JSON serialization

Commit 4: feat(gds): add path_properties configuration

Usage

Testing

Performance (LDBC SNB SF10: 65K nodes, 1.9M edges, 62K paths)

Files Changed

Uh oh!

CLAassistant commented Jun 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

liulx20 Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

CLAassistant commented Jun 20, 2026 •

edited

Loading

liulx20 Jun 22, 2026 •

edited

Loading