Add guided vector-first Arango graph retrieval by JonasReuter · Pull Request #2 · JonasReuter/graphrag

JonasReuter · 2026-04-26T19:26:09Z

Summary

Adds a new guided, vector-first ArangoDB graph retrieval path that avoids broad k-hop expansion.

The approach follows the concept we discussed:

- Use Arango vector search to find semantic seed entities first.
- Run a best-first frontier traversal instead of a broad 1..depth expansion.
- Expand only the top-M scored edges per frontier node.
- Apply hard runtime budgets (max_expansions, max_frontier_size, max_edges_per_node, max_results).
- Score paths by vector relevance, relation/query hints, edge weight, target type, a hub/degree penalty, depth decay, and a lightweight personalized-PageRank-like (PPR-like) mass boost.
- Optionally retrieve community reports for the best-scoring graph region.
- Keep the legacy traverse_neighbors() and hybrid_search() intact for compatibility.
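The traversal described in the list above can be sketched in plain Python over an in-memory adjacency map. This is a minimal illustration, not the PR's implementation: `Budget`, `hub_penalty`, and `guided_traverse` are hypothetical names, the logarithmic hub penalty is an assumed formula, and scoring is reduced to seed score × edge weight × hub penalty × depth decay (no relation hints, target-type scores, or PPR-like boost).

```python
import heapq
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Edges = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbor, edge_weight), ...]


@dataclass
class Budget:
    # Hypothetical budget object; defaults mirror the graph_store config values.
    max_depth: int = 3
    max_edges_per_node: int = 8
    max_expansions: int = 128
    max_frontier_size: int = 256
    max_results: int = 80
    min_path_score: float = 0.05
    depth_decay: float = 0.72


def hub_penalty(degree: int) -> float:
    # Assumed formula: damp high-degree "hub" nodes logarithmically (1.0 for degree 0).
    return 1.0 / (1.0 + math.log1p(degree))


def guided_traverse(seeds: Dict[str, float], edges: Edges, budget: Optional[Budget] = None):
    """Best-first walk from vector-search seeds; returns (scored paths, expansions)."""
    budget = budget or Budget()
    # heapq is a min-heap, so scores are negated to pop the best path first.
    frontier = [(-score, 0, node, (node,)) for node, score in seeds.items()]
    heapq.heapify(frontier)
    results: List[Tuple[Tuple[str, ...], float]] = []
    expanded = set()
    expansions = 0
    while frontier and expansions < budget.max_expansions and len(results) < budget.max_results:
        neg_score, depth, node, path = heapq.heappop(frontier)
        score = -neg_score
        results.append((path, score))
        if node in expanded or depth >= budget.max_depth:
            continue  # record the path, but never expand a node twice or past max_depth
        expanded.add(node)
        expansions += 1
        # Only the top-M edges by weight are considered for this frontier node.
        top_m = heapq.nlargest(budget.max_edges_per_node, edges.get(node, []), key=lambda e: e[1])
        for neighbor, weight in top_m:
            if neighbor in path:
                continue  # avoid cycles within a single path
            child = score * weight * hub_penalty(len(edges.get(neighbor, []))) * budget.depth_decay
            if child >= budget.min_path_score:
                heapq.heappush(frontier, (-child, depth + 1, neighbor, path + (neighbor,)))
        if len(frontier) > budget.max_frontier_size:
            # Keep only the best entries (smallest negated scores) to cap memory.
            frontier = heapq.nsmallest(budget.max_frontier_size, frontier)
            heapq.heapify(frontier)
    return results, expansions


edges = {"a": [("b", 1.0), ("c", 0.5)], "b": [("d", 1.0)], "c": [], "d": []}
paths, n_expansions = guided_traverse({"a": 0.9}, edges)
print([p for p, _ in paths])  # [('a',), ('a', 'b'), ('a', 'c'), ('a', 'b', 'd')]
```

The expansion counter, not the graph's degree distribution, is what stops the loop: with `Budget(max_expansions=1)` the walk expands only the seed and returns immediately, however many neighbors it has.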

Why

A fixed k-hop traversal can explode on high-degree entities: a node with 1,000 edges whose neighbors are similarly connected can produce ~1,000,000 two-hop candidates. This PR changes the retrieval primitive from "walk all hops" to "walk only promising paths within a strict budget".

The runtime is bounded primarily by:

seed_k + max_expansions * max_edges_per_node

instead of by:

degree ^ depth
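Plugging the default budgets from the config section into the two bounds shows the difference in scale. This is plain arithmetic on the formulas above; `guided_edge_budget` and `khop_candidates` are illustrative helpers, and `khop_candidates` assumes every node has the same degree, which real graphs rarely do.

```python
def guided_edge_budget(seed_k: int, max_expansions: int, max_edges_per_node: int) -> int:
    """Upper bound on candidates touched by the guided walk: seed_k + max_expansions * max_edges_per_node."""
    return seed_k + max_expansions * max_edges_per_node


def khop_candidates(degree: int, depth: int) -> int:
    """Worst-case candidate count for an unguided k-hop walk with uniform degree: degree ** depth."""
    return degree ** depth


print(guided_edge_budget(seed_k=12, max_expansions=128, max_edges_per_node=8))  # 1036
print(khop_candidates(degree=1_000, depth=2))  # 1000000
print(khop_candidates(degree=1_000, depth=3))  # 1000000000
```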

New API

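```python
from graphrag_vectors import (
    GuidedArangoGraphRetriever,
    GuidedGraphRetrievalConfig,
    QueryGraphPlan,
)

# graph_store, query (the question text), and query_vector (its embedding) are supplied by the caller.
retriever = GuidedArangoGraphRetriever(graph_store._db, graph_store.graph_name)
result = retriever.retrieve(
    query_vector=query_vector,
    query=query,
    plan=QueryGraphPlan.from_query_text(query),  # plan (relation/query hints) built from the query text
    config=GuidedGraphRetrievalConfig(),  # default budgets
)
# result["stats"] reports duration, seed count, expansions, path count, and max possible edge reads.
```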

Also adds a query-layer helper:

```python
from graphrag.query.input.retrieval.guided_graph import retrieve_guided_graph_context
```

Config additions

```yaml
graph_store:
  guided_retrieval_enabled: true
  guided_seed_k: 12
  guided_max_depth: 3
  guided_max_edges_per_node: 8
  guided_max_expansions: 128
  guided_max_frontier_size: 256
  guided_max_results: 80
  guided_min_path_score: 0.05
  guided_depth_decay: 0.72
  guided_community_report_limit: 8
  guided_allow_vector_scan_fallback: false
```
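Two of these settings interact. Assuming (the PR does not spell this out) that `guided_depth_decay` multiplies a path's score once per hop, a path starting at score 1.0 survives at most floor(ln(min_path_score) / ln(depth_decay)) hops before falling below `guided_min_path_score`. With the defaults that is 9 hops, so `guided_max_depth: 3` is the tighter limit; lower seed scores, edge weights, and hub penalties shorten the decay limit. `max_hops_above_threshold` is a hypothetical helper for checking this.

```python
import math


def max_hops_above_threshold(depth_decay: float, min_path_score: float, start_score: float = 1.0) -> int:
    """Largest d with start_score * depth_decay**d >= min_path_score, under pure multiplicative decay."""
    if not 0.0 < depth_decay < 1.0:
        raise ValueError("depth_decay must be in (0, 1)")
    if start_score < min_path_score:
        return -1  # the seed itself is already below the threshold
    return math.floor(math.log(min_path_score / start_score) / math.log(depth_decay))


decay_limit = max_hops_above_threshold(depth_decay=0.72, min_path_score=0.05)
print(decay_limit)          # 9
print(min(3, decay_limit))  # 3: guided_max_depth is the binding cap
print(round(0.72 ** 3, 6))  # 0.373248: fraction of the score retained after 3 hops of decay
```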

Performance notes

- The full vector-scan fallback (guided_allow_vector_scan_fallback) is disabled by default to protect query latency.
- Edge expansion is performed as repeated bounded 1-hop top-M AQL calls, not as a variable-length traversal.
- The hub penalty uses entity degree/rank signals to keep generic high-degree nodes from dominating retrieval.
- The implementation returns stats including duration, seed count, expansions, path count, and maximum possible edge reads.
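A bounded 1-hop top-M expansion of this kind could look roughly like the AQL below, issued once per frontier node through python-arango's `db.aql.execute`. The `weight` edge attribute, the query text, and `expand_top_m` are assumptions for illustration, not the PR's actual query. In this form `LIMIT @m` caps the rows returned per call, while `SORT` still has to see every edge of the start vertex, so per-call work still scales with that vertex's degree.

```python
from typing import Any, Dict, List

# One bounded hop: neighbors of @start, best edges first, at most @m rows.
# The `weight` edge attribute is an assumption about the schema.
TOP_M_NEIGHBORS_AQL = """
FOR v, e IN 1..1 ANY @start GRAPH @graph
  SORT e.weight DESC
  LIMIT @m
  RETURN { id: v._id, edge_id: e._id, weight: e.weight }
"""


def expand_top_m(db: Any, graph_name: str, start_id: str, m: int) -> List[Dict[str, Any]]:
    """Run one top-M expansion; `db` is assumed to be a python-arango database handle."""
    cursor = db.aql.execute(
        TOP_M_NEIGHBORS_AQL,
        bind_vars={"start": start_id, "graph": graph_name, "m": m},
    )
    return list(cursor)
```

A guided walk would call this with `m=max_edges_per_node` for each popped frontier node, at most `max_expansions` times, which is where the `max_expansions * max_edges_per_node` term in the runtime bound comes from.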

Testing

Tests were not run locally in this environment. Please run:

```shell
uv run poe check
uv run poe test_unit
```

Recommended manual test: compare legacy hybrid_search() against guided retrieval on a high-degree seed and inspect result["stats"] for bounded expansion behavior.
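For that manual test, a small helper can flag stats that exceed the configured budgets. The PR lists the quantities in `result["stats"]` but not their key names, so the keys `expansions` and `max_edge_reads` used here are guesses; `budget_violations` is a hypothetical helper whose defaults mirror the config values above.

```python
from typing import Any, Dict, List


def budget_violations(
    stats: Dict[str, Any],
    max_expansions: int = 128,
    max_edges_per_node: int = 8,
) -> List[str]:
    """Return human-readable budget violations; an empty list means the run stayed bounded."""
    problems = []
    # Hypothetical key names; rename them to match the real result["stats"].
    if stats["expansions"] > max_expansions:
        problems.append(f"expansions {stats['expansions']} > max_expansions {max_expansions}")
    edge_bound = max_expansions * max_edges_per_node
    if stats["max_edge_reads"] > edge_bound:
        problems.append(f"max_edge_reads {stats['max_edge_reads']} > bound {edge_bound}")
    return problems


print(budget_violations({"expansions": 40, "max_edge_reads": 320}))  # []
```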

JonasReuter added 6 commits on April 26, 2026, 21:20:

- Add guided vector-first Arango graph retriever (4cd3ca5)
- Harden guided Arango retriever AQL and fallback (9cae362)
- Export guided Arango graph retrieval API (12dcd1b)
- Add guided graph retrieval config fields (c601c6a)
- Add guided graph retrieval defaults (0a59d48)
- Add query retrieval adapter for guided Arango graph search (f54b652)

JonasReuter wants to merge 6 commits into main from feature/budgeted-arango-graph-retrieval
