graph/db: cross-version graph Store queries#10714
graph/db: cross-version graph Store queries#10714ellemouton wants to merge 10 commits intolightningnetwork:masterfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refactors the graph database layer to support cross-version gossip data. By introducing materialized mapping tables, the system can now automatically select the preferred version of nodes and channels based on defined priority rules. This change simplifies the API for callers and ensures that graph traversal methods consistently return the most relevant data, regardless of whether it was announced via gossip v1 or v2. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request refactors the graph Store interface to support cross-version operations, enabling node and channel traversals to work across gossip v1 and v2. It introduces preferred mapping tables in the SQL backend and PreferHighest fetch helpers to prioritize v2 data while falling back to v1. Feedback was provided to implement a feature fallback in the no-cache path for directed channel traversals, ensuring consistency with the FetchNodeFeatures logic by providing the best available feature set when a specific gossip version lacks data.
| } | ||
| } | ||
|
|
||
| toNodeCallback := func() route.Vertex { |
There was a problem hiding this comment.
In the no-cache path, we should implement a fallback for node features similar to the logic in ChannelGraph.FetchNodeFeatures. This ensures that if a node has v1 features but no v2 features, we still provide the v1 features for pathfinding even if the channel announcement is v2. This maintains consistency between the cache-enabled and cache-disabled code paths.
// Try v2 first, fall back to v1 if the v2 features are empty. This
// ensures consistency with FetchNodeFeatures and provides the best
// available feature set for the node.
preferredFeatures := featuresByVersion[gossipV2]
if preferredFeatures.IsEmpty() {
preferredFeatures = featuresByVersion[gossipV1]
}
toNodeCallback := func() route.Vertex {There was a problem hiding this comment.
I think the current behaviour is actually more correct here. Tying ToNodeFeatures to the channel's gossip version ensures the features match the protocol version of the channel advertisement. A v2 preferred channel should use v2 features, and a v1 preferred channel should use v1 features.
Using a single "best" feature set across all channels would mean a v1 channel could get v2 features assigned, which is a version mismatch.
The underlying ambiguity (empty features vs absent node for a given version) is a known limitation — the proper fix is to distinguish those two cases at the FetchNodeFeatures layer rather than papering over it here.
| cachedInPolicy.ToNodeFeatures = | ||
| featuresByVersion[lnwire.GossipVersion( | ||
| row.GraphChannel.Version, | ||
| )] |
There was a problem hiding this comment.
Use the pre-calculated preferredFeatures here to ensure the best available feature set is used for the node, regardless of the channel's gossip version.
| cachedInPolicy.ToNodeFeatures = | |
| featuresByVersion[lnwire.GossipVersion( | |
| row.GraphChannel.Version, | |
| )] | |
| cachedInPolicy.ToNodeFeatures = preferredFeatures |
32f201a to
652f923
Compare
652f923 to
24c9c8d
Compare
| GetChannelAndNodesBySCID(ctx context.Context, arg sqlc.GetChannelAndNodesBySCIDParams) (sqlc.GetChannelAndNodesBySCIDRow, error) | ||
| HighestSCID(ctx context.Context, version int16) ([]byte, error) | ||
| ListChannelsByNodeID(ctx context.Context, arg sqlc.ListChannelsByNodeIDParams) ([]sqlc.ListChannelsByNodeIDRow, error) | ||
| ListPreferredDirectedChannelsPaginated(ctx context.Context, arg sqlc.ListPreferredDirectedChannelsPaginatedParams) ([]sqlc.ListPreferredDirectedChannelsPaginatedRow, error) |
There was a problem hiding this comment.
ListNodesPaginated seems to be unused
Some other ones that could need a check:
GetNodesByIDsDeleteNodeGetZombieChannelsSCIDsGetPruneHashByHeightGetPruneEntriesForHeightsGetClosedChannelsSCIDsListChannelsWithPoliciesPaginatedGetChannelsByIDs
24c9c8d to
b66429f
Compare
PR Severity: CRITICAL
Critical (6 files):
High (4 files):
Low (1 file):
AnalysisThis PR introduces SQL migration 15 (000015_graph_preferred_lookups) adding new indexes/tables to the graph SQL schema, with corresponding sqlc-generated code and updated graph store implementations. Database migrations are always CRITICAL: they are irreversible in production and affect data integrity. Key concerns for reviewers:
To override, add a severity-override-{critical,high,medium,low} label. |
|
thanks @bitromortac - addressed 👍 |
| if errors.Is(err, ErrEdgeNotFound) || | ||
| errors.Is(err, ErrZombieEdge) { | ||
|
|
||
| continue |
There was a problem hiding this comment.
it seems like we could interfere with zombie resurrection via GetChannelByID in handleChanUpdate, to not being able to resurrect v1 chans
| // GetVersionsBySCID returns the list of gossip versions for which a | ||
| // channel with the given SCID exists in the database, ordered | ||
| // ascending. | ||
| GetVersionsBySCID(ctx context.Context, | ||
| chanID uint64) ([]lnwire.GossipVersion, error) | ||
|
|
||
| // GetVersionsByOutpoint returns the list of gossip versions for which | ||
| // a channel with the given funding outpoint exists in the database, | ||
| // ordered ascending. | ||
| GetVersionsByOutpoint(ctx context.Context, | ||
| op *wire.OutPoint) ([]lnwire.GossipVersion, error) | ||
|
|
There was a problem hiding this comment.
how will those be used? I'm just wondering if excluding zombie errors is ok for them
| } | ||
|
|
||
| // If this version has policies, return immediately. | ||
| if p1 != nil || p2 != nil { |
There was a problem hiding this comment.
maybe we could change his to an &&, but I'm not sure if that half-policy protection is needed?
| // that this intentionally ignores c.v — cross-version iteration is the desired | ||
| // behaviour for callers that enumerate graph topology. |
There was a problem hiding this comment.
yes, not super clean that this is on the versioned graph (non-blocking)
There was a problem hiding this comment.
yeah the reason was for the autopilot interface which made this tricky. the good news is that we can actually remove these in this PR. added a commit since we now have versions that work cross version
| WHERE c.version = $1 | ||
| AND (c.node_id_1 = $2 OR c.node_id_2 = $2); | ||
|
|
||
| -- name: ListPreferredNodeChannelsPaginated :many |
There was a problem hiding this comment.
I think we'd need to be careful to test this properly to not get any perf regressions in pathfinding (can do a round after the next iteration)
There was a problem hiding this comment.
ive removed this for now. But note: it isnt currently possible to spin LND up with a sql backend without the graph cache. So i think what this PR does does not affect pathfinding which would just always hit the graph cache
| } | ||
|
|
||
| return c.db.FetchNodeFeatures(ctx, lnwire.GossipVersion1, node) | ||
| // Try v2 first, fall back to v1 if the v2 features are empty. |
There was a problem hiding this comment.
it seems that there can be a discrepancy between the cache and the no-cache paths FetchNodeFeatures, say if v1={MPP} v2={}. Then we'd get {} for the cache and {MPP} for the no-cache path.
|
@bhandras: review reminder |
b66429f to
f74502d
Compare
When FetchChannelEdgesByID hits a zombie edge, it constructs the partial ChannelEdgeInfo it returns alongside ErrZombieEdge with a hard-coded ChannelID of zero instead of the actual channel ID that was looked up. Callers such as the gossiper's processZombieUpdate receive this struct and may use the ChannelID field; returning zero is incorrect and could mask bugs in downstream code. Pass the looked-up chanID through to NewV1Channel / NewV2Channel so the zombie ChannelEdgeInfo carries the correct ChannelID.
Add two precomputed mapping tables that track the "best" gossip version for each unique node (pub_key) and channel (SCID): - graph_preferred_nodes: pub_key -> node_id - graph_preferred_channels: scid -> channel_id Priority for nodes: v2 announced > v1 announced > v2 shell > v1 shell. Priority for channels: v2 with policies > v1 with policies > v2 > v1. These tables enable simple indexed-join queries for cross-version traversal (ForEachNode, ForEachChannel, ForEachNodeDirectedChannel) without expensive per-row COALESCE subqueries. The tables are populated from existing data during the migration and maintained by upsert/delete queries on every write path (added in the next commit).
f74502d to
3a6f03f
Compare
Add version-agnostic channel fetch helpers that choose the preferred gossip-version row for a logical channel: the highest version with policy data, falling back to the highest bare version. Implement the SQL path inside a single read transaction instead of calling SQLStore methods recursively, preserve zombie edge info for SCID lookups, and keep KV behavior as v1-only delegation. Add coverage for preferred selection, version listing, missing channels, missing outpoints, and zombie edge info.
Add UpsertPreferredNode and UpsertPreferredChannel calls to every Store write path so the preferred mapping tables stay consistent: - upsertSourceNode, upsertNode, maybeCreateShellNode, DeleteNode - insertChannel, updateChanEdgePolicy, DeleteChannelEdges - pruneGraphNodes CASCADE deletes on the underlying graph_nodes and graph_channels tables automatically clean up preferred entries when a version is removed.
Drop the GossipVersion parameter from ForEachNode and ForEachChannel. Both methods now iterate across all gossip versions, yielding one result per unique pub_key or SCID using the preferred mapping tables for pagination. Wire ChannelGraph.FetchChannelEdgesByID and FetchChannelEdgesByOutpoint through the PreferHighest variants. Add GetVersionsBySCID and GetVersionsByOutpoint wrappers to ChannelGraph.
Update ForEachNodeDirectedChannel and FetchNodeFeatures to work across gossip versions. ForEachNodeDirectedChannel now checks for a preferHighestNodeDirectedChanneler interface on the store (implemented by SQLStore) to stream cross-version directed channels in a single paginated query against the preferred-channel mapping table. The fallback path iterates v2 then v1, buffering results to deduplicate by SCID. FetchNodeFeatures tries v2 first, falling back to v1 when v2 features are empty. The cache population comment is updated to note that v1-then-v2 iteration order naturally gives v2 precedence via key collision overwrite.
VersionedGraph wraps ChannelGraph for callers that want to operate against a specific gossip version. After the cross-version refactor, the ForEachNode and ForEachChannel methods on VersionedGraph silently ignored c.v and iterated across all versions — a surprising behaviour for a wrapper whose whole purpose is version-scoping. Move the cross-version iteration to ChannelGraph (where it belongs) and drop the foot-gun overrides from VersionedGraph. *VersionedGraph continues to expose these via the embedded *ChannelGraph, but now they are explicitly methods of the version-agnostic type. Add ChannelGraph.ForEachNode mirroring ChannelGraph.ForEachChannel, and update the only non-embedded callsite (rpcserver describeGraph) to take ChannelGraph directly. The two remaining test helpers also switch to *ChannelGraph since they only ever wanted a node count.
SQLStore.ForEachNodeCached takes a gossip version and uses it correctly when paginating through nodes, but the inner ListChannelsForNodeIDs call hardcoded GossipVersion1 instead of forwarding the requested version. Calling ForEachNodeCached(ctx, GossipVersion2, ...) therefore returned v2 nodes paired with v1 channels — silently inconsistent data. The bug was latent because every existing caller happens to request v1, but it would surface as soon as any v2-scoped caller appears. Add a regression test that creates a v2-only channel between two nodes that exist under both versions and asserts that ForEachNodeCached(ctx, GossipVersion2, ...) reports the channel for each endpoint.
The in-memory graph cache is only ever disabled on the bbolt KV backend in production, so the no-cache fallback paths in ChannelGraph.ForEachNodeDirectedChannel, ChannelGraph.FetchNodeFeatures and VersionedGraph.GraphSession will only run against a v1-only store. The v2-then-v1 loop in FetchNodeFeatures is therefore dead code in production, and the comments framing v1 as a temporary choice are misleading. Collapse FetchNodeFeatures to a direct v1 call, drop the "version-scoped"/"for now" comment language, and trim the TestPreferredNodeTraversal cases that were exercising the now-removed v2 fallback. Rename the test to TestNoCacheNodeTraversal to reflect what it actually covers (the v1 KV path).
3a6f03f to
26ef5e2
Compare
Summary
This PR makes the graph
Storeinterface cross-version so thatForEachNode,ForEachChannel, andForEachNodeDirectedChannelwork across gossip v1 and v2, returning one preferred entry per pub_key/SCID. It also addsPreferHighestfetch helpers andGetVersionsqueries so callers can retrieve channels without knowing which gossip version announced them.Part of the gossip v2 epic: #10293
Spun off from: #10656
Design
Two new materialised mapping tables are introduced:
graph_preferred_nodes— one row per uniquepub_key, pointing at the bestgraph_nodesrow. Priority: v2 announced > v1 announced > v2 shell > v1 shell.graph_preferred_channels— one row per unique SCID, pointing at the bestgraph_channelsrow. Priority: v2 with policies > v1 with policies > v2 bare > v1 bare.These tables are maintained on every write path (
AddNode,AddChannelEdge,UpdateEdgePolicy,DeleteNode,DeleteChannelEdges,PruneGraphNodes) viaUpsertPreferredNode/UpsertPreferredChannelSQL queries, and seeded by a migration for existing data.Changes by commit
sqldb: add preferred-node and preferred-channel mapping tables— migration 000014 creating the tables and populating them from existing data.graph/db: add PreferHighest fetch methods and GetVersions queries— newStoreinterface methods:FetchChannelEdgesByIDPreferHighest,FetchChannelEdgesByOutpointPreferHighest,GetVersionsBySCID,GetVersionsByOutpoint. KV implementations delegate to v1.graph/db: wire preferred-table maintenance into write paths— callsUpsertPreferredNode/UpsertPreferredChannelon all SQL write paths.graph/db: make ForEachNode and ForEachChannel cross-version— removes theGossipVersionparameter fromForEachNodeandForEachChannelon theStoreinterface, replacing the underlying queries with preferred-table-backed paginated queries.graph/db: implement cross-version node traversal— addsForEachNodeDirectedChannelPreferHighestfor cache-disabled SQL path, updatesFetchNodeFeaturesto prefer v2 features with v1 fallback.docs: add release notePerformance
Benchmarked on Apple M1 Pro against a mainnet v1-only graph (16,216 nodes, 51,239 channels). No regressions observed — the preferred-table JOIN adds negligible overhead:
Test plan
TestPreferHighestAndGetVersions,TestPreferHighestForEachNode,TestPreferHighestForEachChannel,TestPreferHighestNodeTraversal,TestPreferHighestNodeDirectedChannelTraversal,TestDeleteNodePreferredRecomputationisSQLDBguard