Conversation

Collaborator

@ellemouton ellemouton commented Dec 4, 2025

Fixes a race found in this build.

See the commit messages for more details.

The PubKey method had a race condition where concurrent calls could
all pass the nil check and race to write to the cached pubKey field.
This is a classic check-then-act race.

Remove the caching entirely to fix the race. The overhead of parsing
a public key is minimal and doesn't justify the added complexity and
race risk of caching.
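As a rough sketch of both shapes (a minimal model; the field and method
names follow the commit message, not necessarily the exact definitions
in the codebase):

package graphsketch

import "github.com/btcsuite/btcd/btcec/v2"

// Node mirrors only the fields relevant to the race.
type Node struct {
	PubKeyBytes [33]byte
	pubKey      *btcec.PublicKey // the cache removed by the fix
}

// pubKeyRacy is the old shape: two goroutines can both observe
// pubKey == nil (the "check") and then both write the field (the
// "act") with no synchronization in between.
func (n *Node) pubKeyRacy() (*btcec.PublicKey, error) {
	if n.pubKey != nil {
		return n.pubKey, nil
	}
	key, err := btcec.ParsePubKey(n.PubKeyBytes[:])
	if err != nil {
		return nil, err
	}
	n.pubKey = key // unsynchronized write: the race
	return key, nil
}

// PubKey is the fixed shape: parse on every call, leaving no shared
// mutable state to race on.
func (n *Node) PubKey() (*btcec.PublicKey, error) {
	return btcec.ParsePubKey(n.PubKeyBytes[:])
}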
Both the NodeKey1 and NodeKey2 methods had the same race condition as
the Node.PubKey method: concurrent calls could race to write to the
cached fields.

Remove the caching for the same reasons: parsing overhead is minimal
and doesn't justify the complexity and race risk.
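The same shape applies to the edge keys. Assuming the NodeKey1Bytes
field quoted later in this review, the fixed accessor reduces to a
plain parse (NodeKey2 is identical with the second key's bytes):

// NodeKey1 parses the first node's key directly from its raw bytes.
func (c *ChannelEdgeInfo) NodeKey1() (*btcec.PublicKey, error) {
	return btcec.ParsePubKey(c.NodeKey1Bytes[:])
}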
The DisconnectBlockAtHeight method was modifying the rejectCache and
chanCache without holding the cacheMu lock. This caused races with
other operations that properly held the lock, such as AddChannelEdge
which modifies the caches in its OnCommit callback while the batch
scheduler holds cacheMu.

Fix by acquiring cacheMu before removing channels from the caches.
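A self-contained model of this bug (not lnd code) that the race
detector flags under go run -race: one goroutine mutates a shared
cache under the mutex, as the OnCommit callback does, while the other
mutates it with no lock, as the old DisconnectBlockAtHeight did:

package main

import "sync"

func main() {
	var mu sync.Mutex
	cache := make(map[uint64]struct{})

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // locked writer, like the OnCommit callback
		defer wg.Done()
		mu.Lock()
		cache[1] = struct{}{}
		mu.Unlock()
	}()
	go func() { // unlocked writer, like the old code: the race
		defer wg.Done()
		delete(cache, 1)
	}()
	wg.Wait()
}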
@gemini-code-assist

Summary of Changes

Hello @ellemouton, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves several race conditions identified in the channel graph database. The primary focus is on ensuring data consistency by removing problematic caching in the public key parsing methods and adding mutex-based synchronization around critical cache modifications. These changes improve the stability and reliability of the graph database under concurrent access.

Highlights

  • Race Condition Fixes in Public Key Parsing: Eliminated check-then-act race conditions in Node.PubKey() and ChannelEdgeInfo.NodeKey1/NodeKey2() methods by removing the caching of parsed public keys. These methods now parse the public key directly from its byte representation each time they are called.
  • Mutex Protection for Cache Operations: Introduced proper mutex locking (s.cacheMu) to DisconnectBlockAtHeight and updateEdgeCache methods in the SQL store. This ensures thread-safe access to the rejectCache and chanCache, preventing concurrency issues when modifying these shared resources.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request fixes several race conditions in the channel graph database. The main changes are removing racy in-memory caching of public keys in Node and ChannelEdgeInfo and instead parsing them on each access, and adding mutex protection for concurrent access to rejectCache and chanCache in updateEdgeCache and DisconnectBlockAtHeight. The changes correctly address the identified race conditions. However, I've found a critical issue where the lock added in updateEdgeCache may cause a deadlock due to the locking strategy of the batch package. I've also added a medium-severity suggestion for improving code style and safety in DisconnectBlockAtHeight and a note on a placeholder in the release notes. Overall, the direction is good but the potential deadlock needs to be addressed.
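For context on the deadlock concern (a generic illustration, not the
actual lnd call path): Go's sync.Mutex is not reentrant, so if the
batch scheduler already holds cacheMu when updateEdgeCache runs, a
second Lock of the same mutex inside updateEdgeCache blocks forever:

package main

import "sync"

var mu sync.Mutex

func update() {
	mu.Lock() // blocks forever if the caller already holds mu
	defer mu.Unlock()
}

func main() {
	mu.Lock()
	update() // deadlock: sync.Mutex is not reentrant
	mu.Unlock()
}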

Comment on lines +2917 to +2919
s.cacheMu.Lock()
for _, channel := range removedChans {
	s.rejectCache.remove(channel.ChannelID)
	s.chanCache.remove(channel.ChannelID)
}
s.cacheMu.Unlock()


medium

While this is correct, using defer s.cacheMu.Unlock() right after the Lock() call is more idiomatic in Go and safer against future modifications that might add an early return path within the locked section. This would also make it consistent with the change in updateEdgeCache.
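The suggested variant, using the same identifiers as the diff above,
would look like this:

s.cacheMu.Lock()
defer s.cacheMu.Unlock()

for _, channel := range removedChans {
	s.rejectCache.remove(channel.ChannelID)
	s.chanCache.remove(channel.ChannelID)
}

Note that defer releases the lock only when the enclosing function
returns, so this trades a slightly wider critical section for safety
against early returns.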

@yyforyongyu yyforyongyu added this to the v0.20.1 milestone Dec 4, 2025
Collaborator

@bitromortac bitromortac left a comment


I think the changes look good, pending a pathfinding benchmark to see if the caching is relevant.


// NodeKey1Bytes is the raw public key of the first node.
NodeKey1Bytes [33]byte
nodeKey1 *btcec.PublicKey
Collaborator


Caching came from #706; I think it would be good to do a pathfinding benchmark and/or a profile.

Collaborator Author


Cool, yeah. The other option is just to store only the *btcec.PublicKey here and not the bytes.
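A hypothetical sketch of that alternative (names invented for
illustration): store only the parsed key and derive the 33-byte wire
form on demand:

type ChannelEdgeInfo struct {
	nodeKey1 *btcec.PublicKey
	// ...
}

// NodeKey1Bytes derives the raw 33-byte form from the parsed key.
func (c *ChannelEdgeInfo) NodeKey1Bytes() [33]byte {
	var b [33]byte
	copy(b[:], c.nodeKey1.SerializeCompressed())
	return b
}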

Collaborator Author


Another option is to keep the caching but use a sync.Once or something similar.
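A sketch of that variant, which keeps the cache but makes the one-time
initialization race-free:

type Node struct {
	PubKeyBytes [33]byte

	pubKeyOnce sync.Once
	pubKey     *btcec.PublicKey
	pubKeyErr  error
}

func (n *Node) PubKey() (*btcec.PublicKey, error) {
	n.pubKeyOnce.Do(func() {
		n.pubKey, n.pubKeyErr = btcec.ParsePubKey(n.PubKeyBytes[:])
	})
	return n.pubKey, n.pubKeyErr
}

One trade-off: embedding a sync.Once makes the struct unsafe to copy
by value, which matters if Node values are ever copied.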

Collaborator


I didn't see any pathfinding or graph loading performance degradation. I'm not sure about allocations, but haven't spotted anything suspicious and the call sites seem not to be relevant for pathfinding at least.

Collaborator


I also checked allocations and couldn't see a difference

@saubyk saubyk added this to lnd v0.20 Dec 4, 2025
@saubyk saubyk moved this to In review in lnd v0.20 Dec 4, 2025
@saubyk saubyk requested a review from yyforyongyu December 4, 2025 15:21
Collaborator

@bitromortac bitromortac left a comment


LGTM 🎉
