lnwallet/channeldb: restore fee updates after restart#10622
lnwallet/channeldb: restore fee updates after restart#10622ziggie1984 wants to merge 1 commit intolightningnetwork:masterfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a subtle but critical issue in channel state management where fee updates were not properly restored from disk after a node restart. This oversight could lead to commitment signature verification failures during channel reestablishment, resulting in unnecessary channel force closures. The changes ensure that all pending fee updates are correctly reloaded, maintaining channel consistency and preventing erroneous state discrepancies. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request addresses a critical bug where fee updates were not correctly restored after a restart, potentially leading to incorrect force closures. The refactoring in AdvanceCommitChainTail to prevent an early return is a clean and effective fix, ensuring that all pending local updates are persisted correctly. The addition of TestChanSyncUpdateFeeRestartAfterRevoke is excellent, as it thoroughly covers the specific failure scenario and confirms the fix. The changes are well-implemented and improve the overall robustness of the channel state management.
Note: Security Review is unavailable for this PR.
Fix a bug where UpdateFee entries persisted under remoteUnsignedLocalUpdatesKey were silently skipped during restoreStateLogs, leaving the update log empty after a restart. This caused a retransmitted commit_sig (covering the new fee rate) to fail verification after channel reestablishment, incorrectly triggering a force close. Also refactor AdvanceCommitChainTail to avoid an early return when unsignedAckedUpdates is nil, so that the full validation path always runs. Adds TestChanSyncUpdateFeeRestartAfterRevoke (lnwallet) and TestChannelLinkFailDuringRestart (htlcswitch) to reproduce the scenario at both the channel and link layers.
952f500 to
39b8699
Compare
Bortlesboat
left a comment
There was a problem hiding this comment.
ACK 39b8699
Good catch — I've seen this exact pattern cause spurious force closes on my node after fee spikes. The channeldb fix is clean. Left a couple of notes on the link test.
| time.Sleep(5 * time.Second) | ||
|
|
||
| case <-time.After(5 * time.Second): | ||
| t.Fatalf("did not receive channel_reestablish message") |
There was a problem hiding this comment.
The time.Sleep(5 * time.Second) here to wait for the force close processing feels brittle — if the test runner is slow this might not be enough, and on fast machines it just wastes time. Could this use a channel signal or poll on a condition instead? Something like piping through the OnChannelFailure callback into a linkErrors chan (like TestChannelLinkUpdateCommitFee does) and selecting on that with a timeout would be more deterministic.
| // TestChannelLinkFailDuringRestart tests that the link fails (force-closing | ||
| // the channel) if the commit dance is not completed after sharing the | ||
| // update_fee and one of the peers restarts before completion. | ||
| // |
There was a problem hiding this comment.
Nit: the test doc comment says "tests that the link fails (force-closing the channel)" but after the fix the assertion is that force close does NOT happen. Might want to update the comment to reflect the corrected behavior, something like "tests that the link does not force-close after a restart when a fee update was pending".
| lnwire.ShortChannelID, LinkFailureError) { | ||
| OnChannelFailure: func(_ lnwire.ChannelID, | ||
| _ lnwire.ShortChannelID, linkErr LinkFailureError) { | ||
|
|
There was a problem hiding this comment.
This changes shared test infrastructure — restartLink is used by ~15 other tests. Adding require.NotEqual(t, linkErr.FailureAction, LinkFailureForceClose) here means any future test that legitimately expects a force close through this path would silently break. Safer to keep the no-op default and override OnChannelFailure in TestChannelLinkFailDuringRestart specifically, like TestChannelLinkUpdateCommitFee does.
|
@ziggie1984, are there any pending items keeping this in draft? |
There was a problem hiding this comment.
I think we are now adding an empty record to the DB:
If updateBytes == nil ==> unsignedUpdates is empty ==> validUpdates is empty and because of that we are putting an empty record with chanBucket.Put(unsignedAckedUpdatesKey, b.Bytes())
So perhaps we can simply do:
diff --git a/channeldb/channel.go b/channeldb/channel.go
index e6b2bc483..6dcc17990 100644
--- a/channeldb/channel.go
+++ b/channeldb/channel.go
@@ -3420,6 +3420,20 @@ func (c *OpenChannel) AdvanceCommitChainTail(fwdPkg *FwdPkg,
return err
}
+ // Persist the local updates the peer hasn't yet signed so they
+ // can be restored after restart.
+ var b2 bytes.Buffer
+ err = serializeLogUpdates(&b2, updates)
+ if err != nil {
+ return err
+ }
+
+ err = chanBucket.Put(remoteUnsignedLocalUpdatesKey, b2.Bytes())
+ if err != nil {
+ return fmt.Errorf("unable to restore remote unsigned "+
+ "local updates: %v", err)
+ }
+
// Persist the unsigned acked updates that are not included
// in their new commitment.
updateBytes := chanBucket.Get(unsignedAckedUpdatesKey)
@@ -3461,20 +3475,6 @@ func (c *OpenChannel) AdvanceCommitChainTail(fwdPkg *FwdPkg,
"unsignedAckedUpdatesKey: %w", err)
}
- // Persist the local updates the peer hasn't yet signed so they
- // can be restored after restart.
- var b2 bytes.Buffer
- err = serializeLogUpdates(&b2, updates)
- if err != nil {
- return err
- }
-
- err = chanBucket.Put(remoteUnsignedLocalUpdatesKey, b2.Bytes())
- if err != nil {
- return fmt.Errorf("unable to restore remote unsigned "+
- "local updates: %v", err)
- }
-
newRemoteCommit = &newCommit.Commitment
return nilInitially flagged in: NishantBansal2003#7 (comment)
|
I put up a helper PR with the review cleanups here: ziggie1984#23 It covers the three unresolved If it's easier, the patch is also just commit |
|
@ziggie1984, remember to re-request review from reviewers when ready |
|
@Bortlesboat maybe should ask @ziggie1984 first, whether
is the same goal or two different goals? if it's the same goal, then changes in ziggie1984#23 is enough . but, if two different goals, we need guard it with var validUpdates []LogUpdate
for _, upd := range unsignedUpdates {
lIdx := upd.LogIndex
// Filter for updates that are not on the remote
// commitment.
if lIdx >= newCommit.Commitment.RemoteLogIndex {
validUpdates = append(validUpdates, upd)
}
}
if updateBytes != nil {
var b bytes.Buffer
err = serializeLogUpdates(&b, validUpdates)
if err != nil {
return fmt.Errorf("unable to serialize log updates: %w", err)
}
}
``` |
|
@ziggie1984 to make sure if accidentally double htlc update_fee happens, the chosen commitment tx fee per kw is the last htlc update_fee, maybe you can also add this as another unit test func TestChanSyncUpdateFeeRestartAfterRevoke_WithAnotherFee(t *testing.T) {
t.Parallel()
aliceChannel, bobChannel, err := CreateTestChannels(
t, channeldb.SingleFunderTweaklessBit,
)
require.NoError(t, err, "unable to create test channels")
// Alice is the channel initiator and sends an update_fee to Bob.
newFee := chainfee.SatPerKWeight(12500)
err = aliceChannel.UpdateFee(newFee)
require.NoError(t, err, "alice unable to update fee")
anotherNewFee := chainfee.SatPerKWeight(25000)
err = aliceChannel.UpdateFee(anotherNewFee)
require.NoError(t, err, "alice unable to update fee")
err = bobChannel.ReceiveUpdateFee(newFee)
require.NoError(t, err, "bob unable to receive fee update")
err = bobChannel.ReceiveUpdateFee(anotherNewFee)
require.NoError(t, err, "bob unable to receive fee update")
// Alice signs the next commitment, which includes the fee update.
aliceNewCommit, err := aliceChannel.SignNextCommitment(ctxb)
require.NoError(t, err, "alice unable to sign commitment")
// Bob receives and processes Alice's commitment.
err = bobChannel.ReceiveNewCommitment(aliceNewCommit.CommitSigs)
require.NoError(t, err, "bob unable to process alice's new commitment")
// Bob revokes his old commitment (producing a revoke_and_ack)...
bobRevocation, _, _, err := bobChannel.RevokeCurrentCommitment()
require.NoError(t, err, "bob unable to revoke commitment")
// ...and signs a new commitment for Alice.
_, err = bobChannel.SignNextCommitment(ctxb)
require.NoError(t, err, "bob unable to sign alice's commitment")
// Alice receives Bob's revoke_and_ack. This advances Bob's remote tail
// in Alice's view. During ReceiveRevocation, the still-unacknowledged
// fee update is written to disk under remoteUnsignedLocalUpdatesKey.
//
// The connection then drops before Alice receives Bob's commit_sig.
_, _, err = aliceChannel.ReceiveRevocation(bobRevocation)
require.NoError(t, err, "alice unable to process bob's revocation")
// Alice restarts. Due to the bug, the fee update stored under
// remoteUnsignedLocalUpdatesKey is not restored into Alice's update
// log (UpdateFee is skipped by the default: continue in restoreStateLogs).
aliceChannel, err = restartChannel(aliceChannel)
require.NoError(t, err, "unable to restart alice channel")
// Both sides produce their ChannelReestablish messages.
aliceSyncMsg, err := aliceChannel.channelState.ChanSyncMsg()
require.NoError(t, err, "alice unable to produce chan sync msg")
bobSyncMsg, err := bobChannel.channelState.ChanSyncMsg()
require.NoError(t, err, "bob unable to produce chan sync msg")
// Alice processes Bob's ChannelReestablish. Bob's NextLocalCommitHeight
// is 2, and Alice's remoteTipHeight is 1, so 2 == 1+1: Alice concludes
// the remote side is in sync and has nothing to retransmit.
aliceMsgsToSend, _, _, err := aliceChannel.ProcessChanSyncMsg(
ctxb, bobSyncMsg,
)
require.NoError(t, err, "alice unable to process bob's chan sync msg")
require.Empty(t, aliceMsgsToSend,
"alice should have no messages to retransmit after reestablish",
)
// Bob processes Alice's ChannelReestablish. Alice's NextLocalCommitHeight
// is 1, which equals Bob's remoteTipHeight (1), so Bob detects he owes
// Alice a commitment signature and prepares to retransmit it.
bobMsgsToSend, _, _, err := bobChannel.ProcessChanSyncMsg(
ctxb, aliceSyncMsg,
)
require.NoError(t, err, "bob unable to process alice's chan sync msg")
require.Len(t, bobMsgsToSend, 1,
"bob should retransmit exactly one message (his commit_sig)",
)
retransmittedSig, ok := bobMsgsToSend[0].(*lnwire.CommitSig)
require.True(t, ok, "expected bob to retransmit a CommitSig")
// Alice receives Bob's retransmitted commit_sig. This should succeed:
// Alice must have the fee update in her update log to compute the
// correct commitment and verify Bob's signature.
//
// Currently this FAILS because the fee update is not restored after
// restart, so Alice uses the old fee rate and Bob's signature (covering
// the new fee rate) does not verify. In the link layer this causes
// LinkFailureForceClose — an incorrect force close of a healthy channel.
err = aliceChannel.ReceiveNewCommitment(&CommitSigs{
CommitSig: retransmittedSig.CommitSig,
HtlcSigs: retransmittedSig.HtlcSigs,
})
require.NoError(t, err,
"alice should process bob's retransmitted commit_sig without "+
"error; fee update not restored after restart causes "+
"spurious force close",
)
// Complete the commit dance and verify the fee is locked in on both sides.
aliceRevocation, _, _, err := aliceChannel.RevokeCurrentCommitment()
require.NoError(t, err, "alice unable to revoke commitment")
_, _, err = bobChannel.ReceiveRevocation(aliceRevocation)
require.NoError(t, err, "bob unable to process alice's revocation")
require.Equal(t, anotherNewFee,
chainfee.SatPerKWeight(
aliceChannel.channelState.LocalCommitment.FeePerKw,
),
"alice's fee rate was not properly locked in",
)
require.Equal(t, anotherNewFee,
chainfee.SatPerKWeight(
bobChannel.channelState.LocalCommitment.FeePerKw,
),
"bob's fee rate was not properly locked in",
)
} |
Summary
UpdateFeeentries persisted underremoteUnsignedLocalUpdatesKeywere silently skipped duringrestoreStateLogs, leaving the update log empty after a restartcommit_sig(covering the new fee rate) to fail verification after channel reestablishment, incorrectly triggering a force closeAdvanceCommitChainTailto avoid an early return whenunsignedAckedUpdatesis nil, so the full validation path always runsTestChanSyncUpdateFeeRestartAfterRevoketo reproduce the scenarioRoot Cause
During
ReceiveRevocation, the fee update is persisted underremoteUnsignedLocalUpdatesKey. However,restoreStateLogsonly handles HTLC-type updates (UpdateFulfillHTLC,UpdateFailHTLC, etc.) and silently skipsUpdateFeeviadefault: continue. After restart the update log is empty, the expected commitment is computed with the old fee rate, and the signature verification fails.Test Plan
TestChanSyncUpdateFeeRestartAfterRevokepasses