sweep: fix expected spending events being missed #10060

yyforyongyu · 2025-07-09T12:34:28Z

Fixes #10051. What happened there:

A sweeping tx was created during a restart, which put CommitmentAnchor and CommitmentTimeLock in the same group even though the anchor had already been spent.
This is usually fine, as we would detect that the anchor input is spent and retry sweeping the to_local output.

However, the logs there show that the spending event was not delivered quickly enough here, so we concluded that the anchor input had not been spent:

lnd/sweep/fee_bumper.go

Lines 1433 to 1453 in ea32aac

    
    	// Do a non-blocking read to see if the output has been spent.
    	select {
    	case spend, ok := <-spendEvent.Spend:
    		if !ok {
    			log.Debugf("Spend ntfn for %v canceled", op)

    			continue
    		}

    		spendingTx := spend.SpendingTx

    		log.Debugf("Detected spent of input=%v in tx=%v", op,
    			spendingTx.TxHash())

    		spentInputs[op] = spendingTx

    	// Move to the next input.
    	default:
    		log.Tracef("Input %v not spent yet", op)
    	}
    }

We now fix it by calling GetUtxo first to check whether a given input is spent. If it is, we then do a blocking read on the spend notification to receive the spending event.

gemini-code-assist · 2025-07-09T12:36:14Z

Warning

Gemini encountered an error creating the summary. You can try again by commenting /gemini summary.

yyforyongyu · 2025-07-09T12:44:28Z

/gemini summary

gemini-code-assist · 2025-07-09T12:44:55Z

Summary of Changes

This pull request aims to resolve a critical issue in the sweep module where expected spending events for transaction inputs could be missed, particularly after a restart. The core change involves enhancing the input spending detection mechanism by first querying the blockchain for UTXO status. If an input is found to be already spent, the system now expects an immediate notification, improving the reliability of the sweeping process and preventing erroneous retries.

Highlights

Spending Event Reliability: Introduced a proactive check using GetUtxo to determine if a sweep input is already spent before registering for a spend notification. This addresses a race condition where spending events might be missed during restarts, leading to incorrect assumptions about input status.
Immediate Notification Expectation: Modified the getSpentInputs logic to expect an immediate spending event notification if GetUtxo indicates an input is already spent. A timeout has been added to log an error if this expectation is not met, highlighting potential issues with the notification system.
Dependency Injection: Added ChainIO (an abstraction for blockchain queries) to TxPublisherConfig, allowing the fee_bumper to directly query UTXO status.
Test Coverage & Fixes: Updated various test cases in fee_bumper_test.go to mock the new GetUtxo behavior, ensuring the logic for handling spent and unspent inputs is correctly tested. Several minor typos in comments and test function names were also corrected.

Changelog

sweep/fee_bumper.go
- Added time package import and defined spentNotificationTimeout constant.
- Corrected typo from sentinalEvent to sentinelEvent.
- Introduced ChainIO lnwallet.BlockChainIO to TxPublisherConfig for blockchain query capabilities.
- Implemented a GetUtxo check for each input in getSpentInputs to determine if it's already spent before registering a spend notification.
- Modified the select statement for spend events to perform a blocking read with a timeout, expecting immediate notification if GetUtxo indicates the input is spent.
- Corrected various minor typos in comments (e.g., differetly to differently, suing to using).
sweep/fee_bumper_test.go
- Imported lnmock package for mocking chain operations.
- Updated mockers struct and createTestPublisher to include and initialize MockChain for ChainIO.
- Corrected typo from sentinalEvent to sentinelEvent in TestBumpResultValidate.
- Renamed test functions TestCreateAnPublishFail to TestCreateAndPublishFail and TestCreateAnPublishSuccess to TestCreateAndPublishSuccess for consistency.
- Adjusted TestProcessRecordsInitial and TestProcessRecordsFeeBump to mock GetUtxo returning an unspent UTXO, skipping spend notification registration.
- Adjusted TestProcessRecordsInitialSpent, TestProcessRecordsConfirmed, and TestProcessRecordsSpent to mock GetUtxo returning a spent UTXO, proceeding with spend notification registration.
- Updated TestHasInputsSpent to include GetUtxo mocks for various input states and changed SignDesc mock expectation from Once() to Twice() for one input due to the new GetUtxo call.
- Corrected various minor typos in comments (e.g., alaways to always).

Activity

The Gemini bot encountered an error while attempting to create a summary (Comment 1).
The author, @yyforyongyu, re-triggered the Gemini bot for a summary (Comment 2).
There has been no human review activity or comments on this pull request yet.

yyforyongyu · 2025-07-09T12:49:21Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a fix to prevent missing expected spending events by checking UTXOs before registering spend notifications. The changes include adding a GetUtxo call, handling its results, and updating tests. I've provided feedback on potential panic scenarios and logging improvements.

sweep/fee_bumper.go

sweep/fee_bumper_test.go

sweep/fee_bumper.go

yyforyongyu · 2025-07-09T18:40:10Z

/gemini review

gemini-code-assist

Code Review

This pull request fixes a race condition where a spent input could be missed by the sweeper. The use of GetUtxo to verify the spent status is a good solution, and the test updates are comprehensive. The timeout handling in getSpentInputs can be improved for better correctness and resilience.

sweep/fee_bumper.go

yyforyongyu · 2025-07-09T18:44:50Z

/gemini review

gemini-code-assist

Code Review

This pull request addresses a bug where spending events for swept inputs could be missed, by first checking the UTXO status with GetUtxo. The logic is sound and the implementation is robust, including a timeout for spend notifications. The test suite has been updated accordingly. My main feedback is to adjust the log severity for certain error conditions from Critical to Warn to better reflect their nature and avoid unnecessary operational alerts.

sweep/fee_bumper.go

We now first check whether a given input has been found in the confirmed utxo set or not. When it's found, we can skip waiting for its spending notification.

This commit makes the read of spend events blocking, which is needed to make sure we won't miss a spend event for a spent input. Because a spend event is returned immediately once an input is spent, this read doesn't actually block: by this point, we know for sure the input has been spent via the `GetUtxo` check.

Abdulkbk

Nice, had an initial pass and left some questions

Abdulkbk · 2025-07-11T16:39:35Z

sweep/fee_bumper.go

@@ -1415,6 +1420,38 @@ func (t *TxPublisher) getSpentInputs(
 				"%v", op, heightHint)
 		}

+		// Check whether the input has been spent or not.


So the GetUtxo call is probably just added here to save us time? I noticed RegisterSpendNtfn also checks this internally.

Yeah, correct. It creates a shortcut here so we don't need to make unnecessary subscriptions. We only attempt to subscribe for spending when we know the input is not in the utxo set, which means either the input has been spent or it's an orphan.

Abdulkbk · 2025-07-11T17:41:36Z

sweep/fee_bumper.go

@@ -1424,7 +1461,7 @@ func (t *TxPublisher) getSpentInputs(
 			log.Criticalf("Failed to register spend ntfn for "+
 				"input=%v: %v", op, err)

-			return nil
+			return spentInputs


So initially we returned nil, and looking at the two places this method is used, there is a check on the length of what was returned: if len(spends) == 0 {. That would have caused LND to panic, right?

A follow-up question is: what happens when we have multiple inputs (I guess that's a possibility), and one fails? Does that affect where we call the method since no error will be returned, and the only check I see is for the length of the returned result?

Returning nil here returns a nil map, which is the zero value of a map type, so calling len on it returns 0 and won't panic.

what happens when we have multiple inputs (I guess that's a possibility), and one fails?

What do you mean one fails? If there's a failure here, then we'd shut down lnd due to Criticalf.

What do you mean one fails? If there's a failure here, then we'd shut down lnd due to Criticalf.

Ah, I now understand that Criticalf sends a shutdown request after logging the error.
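For reference, the nil-map point from this thread can be checked with a small standalone program. The function name and the string key/value types are placeholders, not the actual signature of getSpentInputs.

```go
package main

import "fmt"

// spentInputs mimics a function that bails out early by returning nil for
// a map-typed result. The key and value types here are placeholders.
func spentInputs(fail bool) map[string]string {
	if fail {
		return nil
	}

	return map[string]string{"op": "tx"}
}

func main() {
	spends := spentInputs(true)

	// len on a nil map is well defined and returns 0.
	fmt.Println(spends == nil, len(spends))

	// Reading from a nil map is also safe; only writes panic.
	_, ok := spends["op"]
	fmt.Println(ok)
}
```

So a caller that only checks `len(spends) == 0` behaves the same for a nil map and an empty one.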

bitromortac

Looks good on initial pass 🙏

bitromortac · 2025-07-22T07:44:44Z

sweep/fee_bumper_test.go

+	m.chainIO.On("GetUtxo",
+		&op, inp.SignDesc().Output.PkScript, inp.HeightHint(),
+		mock.Anything,
+	).Return(&wire.TxOut{}, nil).Once()

 	// Create a monitor record that's not confirmed. We know it's not
 	// confirmed because the `SpendEvent` is empty.


is this comment now misleading?

bitromortac · 2025-07-22T07:51:38Z

sweep/fee_bumper_test.go

+	m.chainIO.On("GetUtxo",
+		&op, inp.SignDesc().Output.PkScript, inp.HeightHint(),
+		mock.Anything,
+	).Return(nil, nil).Once()


would it be useful to also return btcwallet.ErrOutputSpent for more realistic testing?

bitromortac · 2025-07-23T12:51:05Z

sweep/fee_bumper.go

+		// is spent or not. A better approach is to implement a new
+		// synchronous method to check for spending, which should be
+		// attempted when implementing SQL into btcwallet.
+		case <-time.After(spentNotificationTimeout):


I think the assumption here is not quite right. A spend event from RegisterSpendNtfn may arrive only much later, since it may be doing a historical rescan for the output (done from the current height back to the height hint, which can take a long time if the node was offline for some time and a force close happened in between). The same holds for the call in monitorSpend; I'm not sure whether a long delay between publish and spend notification is problematic for the sweeper.

Why do we need the spending transactions here? It looks like they are only used for logging/sanity checks, right? The docstring on r.spentInputs also seems misleading, because all the spends may have come from the sweep transaction, I think.

Roasbeef · 2025-07-25T21:06:02Z

sweep/fee_bumper.go

@@ -1415,6 +1420,38 @@ func (t *TxPublisher) getSpentInputs(
 				"%v", op, heightHint)
 		}

+		// Check whether the input has been spent or not.
+		utxo, err := t.cfg.ChainIO.GetUtxo(


Hmm, will this also populate the spend cache for neutrino backends? Otherwise, this can be a very expensive filter rescan depending on how far back they are.

In other words, this'll block for neutrino backends. Would need to check the behavior with backends that have the txindex off.

What about just moving back to the spend channel/goroutine? That way it's always active, always watching, and we can handle the notification async when needed.

It would allow us to remove all these other default select cases for spend ntfns. I recall I pointed out a possibility of missed events when this change was originally added.

Hmm, will this also populate the spend cache for neutrino backends? Otherwise, this can be a very expensive filter rescan depending on how far back they are.

FWIW this is already used in RegisterSpendNtfn implemented in neutrino,

lnd/chainntnfs/neutrinonotify/neutrino.go

Line 864 in 4389067

spendReport, err := n.p2pNode.GetUtxo(

What about just moving back to the spend channel/goroutine? That way it's always active, always watching, and we can handle the notification async when needed.

Can try that route, meanwhile there's #10117 that fixes this issue using an alternative approach. I will see if it's possible to make a new sync method when implementing SQL into btcwallet.

morehouse · 2025-07-29T20:23:58Z

Oof. So basically this comment is wrong:

lnd/sweep/fee_bumper.go

Lines 1418 to 1422 in ea32aac

    
    // If the input has already been spent after the height hint, a
    // spend event is sent back immediately.
    spendEvent, err := t.cfg.Notifier.RegisterSpendNtfn(
    	&op, inp.SignDesc().Output.PkScript, heightHint,
    )

And the sending of the spend event is actually racy. This probably has broader implications than just this one piece of code -- IIRC this pattern is used in other places too.

Can we change RegisterSpendNtfn to have the desired behavior? For the sweeper, we really need a way to query for spent inputs synchronously.
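The race pointed out here can be reproduced in isolation. In this sketch `pollSpend` and the string channel are hypothetical stand-ins for the non-blocking read on the spend notification; the only point is that a `default` case reports "not spent" whenever the event has not been delivered yet, regardless of the on-chain state.

```go
package main

import "fmt"

// pollSpend does a non-blocking read on a hypothetical spend channel.
func pollSpend(spendCh <-chan string) (string, bool) {
	select {
	case txid := <-spendCh:
		return txid, true

	default:
		return "", false
	}
}

func main() {
	spendCh := make(chan string, 1)

	// The input may already be spent on chain, but the notifier has not
	// delivered the event yet, so the poll reports "not spent".
	_, seen := pollSpend(spendCh)
	fmt.Println("before delivery:", seen)

	// Once the notifier delivers, the same poll succeeds.
	spendCh <- "spending-txid"
	txid, seen := pollSpend(spendCh)
	fmt.Println("after delivery:", seen, txid)
}
```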

yyforyongyu · 2025-07-31T06:57:30Z

And the sending of the spend event is actually racy. This probably has broader implications than just this one piece of code -- IIRC this pattern is used in other places too.

Yeah it's also manifested in the itest, for instance here,

lnd/itest/flakes.go

Line 41 in 23dd01c

func flakeTxNotifierNeutrino(ht *lntest.HarnessTest) {

and here,

lnd/itest/flakes.go

Line 137 in 23dd01c

func flakeRaceInBitcoinClientNotifications(ht *lntest.HarnessTest) {

Basically the block event and spend event are async. Previously there was an attempt to make them sync in blockbeat. The idea is that when a block height is received, we can directly fetch more info about the block, such as the inputs spent, making the whole flow linear. Yet there were some challenges when implementing it for neutrino, since that would mean we need to fetch every block. I think we can dig deeper to see how to make it work. Meanwhile, as I'm working on SQLizing btcwallet, I will also see if there's an efficient way to implement a synchronous method that fetches the spending txns.

Will put this PR in draft now, as #10117 should fix this issue.

Roasbeef · 2025-08-05T01:18:02Z

IMO we should just go back to the dedicated spend detection goroutine, with a goroutine per input that sends the spend event into the main channel: #10060 (comment).

It is true that the recv there will be instant, and not fall through to the default, but only if the channel has already been sent on before we enter that case.

Going back to dedicated goroutines to make sure all the spends are acted upon layers on the least amount of assumptions.
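A rough sketch of the goroutine-per-input pattern suggested here, under simplifying assumptions: `watchSpends`, `spendEvent`, and the string-typed channels are hypothetical names for illustration and do not reflect how TxPublisher is structured.

```go
package main

import (
	"fmt"
	"sync"
)

// spendEvent pairs a hypothetical outpoint label with its spending tx ID.
type spendEvent struct {
	outpoint string
	txid     string
}

// watchSpends starts one goroutine per input. Each goroutine blocks on its
// own notification channel and forwards the event to a shared channel,
// which is closed once every watcher has exited.
func watchSpends(inputs map[string]<-chan string,
	quit <-chan struct{}) <-chan spendEvent {

	out := make(chan spendEvent)

	var wg sync.WaitGroup
	for op, ch := range inputs {
		wg.Add(1)
		go func(op string, ch <-chan string) {
			defer wg.Done()

			select {
			case txid, ok := <-ch:
				if !ok {
					return
				}

				// Forward the event unless we are shutting down.
				select {
				case out <- spendEvent{op, txid}:
				case <-quit:
				}

			case <-quit:
			}
		}(op, ch)
	}

	go func() {
		wg.Wait()
		close(out)
	}()

	return out
}

func main() {
	anchor := make(chan string, 1)
	toLocal := make(chan string, 1)
	quit := make(chan struct{})

	events := watchSpends(map[string]<-chan string{
		"anchor": anchor, "to_local": toLocal,
	}, quit)

	// Only the anchor is spent; the to_local watcher exits on quit.
	anchor <- "third-party-txid"
	ev := <-events
	fmt.Println(ev.outpoint, ev.txid)

	close(quit)
	for range events {
	}
}
```

Because each watcher blocks on its channel, an event that arrives late is still forwarded, with no `default` case to fall through.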

Roasbeef · 2025-08-05T01:20:54Z

I took a look at #10117, it doesn't appear to resolve this overarching issue of potentially missed spends with a default select case.

yyforyongyu · 2025-08-05T07:26:06Z

I took a look at #10117, it doesn't appear to resolve this overarching issue of potentially missed spends with a default select case.

This case is primarily built for detecting a third-party anchor spend when the anchor is grouped with other inputs. Given that the anchor is not grouped, we should not hit this case here.

IMO we should just go back to the dedicated spend detection goroutine

The issue is that it doesn't fit the current TxPublisher, so a refactor is needed to make it happen. I think instead of making any kind of assumptions, we can just extend blockbeat to return the block info, or provide a callback to fetch block info. Given that we are already receiving the block height for every block, I find it redundant to subscribe and then wait for a spending event.

yyforyongyu added this to the v0.20.0 milestone Jul 9, 2025

yyforyongyu self-assigned this Jul 9, 2025

yyforyongyu added the bug fix label Jul 9, 2025

yyforyongyu added this to lnd v0.20 Jul 9, 2025

gemini-code-assist bot reviewed Jul 9, 2025

saubyk moved this to In progress in lnd v0.20 Jul 9, 2025

saubyk added the utxo sweeping label Jul 9, 2025

yyforyongyu force-pushed the fix-sweep branch from 2f4e850 to 46496a2 Compare July 9, 2025 18:38

gemini-code-assist bot reviewed Jul 9, 2025

yyforyongyu force-pushed the fix-sweep branch from 46496a2 to 9120d9c Compare July 9, 2025 18:44

gemini-code-assist bot reviewed Jul 9, 2025

yyforyongyu force-pushed the fix-sweep branch 3 times, most recently from d15e867 to f652051 Compare July 10, 2025 00:49

yyforyongyu added 5 commits July 11, 2025 10:11

sweep: check utxo set before registering for spending notification

940d317

We now first check whether a given input has been found in the confirmed utxo set or not. When it's found, we can skip waiting for its spending notification.

sweep: fix unit tests

96a6857

sweep: fix typos

5502d91

sweep: add timeout when waiting for spending event

ea6c132

yyforyongyu force-pushed the fix-sweep branch from f652051 to ea6c132 Compare July 11, 2025 07:11

Abdulkbk reviewed Jul 11, 2025

yyforyongyu mentioned this pull request Jul 21, 2025

[bug]: pendingsweeps empty after #10053

Closed

bitromortac reviewed Jul 22, 2025

saubyk linked an issue Jul 22, 2025 that may be closed by this pull request

[bug]: pendingsweeps empty after #10053

Closed

saubyk removed this from lnd v0.20 Jul 22, 2025

saubyk requested a review from Roasbeef July 22, 2025 16:55

bitromortac reviewed Jul 23, 2025

Roasbeef requested changes Jul 25, 2025

saubyk modified the milestones: v0.20.0, v0.19.3 Jul 29, 2025

ziggie1984 added the P0 very high priority issue/PR, blocker on all others label Jul 30, 2025

yyforyongyu mentioned this pull request Jul 31, 2025

contractcourt+sweep: make anchor inputs exclusive #10117

Merged

yyforyongyu marked this pull request as draft July 31, 2025 06:58

saubyk modified the milestones: v0.19.3, v0.20.0 Jul 31, 2025

saubyk added this to lnd v0.20 Jul 31, 2025

saubyk moved this to Backlog in lnd v0.20 Jul 31, 2025

saubyk removed a link to an issue Aug 5, 2025

[bug]: pendingsweeps empty after #10053

Closed
