
Conversation

@Roasbeef (Member) commented Oct 2, 2025

In this PR, we address the oldest issue in the lnd tracker: #53. Incremental improvements have landed over the years, but this is the most significant one to date.

First, we start to scale the number of confirmations for closes just like we do for funding confirmations. A small refactor lets us re-use the existing scaling logic, with additional tests added.

We revamp the chain watcher to implement a state machine, ensuring that we recognize the most deeply confirmed closing transaction. Once we detect a spend, we'll register for N confirmations of that spend. If we detect another one (a re-org can replace the first spend with a new one that then confirms), we'll re-register for confirmations of the new spend. We also start to read from the NegativeConf channel, which fires if the transaction is re-org'd out after confirmation.
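
To make the flow concrete, here's a minimal sketch of that loop in Go. The types below are simplified stand-ins for lnd's chainntnfs package (assumed names, not the actual chainWatcher code), and the real implementation carries far more state and error handling:

package main

// SpendDetail is a simplified stand-in for a spend notification.
type SpendDetail struct {
	SpenderTxHash string
}

// ConfEvent is a simplified stand-in for a confirmation event.
type ConfEvent struct {
	Confirmed    chan struct{} // fires once the spend reaches N confs
	NegativeConf chan int32    // fires if the spend is re-org'd out
}

// Notifier registers for numConfs confirmations of a spending txid.
type Notifier interface {
	RegisterConfs(txid string, numConfs uint32) (*ConfEvent, error)
}

func closeObserver(notifier Notifier, spends <-chan SpendDetail,
	numConfs uint32, quit <-chan struct{}) {

	// Until the first spend arrives, the conf channels are nil, so
	// those select cases simply never fire.
	conf := &ConfEvent{}

	for {
		select {
		// A (possibly new) spend of the funding output was detected.
		// This fires again if a re-org replaces a prior spend, in
		// which case we re-register for confs of the replacement.
		case spend := <-spends:
			newConf, err := notifier.RegisterConfs(
				spend.SpenderTxHash, numConfs,
			)
			if err != nil {
				return
			}
			conf = newConf

		// The pending spend reached its target depth: the close is
		// final, and resolution can proceed.
		case <-conf.Confirmed:
			return

		// The previously confirmed spend was re-org'd out after
		// confirming; fall back to waiting for the next spend.
		case <-conf.NegativeConf:

		case <-quit:
			return
		}
	}
}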

It's important to note that the logic around the NegativeConf case is incomplete, as it will only trigger if we wait for 6 confs and then have a 7-block re-org. In that case, we've typically already claimed all of our outputs on chain. To handle this, we'd actually need to re-create the channel arb and the chain watcher, then re-handle any contested outputs we may have had. If the re-org goes all the way back to the funding transaction, then that's extremely dire and needs special handling. This isn't covered in this PR.

Most of the diff is actually just tests: unit tests, property-based tests (these help ensure we can handle spend, conf, re-org, spend, conf with nearly arbitrary params -- eg: confirm coop, re-org, confirm force close, etc), integration tests, etc. The core change is in this commit: b1d055e.

This fixes most of #53. Re the second item: if the funding txn gets re-org'd out, then confirmed again, we'll be able to detect that and close the channel normally. We still need the active logic to detect such a case, though.

@Roasbeef added the channel closing and chain handling labels Oct 2, 2025

Summary of Changes

Hello @Roasbeef, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a major overhaul to the chainWatcher component, focusing on making channel closure logic more resilient to blockchain reorganizations. By implementing a sophisticated state machine, the system can now dynamically adapt to re-orgs, re-registering for spend and confirmation notifications as needed to ensure the correct closing transaction is ultimately processed. Additionally, the number of confirmations required for channel closes now scales with channel capacity, providing a more balanced approach to security and user experience. This foundational work significantly improves the reliability of on-chain channel management.

Highlights

  • Robust Re-org Handling for Channel Closes: The chainWatcher has been significantly revamped to incorporate a state machine that robustly handles blockchain reorganizations during channel closure. This ensures that the system can accurately track and react to changes in the chain, such as a closing transaction being re-orged out and replaced by another.
  • Scaled Confirmation Requirements for Closes: Confirmation requirements for channel closes now scale based on the channel's capacity, similar to how funding transactions are handled. This balances security for larger channels with faster finality for smaller ones. A minimum of 3 confirmations is enforced for cooperative closes in production for re-org safety (see the sketch after this list).
  • Improved User Feedback for Cooperative Closes: For cooperative closes, the system now provides an intermediate update after just one confirmation, giving users quicker feedback that their transaction has been included in a block, followed by a final update once the full, scaled number of confirmations is reached.
  • Extensive Testing with Property-Based and Integration Tests: A substantial portion of this pull request is dedicated to new and enhanced tests, including property-based tests using rapid and new integration tests. These cover various re-org scenarios and different channel close types (cooperative, unilateral, breach) to ensure the new logic is sound and resilient.
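
As a companion to the scaled-confirmations highlight above, here's a rough sketch of what capacity-based scaling can look like. The real logic lives in lnwallet/confscale.go; the constants and rounding below are illustrative assumptions, not the actual values:

package main

import "math"

const (
	minCloseConfs = 3        // re-org safety floor for coop closes
	maxCloseConfs = 6        // cap for the largest channels
	maxChanSize   = 16777215 // pre-wumbo channel size cap, in sats
)

// scaleCloseConfs linearly interpolates between minCloseConfs and
// maxCloseConfs based on the channel's capacity in satoshis.
func scaleCloseConfs(capacity int64) uint32 {
	if capacity >= maxChanSize {
		return maxCloseConfs
	}

	frac := float64(capacity) / float64(maxChanSize)
	confs := minCloseConfs + frac*(maxCloseConfs-minCloseConfs)

	return uint32(math.Round(confs))
}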

@gemini-code-assist bot left a comment

Code Review

This pull request introduces significant improvements to the channel closing logic, making it more robust against chain reorganizations by implementing a state machine in the chainWatcher. It also adds a more nuanced, capacity-based confirmation scaling for channel closes. The changes are well-structured, with logic centralized in new files like lnwallet/confscale.go and supported by an impressive suite of new tests, including property-based tests for reorg scenarios. My review found a couple of minor areas for improvement: one related to potentially redundant code in peer/brontide.go and a naming suggestion in lnwallet/confscale_prod.go for better readability.

@Roasbeef (Member, Author) commented Oct 3, 2025

Looking into the itest failure, I think it's due to a change in how we notify the coop close.

@Roasbeef (Member, Author) commented Oct 3, 2025

Pushed up a few fixup commits.

One thing I realized is that the old blockbeat sync processing assumptions no longer apply. This is because we won't call handleCommitSpend right after the block beat comes in, as we'll now do other async operations, e.g. waiting for more confs. This is causing some of the tests to fail with the old timing assumptions.

IMO, the best way to handle this may just be to add a devrpc call that we can use to force a sweep.

@saubyk saubyk added this to lnd v0.20 Oct 8, 2025
@saubyk saubyk added this to the v0.20.0 milestone Oct 8, 2025
@saubyk saubyk moved this to In progress in lnd v0.20 Oct 8, 2025
// notifications are done in two different goroutines, so the
// expected order: [receive block -> receive spend] is not
// guaranteed.
case spend, ok := <-c.fundingSpendNtfn.Spend:
Member

It looks like an easier and more direct way to resolve this would be to listen on a conf channel instead of a spend channel: replace c.fundingSpendNtfn with a c.fundingConfNtfn, and the other logic can stay untouched?

Member Author

So we don't know what to listen for confs of until we get the spend; hence the need to request confs for each spend we get. With the way things work, if a spend confirms, then is re-org'd out, and we then get another spend, we'll get another notification on this same spend channel.

Member

Sorry I wasn't clear enough offline; what I meant is, we can just listen for the spendness of the fundingPkScript with a nil tx via

c.cfg.notifier.RegisterConfirmationsNtfn(nil, fundingPkScript, numConfs, beat.CurrentHeight())

Then we can replace the usage of c.fundingSpendNtfn with c.fundingConfNtfn, and the change will be smaller.

case spend, ok := <-c.fundingSpendNtfn.Spend:
// If the channel was closed, then this means that the
// notifier exited, so we will as well.
spend := c.handleBlockbeat(beat)
Member

When a reorg happens, a replaced block will be sent from the blockbeat, and we can ask the subsystem to reprocess this new block; hence the reorg is handled. That was part of the initial design, so we can have a single piece of logic to handle reorgs, assuming the blockbeat is extended to other subsystems like the funding manager or gossip. Atm the blockbeat only carries block height data, which is very limited; it could easily carry more info, like the utxos spent, so we could just use a callback query to check for the spending of interesting outpoints. This would have the advantage of giving us a sync, single source of truth to check for spendness. This linear flow makes the code easier to reason about and maintain.

These are long-term projects tho, just wanted to mention them since they're related to the reorg handling.

@saubyk saubyk moved this from In progress to In review in lnd v0.20 Oct 9, 2025
We have two versions: for itests, we just use one conf, but in prod,
we'll scale the number of confirmations.
This'll be useful for setting up the upcoming itests.
In this commit, we add a new param that'll allow us to scale up the
number of confirmations before we act on a new close. We'll use this
later to improve the current on-chain handling logic.
We want to add better handling, but not break any UIs or wallets. So
we'll continue to send out a notification after a single confirmation,
then send another after things are fully confirmed.
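
Sketched out, that two-stage flow might look like the following. The CloseUpdate type and waitForConfs callback are hypothetical stand-ins for illustration, not the actual lnd types:

package main

// CloseUpdate is a hypothetical stand-in for the update sent to wallets
// and UIs watching a close.
type CloseUpdate struct {
	// Pending is true for the provisional single-conf update.
	Pending bool
}

// notifyClose sends a provisional update after one confirmation, then a
// final update once the scaled confirmation depth is reached.
func notifyClose(updates chan<- CloseUpdate,
	waitForConfs func(numConfs uint32) error, fullConfs uint32) error {

	// One confirmation: the close made it into a block, so existing
	// UIs and wallets see the same early signal they always have.
	if err := waitForConfs(1); err != nil {
		return err
	}
	updates <- CloseUpdate{Pending: true}

	// Scaled depth reached: the close is now considered final.
	if err := waitForConfs(fullConfs); err != nil {
		return err
	}
	updates <- CloseUpdate{Pending: false}

	return nil
}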
…re n is num confs

In this commit, we update the close logic to handle re-orgs up to the
final number of confirmations. This is done generically, so we're able
to handle event sequences such as: coop close confirm, re-org, breach
confirm, re-org, force close confirm, re-org, etc.

The upcoming set of new tests will exercise all of these cases.

We modify the block beat handling to unify the control flow, as it's
possible we get the beat, then see the spend, or the other way around.
We'll use this for all the upcoming tests.
All the tests now need to send a confirmation _after_ the spend is
detected.
This set of new tests ensures that if we have created N RBF variants of
the coop close transaction, any of them can confirm and be re-org'd,
with us detecting the final spend once it confirms deeply enough.
In this commit, we add a set of generic close re-org tests. The most
important ones are the property-based tests: they randomly confirm
transactions, generate a re-org, then assert that we eventually detect
the final version.
This ensures that during the RBF process, if one version confirms, a
re-org occurs, and then another version confirms, we'll properly detect
this case.
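
For a sense of shape, here's a sketch of such a property-based test using pgregory.net/rapid. The closeHarness scaffolding is a placeholder standing in for the PR's real test harness:

package chainwatcher_test

import (
	"testing"

	"pgregory.net/rapid"
)

// closeHarness is a placeholder for the real test scaffolding.
type closeHarness struct {
	lastTx string
}

func newCloseHarness(*rapid.T) *closeHarness       { return &closeHarness{} }
func (h *closeHarness) confirmClose(kind string)   { h.lastTx = kind }
func (h *closeHarness) triggerReorg()              {}
func (h *closeHarness) mineBlocks(n int)           {}
func (h *closeHarness) lastConfirmedTx() string    { return h.lastTx }
func (h *closeHarness) assertResolvedTo(tx string) {}

func TestReorgedClosesEventuallyResolve(t *testing.T) {
	rapid.Check(t, func(rt *rapid.T) {
		h := newCloseHarness(rt)

		// Random sequence of confirm/re-org steps across the
		// different close types.
		steps := rapid.IntRange(1, 5).Draw(rt, "steps")
		for i := 0; i < steps; i++ {
			closeType := rapid.SampledFrom(
				[]string{"coop", "force", "breach"},
			).Draw(rt, "closeType")

			h.confirmClose(closeType)
			h.triggerReorg()
		}

		// Confirm one final close deeply enough that it sticks.
		h.confirmClose("coop")
		h.mineBlocks(6)

		// Property: regardless of the sequence, the watcher must
		// settle on the transaction that is ultimately buried.
		h.assertResolvedTo(h.lastConfirmedTx())
	})
}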
In this commit, we add a new TriggerSweep dev rpc command. This command
can be used in the itests to trigger a sweep on demand, which will be
useful to ensure that they still pass now that some block beat handling
is more async.
…rigger sweep if failed

This implements a generalized pattern where we'll try our assertion;
if that fails, we'll trigger a sweep, then try the assertion again.
This uses the new TriggerSweep call that was added earlier.
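
In sketch form, that retry pattern is roughly the following (the wrapper name and signature are hypothetical; only TriggerSweep itself comes from this PR):

package main

// assertWithSweepFallback runs the assertion, and if it fails, forces a
// sweep via the new TriggerSweep dev rpc before retrying once.
func assertWithSweepFallback(assert func() error,
	triggerSweep func() error) error {

	// First attempt: the sweep may already have happened on its own.
	if err := assert(); err == nil {
		return nil
	}

	// Blockbeat handling is now more async, so force a sweep on demand
	// rather than relying on the old timing assumptions.
	if err := triggerSweep(); err != nil {
		return err
	}

	// Second attempt, now that the sweep has been kicked off.
	return assert()
}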
This change is due to the fact that blockbeat handling is now more async
in the cnct.
@Roasbeef (Member, Author)

PTAL @yyforyongyu. This includes the extra commits for TriggerSweep.

@yyforyongyu (Member) left a comment

I think instead of RegisterSpendNtfn, we can create a single, long-lived confirmation notifier:

fundingPkScript, err := deriveFundingPkScript(c.cfg.chanState)
// ...
heightHint := c.cfg.chanState.DeriveHeightHint()
numConfs := c.requiredConfsForSpend()

fundingConfNtfn, err := c.cfg.notifier.RegisterConfirmationsNtfn(
    nil, // No specific txid, watch the script
    fundingPkScript,
    numConfs,
    heightHint,
)
if err != nil {
    // Handle error
}
defer fundingConfNtfn.Cancel()

We then add a checkFundingConfirmed to replace checkFundingSpend,

// checkFundingConfirmed performs a non-blocking read on the Confirmed
// channel to check whether the spend has been fully confirmed.
func (c *chainWatcher) checkFundingConfirmed() *chainntnfs.TxConfirmation {
    select {
    case conf, ok := <-c.fundingConfNtfn.Confirmed:
        if !ok {
            return nil
        }
        return conf
    default:
        return nil
    }
}

This is used in handleBlockbeat to replace the old checkFundingSpend check, and everything else stays unchanged.

The main loop then needs fewer changes:

func (c *chainWatcher) closeObserver() {
    defer c.wg.Done()

    for {
        select {
        case beat := <-c.BlockbeatChan:
            c.handleBlockbeat(beat)

        // This case handles the initial discovery of the spend,
        // or a re-orged spend.
        case update := <-fundingConfNtfn.Updates:
        	...

        // This case handles a re-org.
        case <-fundingConfNtfn.NegativeConf:
        	...

        // This case handles a confirmation that might arrive between blocks.
        case conf := <-fundingConfNtfn.Confirmed:
        	// TODO: handleCommitSpend needs to be refactored to use `conf` instead of `SpendDetail`.
            err := c.handleCommitSpend(conf)
            ...

        case <-c.quit:
            return
        }
    }
}

I think this should bring the least disruption to the system and itests (fixing the remaining itest would be a nightmare🤦🏻).

}

// If not we return a value scaled linearly
// between 3 and 6, depending on channel size.
Member

Just wanna note that this is no longer from 1 to 6 but 3 to 6, which makes sense.

peer/brontide.go Outdated
if closeReq != nil {
closeReq.Updates <- &ChannelCloseUpdate{
ClosingTxid: closingTxid[:],
// Determine the number of confirmations to wait before
Member

the formatting isn't quite right

Success: true,
go peer.WaitForChanToClose(
uint32(bestHeight), notifier, errChan, chanPoint,
&closingTxid, closingTx.TxOut[0].PkScript, 1, func() {
Member

hmm should we also derive and pass the numConfs here?

closeReq.Updates <- &ChannelCloseUpdate{
ClosingTxid: closingTxid[:],

// Determine the number of confirmations to wait before signaling a
Member

ok i see, looks like this commit can be a fixup commit for the previous one

