Skip to content

Conversation

diegomrsantos
Copy link
Contributor

@diegomrsantos diegomrsantos commented Aug 28, 2025

Issue Addressed

This PR addresses a bug in QBFT commit message handling where commit messages are dropped when received before a proposal, preventing consensus achievement in out-of-order message scenarios.

Proposed Changes

  • Add comprehensive test test_consensus_with_commits_before_proposal that verifies QBFT can achieve consensus when commit messages arrive before proposals
  • The test demonstrates the expected behavior: nodes should achieve consensus when they have both a valid proposal and commit quorum, regardless of message arrival order
  • This ensures QBFT provides proper message ordering independence as required by distributed consensus protocols

Additional Info

The current implementation drops commit messages when proposal_accepted_for_current_round is false. The fix involves implementing proper message buffering and re-evaluation when proposals arrive.

Test Scenario:

  1. Node receives commit messages from other nodes (commit quorum available)
  2. Node later receives the corresponding proposal
  3. Node achieves consensus with the committed value

This ensures network resilience by allowing nodes to properly participate in consensus even when experiencing temporary network issues or joining late.

@diegomrsantos diegomrsantos force-pushed the fix/qbft-commit-messages-without-proposal branch from 236a243 to e702cf6 Compare August 28, 2025 18:50
@diegomrsantos diegomrsantos changed the title Fix QBFT commit message handling in catch-up scenarios fix(qbft): commit message handling in catch-up scenarios Aug 28, 2025
@diegomrsantos diegomrsantos changed the base branch from stable to release-v0.3.0 August 28, 2025 18:51
Add failing test that demonstrates QBFT incorrectly drops commit messages
when no proposal has been accepted for the current round. This prevents
proper catch-up scenarios where nodes should be able to achieve consensus
based on commit quorum even without seeing the original proposal.

The test shows that commit messages are dropped in the commit message
handling logic, which should buffer these messages instead for later
processing when a proposal arrives.
@diegomrsantos diegomrsantos force-pushed the fix/qbft-commit-messages-without-proposal branch from e702cf6 to fb89af0 Compare August 28, 2025 19:07
@Zacholme7
Copy link
Member

So we should just store the commit in the container before checking if the proposal was accepted?

@diegomrsantos
Copy link
Contributor Author

So we should just store the commit in the container before checking if the proposal was accepted?

From what I understand, we can receive the messages out of order, so we need to buffer the commit messages in case we receive the proposal later. Do you think it makes sense?

@diegomrsantos
Copy link
Contributor Author

The QBFT Specification Requirement

According to the QBFT specification, if:

  1. We have a valid proposal for a value
  2. We receive a quorum of commit messages for that same value

Then we should accept the value as committed, regardless of the order these messages arrived.

Why Buffering is Necessary

In distributed systems, messages can arrive out of order:

Scenario 1 (Normal Order):

  1. Receive proposal for value X
  2. Accept proposal
  3. Receive commit messages for value X
  4. Achieve commit quorum → Success ✅

Scenario 2 (Out of Order - Catch-up):

  1. Receive commit messages for value X first (from other nodes)
  2. Receive proposal for value X later
  3. Accept proposal
  4. Re-evaluate buffered commits → Achieve commit quorum → Success ✅

Current Anchor Bug

The current code at lines 748-750:

if !self.proposal_accepted_for_current_round {
    warn!(from=?operator_id, ?self.state, "Have not accepted Proposal for current round yet");
    return; // ❌ DROPS commit messages instead of buffering
}

This violates the QBFT specification because it prevents Scenario 2 from working correctly.

Correct Implementation Should:

  1. Always buffer commit messages (regardless of proposal status)
  2. When a proposal is accepted, re-evaluate all buffered commit messages
  3. Check for commit quorum using both new and previously buffered commits

This ensures that message ordering doesn't affect consensus correctness, which is a fundamental requirement for robust distributed consensus protocols like QBFT.

@diegomrsantos diegomrsantos force-pushed the fix/qbft-commit-messages-without-proposal branch from 5502512 to 5b159ee Compare August 29, 2025 09:21
@diegomrsantos diegomrsantos self-assigned this Aug 29, 2025
Replace the incomplete buffering test with a comprehensive test that
verifies complete QBFT consensus behavior in out-of-order scenarios.

The new test_consensus_with_commits_before_proposal() test:
- Sends commit messages before proposal (realistic catch-up scenario)
- Verifies actual consensus achievement with correct value
- Tests complete distributed systems behavior rather than just buffering

This provides better validation of QBFT message ordering independence
and ensures the fix addresses the complete consensus protocol behavior.
@diegomrsantos diegomrsantos force-pushed the fix/qbft-commit-messages-without-proposal branch from 5b159ee to 45cbdaa Compare August 29, 2025 11:16
@Zacholme7
Copy link
Member

Zacholme7 commented Aug 29, 2025

Okay good job, I did some more looking into this and it does appear to be a valid issue. This is spec compliance, but anchor and go-ssv both do not implement this. I think our first step would be to bring it up to them in the discord and coordinate a change. I dont think this change would be huge, but it does require refactoring our some pieces of our logic and thinking through every case where this is needed.

We are just doing the same thing as go-ssv atm

@diegomrsantos
Copy link
Contributor Author

Okay good job, I did some more looking into this and it does appear to be a valid issue. This is spec compliance, but anchor and go-ssv both do not implement this. I think our first step would be to bring it up to them in the discord and coordinate a change. I dont think this change would be huge, but it does require refactoring our some pieces of our logic and thinking through every case where this is needed.

We are just doing the same thing as go-ssv atm

I agree, let's do it

@Zacholme7
Copy link
Member

Think we should close these for now and make an issue for each linking these prs?

@diegomrsantos
Copy link
Contributor Author

Think we should close these for now and make an issue for each linking these prs?

But thinking more about it, what would be the issue if Anchor fixes it while Go doesn't?

@Zacholme7
Copy link
Member

I dont think there is an issue, but atm I think our focus should be making sure we are 100% confident in our implementations correctness with respect to go-ssv and then coordinate these extra changes.

@diegomrsantos
Copy link
Contributor Author

diegomrsantos commented Aug 29, 2025

I dont think there is an issue, but atm I think our focus should be making sure we are 100% confident in our implementations correctness with respect to go-ssv and then coordinate these extra changes.

Shouldn't we be correct wrt to the spec? If this is really an issue that happens in production, although probably not often, an Anchor only cluster could have worse performance. But I agree it might not be high in the priority. @jking-aus @AgeManning what are your thoughts?

@Zacholme7
Copy link
Member

Are we good to close these and just put them into an issue?

@diegomrsantos
Copy link
Contributor Author

Are we good to close these and just put them into an issue?

Should we raise this on discord?

@Zacholme7
Copy link
Member

opened issue #614

@Zacholme7 Zacholme7 closed this Sep 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants