@sbodagala
Contributor

The unicast recovery algorithm should ensure that every version from "max(KCV)" onwards is included in the "unknownCommittedVersions" list. The current check,

if (!(prevVersion == maxKCV || prevVersion == prevVersionMap[version])) {

allows the version whose prevVersion is equal to "max(KCV)" (and potentially a run of versions right after "max(KCV)") to be skipped, which could cause the algorithm to pick an incorrect recovery version. This PR corrects that check.
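To illustrate the failure mode, here is a minimal sketch. It is not the FoundationDB code: `PrevVersionMap`, `lastChainedVersionLenient()` and `lastChainedVersionStrict()` are hypothetical names, the map is a simplified stand-in for the per-version prevVersion information, and the tLog availability checks that the real function performs are left out. The strict variant is only one possible reading of "correcting the check", assumed here for illustration.

```cpp
#include <cstdint>
#include <map>

using Version = int64_t;

// Hypothetical stand-in: maps each version reported by the tLogs to the
// version that immediately preceded it (its prevVersion).
using PrevVersionMap = std::map<Version, Version>;

// Mirrors the check quoted in the description. The first version visited after
// maxKCV is always accepted, because prevVersion still equals maxKCV then,
// even if that version's recorded prevVersion is not maxKCV.
Version lastChainedVersionLenient(Version maxKCV, const PrevVersionMap& prevVersionMap) {
    Version prevVersion = maxKCV;
    for (const auto& entry : prevVersionMap) {
        const Version version = entry.first;
        const Version recordedPrev = entry.second;
        if (version <= maxKCV) {
            continue; // only versions after max(KCV) are of interest
        }
        if (!(prevVersion == maxKCV || prevVersion == recordedPrev)) {
            break;
        }
        prevVersion = version;
    }
    return prevVersion;
}

// Assumed stricter variant: every accepted version, including the first one,
// must name the previously accepted version (initially maxKCV) as its prevVersion.
Version lastChainedVersionStrict(Version maxKCV, const PrevVersionMap& prevVersionMap) {
    Version prevVersion = maxKCV;
    for (const auto& entry : prevVersionMap) {
        const Version version = entry.first;
        const Version recordedPrev = entry.second;
        if (version <= maxKCV) {
            continue;
        }
        if (recordedPrev != prevVersion) {
            break; // gap in the chain: stop at the last contiguous version
        }
        prevVersion = version;
    }
    return prevVersion;
}
```

With maxKCV = 100 and a map containing only 120 (prev 110) and 130 (prev 120), i.e. version 110 is missing, the lenient variant walks past the gap and returns 130, while the strict variant stops and returns 100. With the complete chain 110, 120, 130 both return 130.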

NOTE: A good unit test for "getRecoverVersionUnicast()" could have caught this issue. Writing one could be tedious, though, because it requires populating data structures such as LogSets; we should think more about this.

Testing:
Found by a simulation test (I think multiple simulation tests hit this issue, but I didn't look in depth at all of them). Verified that they all succeed with this change. Also, multiple Joshua jobs run with this change (with version vector enabled) didn't show any similar failures.

Joshua job (with version vector disabled):

Code-Reviewer Section

The general pull request guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

  • The PR has a description, explaining both the problem and the solution.
  • The description mentions which forms of testing were done and the testing seems reasonable.
  • Every function/class/actor that was touched is reasonably well documented.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

  • This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or main if this is the youngest branch)
  • There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

@sbodagala sbodagala requested review from dlambrig and jzhou77 October 3, 2025 20:42
@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 05cbb0b
  • Duration 0:25:45
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 05cbb0b
  • Duration 0:39:44
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 05cbb0b
  • Duration 0:49:03
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 05cbb0b
  • Duration 0:53:56
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 05cbb0b
  • Duration 1:02:44
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 05cbb0b
  • Duration 1:06:27
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 05cbb0b
  • Duration 1:10:19
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

Contributor

@dlambrig dlambrig left a comment

Created radar rdar://162067957 (create unit test for getRecoverVersionUnicast())

@jzhou77 jzhou77 merged commit 9c48e3c into apple:main Oct 9, 2025
7 checks passed
@sbodagala sbodagala deleted the version-vector-recovery-issue branch October 9, 2025 15:17