Skip to content

Conversation

kev-cao
Copy link
Contributor

@kev-cao kev-cao commented Oct 10, 2025

As part of the 25.4 migration, all descriptors are rewritten to use the new serialization format. If this occurs during a restore, the restore will fail due to version mismatches. This test deflakes our mixed-version backup-restore roachtests to rerun the restore if it encounters that error.

Fixes: #154611

Fixes: #154850

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@kev-cao kev-cao requested a review from Copilot October 10, 2025 18:30
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR deflakes mixed-version backup/restore roachtests by implementing a retry mechanism for a specific version mismatch error that occurs during the 25.4 migration when descriptors are rewritten to use a new serialization format.

  • Added error detection for "version mismatch for descriptor" during restore operations
  • Implemented automatic retry logic for restore operations when the specific error is encountered
  • Added detailed logging and comments explaining the temporary nature of this fix

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@kev-cao kev-cao force-pushed the roachtest/deflake-mixed-version-backup branch from 86c5569 to 4f69441 Compare October 10, 2025 20:26
if err := d.testUtils.waitForJobSuccess(ctx, l, rng, jobID, internalSystemJobs); err != nil {
return nil, "", err
if jobErr := d.testUtils.waitForJobSuccess(ctx, l, rng, jobID, internalSystemJobs); jobErr != nil {
if !strings.Contains(jobErr.Error(), "version mismatch for descriptor") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is there a way to easily inspect the test plan to check that the 25.4 upgrade is planned for the test? If it is not planned, then the test should fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a little blurb to check for the mixed-version plan state

@kev-cao kev-cao added the backport-25.4.x Flags PRs that need to be backported to 25.4 label Oct 15, 2025
@kev-cao kev-cao force-pushed the roachtest/deflake-mixed-version-backup branch from 4f69441 to 80a6ec4 Compare October 15, 2025 22:11
@kev-cao kev-cao requested a review from Copilot October 15, 2025 22:12
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

if err := d.testUtils.waitForJobSuccess(ctx, l, rng, jobID, internalSystemJobs); err != nil {
return nil, "", err
if jobErr := d.testUtils.waitForJobSuccess(ctx, l, rng, jobID, internalSystemJobs); jobErr != nil {
if mvHelper == nil || mvHelper.System.ToVersion.Series() != "25.4" {
Copy link

Copilot AI Oct 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The hardcoded version string '25.4' creates a maintenance burden. Consider extracting this to a named constant at package level (e.g., descriptorRewriteVersion = \"25.4\") to improve maintainability and make the version check more self-documenting.

Copilot uses AI. Check for mistakes.

bc *backupCollection,
checkFiles bool,
internalSystemJobs bool,
mvHelper *mixedversion.Helper,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we instead pass the clusterVersion object, and do an equality check on that? that seems cleaner than passing the whole helper and less finicky than string matching.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that explicitly handling this for mixed version tests was better since this error is only harmless in the context of a mixed version state. If it were just a cluster version, you could erroneously end up ignoring the error for a 25.4 cluster that wasn't in a mixed version state.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see. in that case it is fine to pass the helper. but could we do a cluster version comparison instead of string comparison?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'll investigate it further when I'm back. I wasn't sure how well version comparison would work due to differences in patch releases, but I'm sure there's a way to do ordinal release version comparison only.

Copy link
Collaborator

@msbutler msbutler Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I was thinking that would require an update to the test at 26.1 since we'd have to do 25.4 <= x < 26.1 (unless I constructed the 26.1 version manually for the comparison).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not following. I was thinking that we should only retry this if mvt helper plans to go through the 25.4 upgrade, i.e.:
mvHelper.System.ToVersion>=25.4 & mvHelper.System.FromVersion<25.4

As part of the 25.4 migration, all descriptors are rewritten to use the
new serialization format. If this occurs during a restore, the restore
will fail due to version mismatches. This test deflakes our
mixed-version backup-restore roachtests to rerun the restore if it
encounters that error.

Fixes: cockroachdb#154611

Fixes: cockroachdb#154850

Release note: None
@kev-cao kev-cao force-pushed the roachtest/deflake-mixed-version-backup branch from 80a6ec4 to 7471f8b Compare October 17, 2025 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-25.4.x Flags PRs that need to be backported to 25.4

Projects

None yet

Development

Successfully merging this pull request may close these issues.

roachtest: backup-restore/mixed-version failed roachtest: backup-restore/mixed-version failed

3 participants