-
Notifications
You must be signed in to change notification settings - Fork 4k
roachtest: deflake mixed-version backup/restore roachtest #155263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
roachtest: deflake mixed-version backup/restore roachtest #155263
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR deflakes mixed-version backup/restore roachtests by implementing a retry mechanism for a specific version mismatch error that occurs during the 25.4 migration when descriptors are rewritten to use a new serialization format.
- Added error detection for "version mismatch for descriptor" during restore operations
- Implemented automatic retry logic for restore operations when the specific error is encountered
- Added detailed logging and comments explaining the temporary nature of this fix
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
86c5569
to
4f69441
Compare
if err := d.testUtils.waitForJobSuccess(ctx, l, rng, jobID, internalSystemJobs); err != nil { | ||
return nil, "", err | ||
if jobErr := d.testUtils.waitForJobSuccess(ctx, l, rng, jobID, internalSystemJobs); jobErr != nil { | ||
if !strings.Contains(jobErr.Error(), "version mismatch for descriptor") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: is there a way to easily inspect the test plan to check that the 25.4 upgrade is planned for the test? If it is not planned, then the test should fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a little blurb to check for the mixed-version plan state
4f69441
to
80a6ec4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
if err := d.testUtils.waitForJobSuccess(ctx, l, rng, jobID, internalSystemJobs); err != nil { | ||
return nil, "", err | ||
if jobErr := d.testUtils.waitForJobSuccess(ctx, l, rng, jobID, internalSystemJobs); jobErr != nil { | ||
if mvHelper == nil || mvHelper.System.ToVersion.Series() != "25.4" { |
Copilot
AI
Oct 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The hardcoded version string '25.4' creates a maintenance burden. Consider extracting this to a named constant at package level (e.g., descriptorRewriteVersion = \"25.4\"
) to improve maintainability and make the version check more self-documenting.
Copilot uses AI. Check for mistakes.
bc *backupCollection, | ||
checkFiles bool, | ||
internalSystemJobs bool, | ||
mvHelper *mixedversion.Helper, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we instead pass the clusterVersion
object, and do an equality check on that? that seems cleaner than passing the whole helper and less finicky than string matching.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that explicitly handling this for mixed version tests was better since this error is only harmless in the context of a mixed version state. If it were just a cluster version, you could erroneously end up ignoring the error for a 25.4 cluster that wasn't in a mixed version state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see. in that case it is fine to pass the helper. but could we do a cluster version comparison instead of string comparison?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'll investigate it further when I'm back. I wasn't sure how well version comparison would work due to differences in patch releases, but I'm sure there's a way to do ordinal release version comparison only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the AtLeast
operator is your friend: https://github.com/msbutler/cockroach/blob/master/pkg/cmd/roachtest/roachtestutil/clusterupgrade/clusterupgrade.go#L82
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I was thinking that would require an update to the test at 26.1 since we'd have to do 25.4 <= x < 26.1 (unless I constructed the 26.1 version manually for the comparison).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not following. I was thinking that we should only retry this if mvt helper plans to go through the 25.4 upgrade, i.e.:
mvHelper.System.ToVersion>=25.4 & mvHelper.System.FromVersion<25.4
As part of the 25.4 migration, all descriptors are rewritten to use the new serialization format. If this occurs during a restore, the restore will fail due to version mismatches. This test deflakes our mixed-version backup-restore roachtests to rerun the restore if it encounters that error. Fixes: cockroachdb#154611 Fixes: cockroachdb#154850 Release note: None
80a6ec4
to
7471f8b
Compare
As part of the 25.4 migration, all descriptors are rewritten to use the new serialization format. If this occurs during a restore, the restore will fail due to version mismatches. This test deflakes our mixed-version backup-restore roachtests to rerun the restore if it encounters that error.
Fixes: #154611
Fixes: #154850
Release note: None