add live test for Nexus handoff #9024
With a4x2 built from cb83530 and live test built from 8032377, it takes a while, but it works:
/// Returns whether the given blueprint's sled configurations appear to be
/// propagated to all sleds.
///
/// Returns the inventory collection so that the caller can check additional
/// details if wanted.
- /// Returns whether the given blueprint's sled configurations appear to be
- /// propagated to all sleds.
- ///
- /// Returns the inventory collection so that the caller can check additional
- /// details if wanted.
+ /// Verifies that the given blueprint's sled configurations appear to be
+ /// propagated to all active sleds.
+ ///
+ /// Returns the inventory collection so that the caller can check additional
+ /// details if wanted.
Good catch -- updated the comment in 34aa89a.
// Wait for the zones to be running.
// (This does not mean that their Nexus instances are running.)
- // Wait for the zones to be running.
- // (This does not mean that their Nexus instances are running.)
+ // Wait for the zones to be running.
+ //
+ // We expect that the new Nexus zones will be blocked on the "not yet"
+ // DbMetadataNexusState, waiting for handoff.
Updated the comment in 34aa89a.
// Make sure we're starting from a known-normal state.
// First, we have an enabled target blueprint.
let blueprint1 = blueprint_load_target_enabled(log, nexus)
Nitpick: WDYT about giving these more descriptive names than the enumerated "blueprint1"? I just suffered through this in #8936, where a multi-hundred-line test went up to something like "blueprint12". Refactoring it to add a blueprint in the middle sucks.
Ah, sorry about that. Sometimes I think the numbers are really helpful for keeping track of the order. In this case, at least, the names "initial", "new_nexus", "handoff", and "cleanup" are clear enough about the order. Done in 34aa89a.
// Wait for this to get propagated everywhere.
let _latest_collection = blueprint_wait_sled_configs_propagated(
    opctx,
    datastore,
    &blueprint4,
    new_nexus,
    Duration::from_secs(120),
)
.await
.expect("waiting for blueprint4 sled configs");
What are we waiting for here? Should we be verifying that we can't access the "old Nexus quiesce" interface after this or something?
(If we have nothing else to verify, why wait?)
This is just general test hygiene. I don't like to have tests kick off work that might not be finished when they finish running. The main reason is that this can make things flaky when you run multiple tests in a row because a second test winds up confused by the changing state left by the first test.
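To illustrate the hygiene point, here is a minimal sketch of a "wait until the work has settled" helper that a test could call before returning. It is not code from this PR: `wait_for` and its parameters are hypothetical, and it polls synchronously using only the standard library, whereas the test under review awaits `blueprint_wait_sled_configs_propagated` with a 120-second timeout.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper (not part of the PR): polls `check` until it returns
/// `Some(value)` or until `timeout` has elapsed.
fn wait_for<T>(
    timeout: Duration,
    interval: Duration,
    mut check: impl FnMut() -> Option<T>,
) -> Result<T, String> {
    let start = Instant::now();
    loop {
        // Succeed as soon as the condition holds.
        if let Some(value) = check() {
            return Ok(value);
        }
        // Give up once the deadline has passed, so the test fails loudly
        // instead of leaving work in flight for the next test to trip over.
        if start.elapsed() >= timeout {
            return Err(format!("condition not met within {:?}", timeout));
        }
        sleep(interval);
    }
}
```

A test would call something like this as its last step, so that any state change it kicked off has finished (or the test has failed) before the next test starts.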
This PR adds a new live test for Nexus handoff.
Depends on #9022 and #9023.
TODO: