(7/N) Use nexus_generation, update it #8936
Conversation
match current_nexus_generation {
    Some(current) => write!(
        f,
        "zone gen ({zone_generation}) >= currently-running \
(I haven't reviewed the planner changes yet, so maybe this question will make more sense once I do.)
Why do we report that the generation is >= the currently-running Nexus's generation for "waiting to expunge"? I thought we'd be expunging zones with a generation < the currently-running Nexus.
Yeah, you're right - we only do want to expunge zones with a generation < the currently-running Nexus. That's the check we're making in the planner that would need to fail to generate this planning report.
This is called from do_plan_zone_updates in the caller, when we're iterating over out_of_date_zones. So basically:
- We found a zone with an image that does not match what the TUF repo expects it to be (it's part of out_of_date_zones).
- We tried to check can_zone_be_updated, to see if we can expunge this zone. When we did that, the generation for this observed zone looks "as new or newer" than our currently-running zone.
So, TL;DR: the context for this report is that "the target TUF repo changed, and this zone looks out-of-date".
I think this is probably most common when the "zone we're trying to expunge" has a generation exactly equal to the active generation, but I wanted to be precise about this check. I think if the "zone we're trying to expunge" has a generation greater than the active generation, that would imply we changed TUF repos mid-update, which we've previously discussed banning anyway.
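For concreteness, here's a minimal, self-contained sketch of the check being described, using stand-in types rather than omicron's real `Generation` or planner API: a Nexus zone only becomes safe to expunge once its generation is strictly below the currently-running Nexus generation; otherwise it gets reported as waiting for handoff.

```rust
// Hedged sketch; Generation and ExpungeDecision are invented stand-ins.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Generation(u64);

enum ExpungeDecision {
    // The zone belongs to an older generation; handoff has already happened.
    SafeToExpunge,
    // The zone's generation is >= the active one; report it and wait.
    WaitingForHandoff { zone_generation: Generation, current: Generation },
}

fn nexus_expunge_decision(
    zone_generation: Generation,
    currently_running: Generation,
) -> ExpungeDecision {
    if zone_generation < currently_running {
        ExpungeDecision::SafeToExpunge
    } else {
        // This branch corresponds to the "waiting to expunge" report text
        // quoted above.
        ExpungeDecision::WaitingForHandoff { zone_generation, current: currently_running }
    }
}

fn main() {
    match nexus_expunge_decision(Generation(1), Generation(1)) {
        ExpungeDecision::SafeToExpunge => println!("expunge"),
        ExpungeDecision::WaitingForHandoff { zone_generation, current } => println!(
            "zone gen ({zone_generation:?}) >= currently-running Nexus gen ({current:?}); waiting"
        ),
    }
}
```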
This goes to the question I had about "what is a ZoneWaitingToExpunge". What is it we're trying to report here? It doesn't seem like there's anything wrong or blocking?
> What is it we're trying to report here? It doesn't seem like there's anything wrong or blocking?
We are trying to report that there is a zone which has an image that does not match the target TUF repo. For almost all zones, this means "we want to expunge it". For Nexus, however, we only want to expunge it if it's not actively running. The logic for this is entirely contained within can_zone_be_updated in nexus/reconfigurator/planning/src/planner.rs.
> It doesn't seem like there's anything wrong or blocking?
Well it's not a "wrong" case for sure, but it is blocking, in IMO kinda the same way that "not enough redundancy for a service" could prevent it from being expunged. In this case, we want to expunge these Nexus zones, we just aren't ready to do so, because handoff has not finished.
> Well it's not a "wrong" case for sure, but it is blocking, in IMO kinda the same way that "not enough redundancy for a service" could prevent it from being expunged.
I don't think I follow this. In the "not enough redundancy" case, we block the update from proceeding until redundancy is restored. But these messages are showing up even if everything is fine and normal, and they eventually go away once we're far enough along in the update. That doesn't seem blocking?
I tried to update the text in 4ea1405, and to make this more precise (only matching "exactly" the current version).
If we feel this is still redundant with the "... remaining out-of-date zones" message, I can remove it, but I did want to communicate something in the report about why "even though you might want this zone to be updated, there is a good reason why we aren't doing it right now"
I still don't really understand what it's trying to communicate. Maybe that's because I do think it's redundant? I think I would claim that at the point at which we've started to update zones, but before we're done with all of them, there's nothing to report about Nexus other than including it in the "remaining out-of-date zones" count. All of those zones are either "waiting to be expunged" or "waiting to be updated in place" (depending on their flavor).
Maybe there's something more specific to report once we get to the point of starting new Nexus zones and performing the handoff? But I'm not sure exactly what that would be, or how we'd see it. We only get to see the reports if a new blueprint is produced that makes some change (otherwise we throw it away), so I'd think we'd see reports for
- the blueprint that added the three new Nexus zones once all the non-Nexus zones are updated
- the blueprint that bumped the top-level Nexus generation once quiescing is done
- the blueprint (created by new Nexus) that expunges the old Nexus zones
but at none of those points are we really waiting for the old Nexus zones to be expunged, I think? Are there more blueprints coming out during the handoff beyond those three?
I think I'd be most interested in this report if:
- All zones are updated, except for Nexus
- According to the planner input, the old Nexuses are still running
- They generate a blueprint saying "we aren't modifying anything, but there are still out-of-date zones"
In this case, I think it's important to identify that "Even though the system has not converged to an updated state, we are waiting for something -- this is what, and this is why".
Otherwise, we'd see: "blueprint with no changes, even though we know some zones are out-of-date"
Would it be more clear if I limited the scope of this report to "only emit this record if Nexus is the only out-of-date zone"?
> We only get to see the reports if a new blueprint is produced that makes some change (otherwise we throw it away)
do planning inputs only propagate if something else in the blueprint changes?
If that's true, then I think we could fail to communicate the ZoneUnsafeToShutdown reports too - these reports also identify why "we might not be making any changes to your blueprint, even though there is work to be done".
I refactored this a bit in 9843b4a:
- This minimizes the window in which we report about Nexus zones: we only report them once they're the only remaining zones to be updated (see the sketch below)
- This removes all the noise from the reconfigurator-cli tests
- I explicitly added this case to the test suite, where we try to perform an update from an old Nexus, and see that we're blocked. We also see the report in this case, which identifies why we aren't proceeding.
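As a rough illustration of the narrowed reporting rule (invented types; the real check lives in the planner), the record would only be emitted when the out-of-date set is non-empty and consists solely of Nexus zones:

```rust
// Hedged sketch of the narrowed reporting condition, not the planner's real types.
#[derive(PartialEq)]
enum ZoneKind {
    Nexus,
    Other,
}

struct OutOfDateZone {
    kind: ZoneKind,
}

fn should_report_nexus_waiting(out_of_date: &[OutOfDateZone]) -> bool {
    // Only speak up about Nexus once it is the only thing left to update.
    !out_of_date.is_empty() && out_of_date.iter().all(|z| z.kind == ZoneKind::Nexus)
}

fn main() {
    let only_nexus = vec![OutOfDateZone { kind: ZoneKind::Nexus }];
    let mixed = vec![
        OutOfDateZone { kind: ZoneKind::Nexus },
        OutOfDateZone { kind: ZoneKind::Other },
    ];
    assert!(should_report_nexus_waiting(&only_nexus));
    assert!(!should_report_nexus_waiting(&mixed));
}
```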
.collect();

// Define image sources: A (same as existing Nexus) and B (new)
let image_source_a = BlueprintZoneImageSource::InstallDataset;
This kinda strengthens my fear that deciding generation based on image sources is a little fraught; two Nexus zones on two different sleds both with the source InstallDataset might be completely unrelated. (Presumably they aren't in practice! And it'd be very surprising. But an image source of InstallDataset means we'd need to check the zone manifest in inventory to know whether the zone image sources actually match.)
Just to confirm, are you suggesting I handle this case, or that the nexus_generation logic would be fraught regardless?
This is what we said about this in RFD 588:
When creating a new Nexus instance, the planner picks the nexus_generation as follows:
- If the new instance’s image matches that of any existing Nexus instance, expunged or otherwise, use the same blueprint generation as that instance. (This should always match the current blueprint generation or the next one (during an upgrade).)
- If the new instance’s image does not match that of any existing Nexus instance, choose one generation number higher than all existing instances. (This should always match the current blueprint generation plus one.)
The intuition here is that a new generation is defined by deploying a new Nexus image. If there’s a new Nexus image, we’ll need a handoff. If not, we don’t.
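Here's an illustrative sketch of that rule, with stand-in types rather than omicron's real ones (and returning None where the real code would treat "no existing Nexus zones" as an error):

```rust
// Hedged sketch of the RFD 588 rule quoted above; ImageSource, Generation, and
// ExistingNexus are invented stand-ins, not omicron's definitions.
#[derive(Clone, PartialEq, Eq)]
enum ImageSource {
    InstallDataset,
    Artifact(String),
}

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Generation(u64);

struct ExistingNexus {
    image: ImageSource,
    generation: Generation,
}

fn pick_nexus_generation(
    new_image: &ImageSource,
    existing: &[ExistingNexus],
) -> Option<Generation> {
    if let Some(same) = existing.iter().find(|n| &n.image == new_image) {
        // Same image as an existing Nexus: same generation, no handoff needed.
        return Some(same.generation);
    }
    // New image: one generation past everything we already have.
    existing
        .iter()
        .map(|n| n.generation)
        .max()
        .map(|Generation(g)| Generation(g + 1))
}

fn main() {
    let existing = vec![ExistingNexus {
        image: ImageSource::Artifact("1.0.0".into()),
        generation: Generation(1),
    }];
    let new = ImageSource::Artifact("2.0.0".into());
    assert_eq!(pick_nexus_generation(&new, &existing), Some(Generation(2)));
}
```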
Yeah, I kind of feel like you shouldn't be able to add a new one while any of them has source InstallDataset? I feel like we need to wait for image resolution to finish.
Yeah, I think my concern is that the RFD bit you quoted talks in terms of "matching" images. InstallDataset alone does not give enough information to know whether it matches another sled's InstallDataset image (even of the same zone type). I think given we block zone updates until all zone image sources are known (or will, once #8921 lands), we should also gate doing anything related to Nexus handoff on the same thing.
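A small sketch of the gate being proposed, with assumed helper names rather than the real planner API: hold off on any Nexus-handoff planning while any Nexus zone's image source is still the install dataset, since "matching" can't be decided without resolved artifacts.

```rust
// Hedged sketch; ImageSource and nexus_images_resolved are stand-ins.
enum ImageSource {
    InstallDataset,
    Artifact(String),
}

fn nexus_images_resolved(nexus_image_sources: &[ImageSource]) -> bool {
    // Handoff reasoning only makes sense once every Nexus image is a concrete artifact.
    nexus_image_sources
        .iter()
        .all(|src| !matches!(src, ImageSource::InstallDataset))
}

fn main() {
    let sources = vec![
        ImageSource::Artifact("2.0.0".into()),
        ImageSource::InstallDataset,
    ];
    // One Nexus is still on the install dataset, so hold off on handoff work.
    assert!(!nexus_images_resolved(&sources));
}
```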
I haven't yet looked at the builder or the planner but I see the PR is changing a lot so I wanted to get these comments in. Will circle back to those files next.
* 0 zones waiting to be expunged:
* zone 466a9f29-62bf-4e63-924a-b9efdb86afec (nexus): zone gen (1) >= currently-running Nexus gen (1)
This seems confusing -- it sounds like we're trying to expunge these zones because they're at a newer generation?
It also seems like these messages come and go over the next several steps.
I think this is probably just a problem with the text, not the behavior. I'm not sure what it's trying to communicate though.
I tried to update the text in 4ea1405.
Also, made this report only appear when it's more relevant - see: 9843b4a
* skipping noop zone image source check on sled d81c6a84-79b8-4958-ae41-ea46c9b19763: all 6 zones are already from artifacts
* 1 pending MGS update:
* model0:serial0: RotBootloader(PendingMgsUpdateRotBootloaderDetails { expected_stage0_version: ArtifactVersion("0.0.1"), expected_stage0_next_version: NoValidVersion })
* only placed 0/2 desired nexus zones
What's going on with these? I only skimmed the test so far but I think at this point in the test we're expecting to be doing an update, but we can't proceed because there are pending MGS updates. Why is it saying it only placed 0/2 desired Nexus zones?
Okay, after doing a little digging, I think this is kinda nasty, and probably should be considered related to #8921.
At this point in the test, there are three sleds:
- One has nexus @ InstallDataset
- One has nexus @ 1.0.0
- One has nexus @ 2.0.0
This should normally not be possible, but I believe is occurring because the three sleds have been manually edited. They also appear to all have been constructed with "nexus_generation = 1", which matches the blueprint-level "nexus_generation".
(This would be flagged by blippy as a corrupt blueprint - using the same generation for different images - but I don't think anyone is checking in this test)
This means we give really weird input to the planner here: We say that all three Nexuses are active, because they're all in-service sleds running the desired version of Nexus.
Without the #8921 changes, we will try to proceed placing discretionary zones, even though there is an InstallDataset present. As a part of this, we find the "currently in-charge Nexus image", which happens to find InstallDataset first.
Then, the planner tries to ensure that the "currently in-charge Nexus image" has sufficient redundancy. It sees that one sled has (Nexus, InstallDataset), and wants "two more Nexuses at this image" to reach the target. This is where the "2" in "0/2" comes from.
Next, the planner calls add_discretionary_zones, but the place_zone logic prevents zones of the same type from being placed on the same sled. Basically, all sleds report "I have a nexus already (ignoring the image source), so you can't place a new nexus here". This means we place zero new Nexuses.
This triggers the report.out_of_eligible_sleds call, which creates this message in the output.
I think the "pending MGS updates" logic would prevent zones from being updated, but this logic is happening in the context of "trying to restore redundancy, when the planner thinks we're running at reduced capacity" -- this is entirely in the domain of "zone add", not "zone update".
To be super explicit: This "three-different-versions-at-once" behavior is also on main:
omicron/dev-tools/reconfigurator-cli/tests/output/cmds-mupdate-update-flow-stdout
Lines 1697 to 1841 in c3ea904
> blueprint-show latest
blueprint 8f2d1f39-7c88-4701-aa43-56bf281b28c1
parent: ce365dff-2cdb-4f35-a186-b15e20e1e700
sled: 2b8f0cb3-0295-4b3c-bc58-4fe88b57112c (active, config generation 5)
host phase 2 contents:
------------------------
slot boot image source
------------------------
A current contents
B current contents
physical disks:
------------------------------------------------------------------------------------
vendor model serial disposition
------------------------------------------------------------------------------------
fake-vendor fake-model serial-72c59873-31ff-4e36-8d76-ff834009349a in service
datasets:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
dataset name dataset id disposition quota reservation compression
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
oxp_72c59873-31ff-4e36-8d76-ff834009349a/crucible 8c4fa711-1d5d-4e93-85f0-d17bff47b063 in service none none off
oxp_72c59873-31ff-4e36-8d76-ff834009349a/crypt/clickhouse 3b66453b-7148-4c1b-84a9-499e43290ab4 in service none none off
oxp_72c59873-31ff-4e36-8d76-ff834009349a/crypt/external_dns 841d5648-05f0-47b0-b446-92f6b60fe9a6 in service none none off
oxp_72c59873-31ff-4e36-8d76-ff834009349a/crypt/internal_dns 3560dd69-3b23-4c69-807d-d673104cfc68 in service none none off
oxp_72c59873-31ff-4e36-8d76-ff834009349a/crypt/zone 4829f422-aa31-41a8-ab73-95684ff1ef48 in service none none off
oxp_72c59873-31ff-4e36-8d76-ff834009349a/crypt/zone/oxz_clickhouse_353b3b65-20f7-48c3-88f7-495bd5d31545 318fae85-abcb-4259-b1b6-ac96d193f7b7 in service none none off
oxp_72c59873-31ff-4e36-8d76-ff834009349a/crypt/zone/oxz_crucible_bd354eef-d8a6-4165-9124-283fb5e46d77 2ad1875a-92ac-472f-8c26-593309f0e4da in service none none off
oxp_72c59873-31ff-4e36-8d76-ff834009349a/crypt/zone/oxz_crucible_pantry_ad6a3a03-8d0f-4504-99a4-cbf73d69b973 c31623de-c19b-4615-9f1d-5e1daa5d3bda in service none none off
oxp_72c59873-31ff-4e36-8d76-ff834009349a/crypt/zone/oxz_external_dns_6c3ae381-04f7-41ea-b0ac-74db387dbc3a b46de15d-33e7-4cd0-aa7c-e7be2a61e71b in service none none off
oxp_72c59873-31ff-4e36-8d76-ff834009349a/crypt/zone/oxz_internal_dns_99e2f30b-3174-40bf-a78a-90da8abba8ca 09b9cc9b-3426-470b-a7bc-538f82dede03 in service none none off
oxp_72c59873-31ff-4e36-8d76-ff834009349a/crypt/zone/oxz_nexus_466a9f29-62bf-4e63-924a-b9efdb86afec 775f9207-c42d-4af2-9186-27ffef67735e in service none none off
oxp_72c59873-31ff-4e36-8d76-ff834009349a/crypt/zone/oxz_ntp_62620961-fc4a-481e-968b-f5acbac0dc63 2db6b7c1-0f46-4ced-a3ad-48872793360e in service none none off
oxp_72c59873-31ff-4e36-8d76-ff834009349a/crypt/debug 93957ca0-9ed1-4e7b-8c34-2ce07a69541c in service 100 GiB none gzip-9
omicron zones:
---------------------------------------------------------------------------------------------------------------
zone type zone id image source disposition underlay IP
---------------------------------------------------------------------------------------------------------------
clickhouse 353b3b65-20f7-48c3-88f7-495bd5d31545 install dataset in service fd00:1122:3344:102::23
crucible bd354eef-d8a6-4165-9124-283fb5e46d77 install dataset in service fd00:1122:3344:102::26
crucible_pantry ad6a3a03-8d0f-4504-99a4-cbf73d69b973 install dataset in service fd00:1122:3344:102::25
external_dns 6c3ae381-04f7-41ea-b0ac-74db387dbc3a install dataset in service fd00:1122:3344:102::24
internal_dns 99e2f30b-3174-40bf-a78a-90da8abba8ca install dataset in service fd00:1122:3344:1::1
internal_ntp 62620961-fc4a-481e-968b-f5acbac0dc63 install dataset in service fd00:1122:3344:102::21
nexus 466a9f29-62bf-4e63-924a-b9efdb86afec install dataset in service fd00:1122:3344:102::22
sled: 98e6b7c2-2efa-41ca-b20a-0a4d61102fe6 (active, config generation 7)
host phase 2 contents:
------------------------
slot boot image source
------------------------
A current contents
B current contents
physical disks:
------------------------------------------------------------------------------------
vendor model serial disposition
------------------------------------------------------------------------------------
fake-vendor fake-model serial-c6d33b64-fb96-4129-bab1-7878a06a5f9b in service
datasets:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
dataset name dataset id disposition quota reservation compression
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
oxp_c6d33b64-fb96-4129-bab1-7878a06a5f9b/crucible 43931274-7fe8-4077-825d-dff2bc8efa58 in service none none off
oxp_c6d33b64-fb96-4129-bab1-7878a06a5f9b/crypt/external_dns a4c3032e-21fa-4d4a-b040-a7e3c572cf3c in service none none off
oxp_c6d33b64-fb96-4129-bab1-7878a06a5f9b/crypt/internal_dns 4f60b534-eaa3-40a1-b60f-bfdf147af478 in service none none off
oxp_c6d33b64-fb96-4129-bab1-7878a06a5f9b/crypt/zone 4617d206-4330-4dfa-b9f3-f63a3db834f9 in service none none off
oxp_c6d33b64-fb96-4129-bab1-7878a06a5f9b/crypt/zone/oxz_crucible_5199c033-4cf9-4ab6-8ae7-566bd7606363 ad41be71-6c15-4428-b510-20ceacde4fa6 in service none none off
oxp_c6d33b64-fb96-4129-bab1-7878a06a5f9b/crypt/zone/oxz_crucible_pantry_ba4994a8-23f9-4b1a-a84f-a08d74591389 1bca7f71-5e42-4749-91ec-fa40793a3a9a in service none none off
oxp_c6d33b64-fb96-4129-bab1-7878a06a5f9b/crypt/zone/oxz_external_dns_803bfb63-c246-41db-b0da-d3b87ddfc63d 3ac089c9-9dec-465b-863a-188e80d71fb4 in service none none off
oxp_c6d33b64-fb96-4129-bab1-7878a06a5f9b/crypt/zone/oxz_internal_dns_427ec88f-f467-42fa-9bbb-66a91a36103c 686c19cf-a0d7-45f6-866f-c564612b2664 in service none none off
oxp_c6d33b64-fb96-4129-bab1-7878a06a5f9b/crypt/zone/oxz_nexus_0c71b3b2-6ceb-4e8f-b020-b08675e83038 793ac181-1b01-403c-850d-7f5c54bda6c9 in service none none off
oxp_c6d33b64-fb96-4129-bab1-7878a06a5f9b/crypt/zone/oxz_ntp_6444f8a5-6465-4f0b-a549-1993c113569c cdf3684f-a6cf-4449-b9ec-e696b2c663e2 in service none none off
oxp_c6d33b64-fb96-4129-bab1-7878a06a5f9b/crypt/debug 248c6c10-1ac6-45de-bb55-ede36ca56bbd in service 100 GiB none gzip-9
omicron zones:
-----------------------------------------------------------------------------------------------------------------------
zone type zone id image source disposition underlay IP
-----------------------------------------------------------------------------------------------------------------------
crucible 5199c033-4cf9-4ab6-8ae7-566bd7606363 artifact: version 1.0.0 in service fd00:1122:3344:101::25
crucible_pantry ba4994a8-23f9-4b1a-a84f-a08d74591389 artifact: version 1.0.0 in service fd00:1122:3344:101::24
external_dns 803bfb63-c246-41db-b0da-d3b87ddfc63d artifact: version 1.0.0 in service fd00:1122:3344:101::23
internal_dns 427ec88f-f467-42fa-9bbb-66a91a36103c artifact: version 1.0.0 in service fd00:1122:3344:2::1
internal_ntp 6444f8a5-6465-4f0b-a549-1993c113569c artifact: version 1.0.0 in service fd00:1122:3344:101::21
nexus 0c71b3b2-6ceb-4e8f-b020-b08675e83038 artifact: version 1.0.0 in service fd00:1122:3344:101::22
sled: d81c6a84-79b8-4958-ae41-ea46c9b19763 (active, config generation 6)
host phase 2 contents:
------------------------
slot boot image source
------------------------
A current contents
B current contents
physical disks:
------------------------------------------------------------------------------------
vendor model serial disposition
------------------------------------------------------------------------------------
fake-vendor fake-model serial-4930954e-9ac7-4453-b63f-5ab97c389a99 in service
datasets:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
dataset name dataset id disposition quota reservation compression
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
oxp_4930954e-9ac7-4453-b63f-5ab97c389a99/crucible 090bd88d-0a43-4040-a832-b13ae721f74f in service none none off
oxp_4930954e-9ac7-4453-b63f-5ab97c389a99/crypt/external_dns 4da74a5b-6911-4cca-b624-b90c65530117 in service none none off
oxp_4930954e-9ac7-4453-b63f-5ab97c389a99/crypt/internal_dns 252ac39f-b9e2-4697-8c07-3a833115d704 in service none none off
oxp_4930954e-9ac7-4453-b63f-5ab97c389a99/crypt/zone 45cd9687-20be-4247-b62a-dfdacf324929 in service none none off
oxp_4930954e-9ac7-4453-b63f-5ab97c389a99/crypt/zone/oxz_crucible_f55647d4-5500-4ad3-893a-df45bd50d622 1cb0a47a-59ac-4892-8e92-cf87b4290f96 in service none none off
oxp_4930954e-9ac7-4453-b63f-5ab97c389a99/crypt/zone/oxz_crucible_pantry_75b220ba-a0f4-4872-8202-dc7c87f062d0 b1deff4b-51df-4a37-9043-afbd7c70a1cb in service none none off
oxp_4930954e-9ac7-4453-b63f-5ab97c389a99/crypt/zone/oxz_external_dns_f6ec9c67-946a-4da3-98d5-581f72ce8bf0 c65a9c1c-36dc-4ddb-8aac-ec3be8dbb209 in service none none off
oxp_4930954e-9ac7-4453-b63f-5ab97c389a99/crypt/zone/oxz_internal_dns_ea5b4030-b52f-44b2-8d70-45f15f987d01 21fd4f3a-ec31-469b-87b1-087c343a2422 in service none none off
oxp_4930954e-9ac7-4453-b63f-5ab97c389a99/crypt/zone/oxz_nexus_3eeb8d49-eb1a-43f8-bb64-c2338421c2c6 e009d8b8-4695-4322-b53f-f03f2744aef7 in service none none off
oxp_4930954e-9ac7-4453-b63f-5ab97c389a99/crypt/zone/oxz_ntp_f10a4fb9-759f-4a65-b25e-5794ad2d07d8 41071985-1dfd-4ce5-8bc2-897161a8bce4 in service none none off
oxp_4930954e-9ac7-4453-b63f-5ab97c389a99/crypt/debug 7a6a2058-ea78-49de-9730-cce5e28b4cfb in service 100 GiB none gzip-9
omicron zones:
-----------------------------------------------------------------------------------------------------------------------
zone type zone id image source disposition underlay IP
-----------------------------------------------------------------------------------------------------------------------
crucible f55647d4-5500-4ad3-893a-df45bd50d622 artifact: version 2.0.0 in service fd00:1122:3344:103::25
crucible_pantry 75b220ba-a0f4-4872-8202-dc7c87f062d0 artifact: version 2.0.0 in service fd00:1122:3344:103::24
external_dns f6ec9c67-946a-4da3-98d5-581f72ce8bf0 artifact: version 2.0.0 in service fd00:1122:3344:103::23
internal_dns ea5b4030-b52f-44b2-8d70-45f15f987d01 artifact: version 2.0.0 in service fd00:1122:3344:3::1
internal_ntp f10a4fb9-759f-4a65-b25e-5794ad2d07d8 artifact: version 2.0.0 in service fd00:1122:3344:103::21
nexus 3eeb8d49-eb1a-43f8-bb64-c2338421c2c6 artifact: version 2.0.0 in service fd00:1122:3344:103::22
Here's my consolation, after chatting with @sunshowers:
This is only happening because of the following config option:
> # Set the add_zones_with_mupdate_override planner config to ensure that zone
> # adds happen despite zone image sources not being Artifact.
> set planner-config --add-zones-with-mupdate-override true
planner config updated:
* add zones with mupdate override: false -> true
If this switch wasn't set, and we mupdated into this situation, we would normally refuse to do any add/update operations at all, while in this forcefully corrupt state.
Hmm. @sunshowers, should we turn off this flag at this point in the test? (Is it weird that most of this test runs with that flag on?)
+1 on this question; I know when I've had to update this test I haven't understood the changes I've made as well as I'd like. It's a pretty intricate test though so I'm not sure what to suggest for making it clearer.
.collect();
let second_sled_id = sled_ids[1];

// Use a different image source (artifact vs install dataset)
Actually, I wonder if this should produce an error if any Nexus zones have source "install dataset"? Because we don't know that the image is actually different.
I'm going to defer to the conversation in #8936 (comment) - I'm down to restrict the usage of the install dataset, but doing this requires patching tests that rain changed in #8921.
Once that merges, I'm willing to add more explicit checks, but none of the "zone add" logic in the planner should execute anyway if there are any InstallDataset zones in use.
hash: ArtifactHash([0x42; 32]),
};

// Add another Nexus zone with different image source - should increment generation
Since test_nexus_generation_assignment_new_generation() already tested that if you pass a particular generation to sled_add_zone_nexus() then you get a zone with that generation, I wonder if we should just have this test call determine_nexus_generation() with both cases: one with the same image and one with a new one. I don't think we really need to add the zones and check that they got what we expected.
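A sketch of the shape of test being suggested, assuming a free-standing determine_nexus_generation helper with roughly this signature (the real one operates on blueprint types, and this stand-in just re-implements the same-image/new-image rule sketched earlier):

```rust
// Hedged sketch; Generation, ImageSource, and this determine_nexus_generation
// are stand-ins, not the planner's actual items.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Generation(u64);

#[derive(Clone, PartialEq, Eq)]
enum ImageSource {
    Artifact(&'static str),
}

fn determine_nexus_generation(
    new_image: &ImageSource,
    existing: &[(ImageSource, Generation)],
) -> Generation {
    existing
        .iter()
        .find(|(img, _)| img == new_image)
        .map(|(_, gen)| *gen)
        .unwrap_or_else(|| {
            // Panics if there are no existing Nexus zones; the real code
            // treats that as an error case.
            let max = existing.iter().map(|(_, g)| *g).max().expect("existing Nexus zones");
            Generation(max.0 + 1)
        })
}

#[test]
fn generation_reused_for_same_image_and_bumped_for_new_image() {
    let existing = [(ImageSource::Artifact("1.0.0"), Generation(1))];
    assert_eq!(
        determine_nexus_generation(&ImageSource::Artifact("1.0.0"), &existing),
        Generation(1)
    );
    assert_eq!(
        determine_nexus_generation(&ImageSource::Artifact("2.0.0"), &existing),
        Generation(2)
    );
}
```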
I've removed this test out of necessity, after moving determine_nexus_generation out of the planner.
(Updated in 8d17221)
/// - If any existing Nexus zone has the same image source, reuse its generation
/// - Otherwise, use the highest existing generation + 1
/// - If no existing zones exist, return an error
I do feel like this probably belongs in the planner. There are no callers within the builder.
let zones_currently_updating =
    self.get_zones_not_yet_propagated_to_inventory();
if !zones_currently_updating.is_empty() {
    info!(
It's a little surprising there's no update to report for this. But maybe that happens elsewhere?
I think this is a problem on main too. Filed #9047.
// For Nexus, we need to confirm that the active generation has
// moved beyond this zone.
Calling this function for Nexus feels a little weird. For other zones, I think we're saying: "we're about to update this one zone, either by expunging it [and then subsequently adding one] or else replacing it. Can we do that now?" For Nexus, though, this will only return true after we've done the handoff to the new fleet.
I'm not sure it's worth reworking. Assuming not, I'd clarify this comment a bit:
// For Nexus, we're only ready to "update" this zone once control has been handed off to a newer generation of Nexus zones. (Once that happens, we're not really going to update this zone, just expunge it.)
Refactored this a bit, but done in 4beb420.
I had a few last suggestions but this is looking good to me! Given how tricky this is, I'd like to get @jgallagher's +1 too.
let (nexus_updateable_zones, non_nexus_updateable_zones): (
    Vec<_>,
    Vec<_>,
) = out_of_date_zones
    .into_iter()
    .filter(|(_, zone, _)| {
        self.are_zones_ready_for_updates(mgs_updates)
            && self.can_zone_be_shut_down_safely(&zone, &mut report)
    })
    .partition(|(_, zone, _)| zone.zone_type.is_nexus());
Take it or leave it: I'm wondering if we can make should_nexus_zone_be_expunged() more type-safe (so it can't panic on bad input) with something like (this is untested):
let (nexus_updateable_zones, non_nexus_updateable_zones): (
    Vec<_>,
    Vec<_>,
) = out_of_date_zones
    .into_iter()
    .filter(|(_, zone, _)| {
        self.are_zones_ready_for_updates(mgs_updates)
            && self.can_zone_be_shut_down_safely(&zone, &mut report)
    })
    .map(|(sled_id, zone, image_source)| {
        let nexus_config = match &zone.z_type {
            blueprint_zone_type::Nexus(nexus_config) => nexus_config,
            _ => None,
        };
        (sled_id, zone, image_source, nexus_config)
    })
    .partition(|(_, _, _, nexus_config)| nexus_config.is_none());
Then we'll already have the nexus_config and can pass it to should_nexus_zone_be_expunged(). Not a big deal.
I would be more on-board for this, but I think the type of nexus_config would be wrapped in an option here, which would need to be unwrapped anyway?
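For what it's worth, one untested way to sidestep that Option is to extract the Nexus data up front and split with partition_map, so the Nexus branch already carries what it needs. This is only a sketch with stand-in types, assuming the itertools crate; it is not the planner's real code.

```rust
// Hedged sketch; Zone, ZoneType, and NexusConfig are invented stand-ins.
use itertools::{Either, Itertools};

struct NexusConfig {
    nexus_generation: u64,
}

enum ZoneType {
    Nexus(NexusConfig),
    Crucible,
}

struct Zone {
    zone_type: ZoneType,
}

fn nexus_generation(zone: &Zone) -> Option<u64> {
    match &zone.zone_type {
        ZoneType::Nexus(cfg) => Some(cfg.nexus_generation),
        ZoneType::Crucible => None,
    }
}

// Split out-of-date zones into "Nexus zones paired with their generation" and
// "everything else" in one pass, with no later unwrap needed.
fn split_out_of_date(zones: Vec<Zone>) -> (Vec<(u64, Zone)>, Vec<Zone>) {
    zones.into_iter().partition_map(|zone| match nexus_generation(&zone) {
        Some(gen) => Either::Left((gen, zone)),
        None => Either::Right(zone),
    })
}

fn main() {
    let zones = vec![
        Zone { zone_type: ZoneType::Nexus(NexusConfig { nexus_generation: 1 }) },
        Zone { zone_type: ZoneType::Crucible },
    ];
    let (nexus, other) = split_out_of_date(zones);
    assert_eq!((nexus.len(), other.len()), (1, 1));
}
```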
// identify why it is not ready for update.
fn can_zone_be_updated(
//
// Precondition: zone must be a Nexus zone
// Precondition: zone must be a Nexus zone and be running an out-of-date image
(trying to communicate that this shouldn't be used in any case other than "update")
Updated phrasing in e1c9f06
* skipping noop zone image source check on sled d81c6a84-79b8-4958-ae41-ea46c9b19763: all 6 zones are already from artifacts
* 1 pending MGS update:
* model0:serial0: RotBootloader(PendingMgsUpdateRotBootloaderDetails { expected_stage0_version: ArtifactVersion("0.0.1"), expected_stage0_next_version: NoValidVersion })
* only placed 0/2 desired nexus zones
This can be in a follow-on PR, but I feel like we should update this test to finish the update now. (If it's easy, it'd be nice to get in this PR -- it'd be a tidy confirmation that all is working.)
Updated in e1c9f06 - looks like it works!
(though it looks like it also was updated in #9059 by @jgallagher)
This LGTM too, thanks! Left a handful of minor nits and clarifying questions, but nothing blocking.
.unwrap();
let image_source = BlueprintZoneImageSource::InstallDataset;

// Add first Nexus zone - should get generation 1
Two nitpicky questions:
- Is it that it should get generation 1, or that it should get builder.parent_blueprint().nexus_generation, since that's what we're passing in?
- If the latter, do we still need this test now that the method isn't doing any logic to pick a generation and is just using whatever we pass in?
I first cleaned up this test to clarify "the output matches the input", but I agree with you - I think it's a bit silly to have a test that just checks the value is passed through.
Removed in e1c9f06
// We may need to bump the top-level Nexus generation number
// to update Nexus zones.
let nexus_generation_bump = self.do_plan_nexus_generation_update()?;
Should this be guarded by any of the "are prior steps still pending" checks like we have on updating zones above? My gut feeling is that in a normal update this wouldn't matter (we wouldn't be in a position for do_plan_nexus_generation_update() to do anything unless everything else is already done anyway). Maybe there are some weird cases where it might come up? Sled addition at a particularly unlucky time or something?
I think the answer is "no, don't guard it" - if we're in a position where we're ready to trigger a handoff, we should probably go ahead and do that even if the planner thinks there's now something else to do "earlier" in the update process, and the new Nexus can finish up whatever is left? But it seems like a weird enough case I wanted to ask.
I believe that inside the body of do_plan_nexus_generation_update, we're already guarding against the stuff we care about, and validating that these things have propagated to inventory (e.g., new Nexuses booting) if we care about it.
I just scanned through do_plan_nexus_generation_update - it seems like any spot where we need to validate some property by iterating over zones, it checks both the blueprint and the pending changes in the sled_editors.
I believe that, in the case where we're adding a zone and then performing this check (sketched after this list):
- We'll be able to see the newly planned zone (e.g., looking that all zones are on a newer image)
- We won't see it in inventory, and could raise an error if we need to see it there (e.g., needing sufficient new Nexuses to actually be running)
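A sketch of the kind of gating described here, with invented inputs; the real checks live inside do_plan_nexus_generation_update and consult both the in-progress blueprint and inventory:

```rust
// Hedged sketch; PlanningView and its fields are stand-ins for what the
// planner actually derives from the blueprint under construction and inventory.
struct PlanningView {
    non_nexus_zones_out_of_date: usize,
    new_nexus_in_blueprint: usize,
    new_nexus_seen_in_inventory: usize,
    target_redundancy: usize,
}

fn ready_to_bump_nexus_generation(view: &PlanningView) -> bool {
    view.non_nexus_zones_out_of_date == 0
        // Newly planned zones show up in the blueprint immediately...
        && view.new_nexus_in_blueprint >= view.target_redundancy
        // ...but inventory must also confirm they are actually running.
        && view.new_nexus_seen_in_inventory >= view.target_redundancy
}

fn main() {
    let view = PlanningView {
        non_nexus_zones_out_of_date: 0,
        new_nexus_in_blueprint: 3,
        new_nexus_seen_in_inventory: 2,
        target_redundancy: 3,
    };
    // The just-added Nexus zones aren't in inventory yet, so don't bump.
    assert!(!ready_to_bump_nexus_generation(&view));
}
```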
//
// This presumably includes the currently-executing Nexus where
// this logic is being considered.
let Some(current_gen) = self.lookup_current_nexus_generation()? else {
Should lookup_current_nexus_generation() return a Result<Generation, _> instead of a Result<Option<Generation>, _>? Or a different question: Can we get Ok(None) here from a well-formed parent blueprint?
Sure, this actually simplifies the space of possible reported states if we treat it like an error. Done in aef9586
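A tiny sketch of the agreed simplification, with stand-in names: surface "no currently-running Nexus generation" as an error rather than threading an Option through the callers.

```rust
// Hedged sketch; Generation, PlanError, and this helper signature are stand-ins.
#[derive(Debug, Clone, Copy)]
struct Generation(u64);

#[derive(Debug)]
enum PlanError {
    NoCurrentNexusGeneration,
}

fn lookup_current_nexus_generation(
    found: Option<Generation>,
) -> Result<Generation, PlanError> {
    // A well-formed parent blueprint should always yield Some; treat None as an error.
    found.ok_or(PlanError::NoCurrentNexusGeneration)
}

fn main() {
    assert!(lookup_current_nexus_generation(None).is_err());
    assert!(lookup_current_nexus_generation(Some(Generation(2))).is_ok());
}
```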
panic!("did not converge after {MAX_PLANNING_ITERATIONS} iterations"); | ||
} | ||
|
||
struct BlueprintGenerator { |
This work is already done so I wouldn't change it in this PR, but I don't love this - it feels like it's replicating what we built reconfigurator-cli to do for the tests it enables (set up a system, generate TUF repos, walk through various scenarios, etc., in a way that's easier to read and maintain than unit tests). Do you think we could replace these tests with reconfigurator-cli-based ones, or are they poking at things that would be hard to do there?
I'd be happy to take this on as a follow-up to this PR. Candidly I didn't realize reconfigurator-cli was flexible enough to support blueprint editing like we do in tests, but I can take a look at that now.
Blueprint Planner
Fixes #8843, #8854