- 
                Notifications
    You must be signed in to change notification settings 
- Fork 255
HIVE-2302, HIVE-2644: Pass metadata.json through opaquely #2729
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
HIVE-2302, HIVE-2644: Pass metadata.json through opaquely #2729
Conversation
| @2uasimojo: This pull request references HIVE-2302 which is a valid jira issue. In response to this: 
 Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. | 
| Skipping CI for Draft Pull Request. | 
| @2uasimojo: This pull request references HIVE-2302 which is a valid jira issue. In response to this: 
 Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. | 
| [APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: 2uasimojo The full list of commands accepted by this bot can be found here. The pull request process is described here 
Needs approval from an approver in each of these files:
 
 Approvers can indicate their approval by writing  | 
| TODO: Deprovisioner side | 
8707a79    to
    d09e43e      
    Compare
  
    | @2uasimojo: This pull request references HIVE-2302 which is a valid jira issue. In response to this: 
 Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. | 
    
      
        1 similar comment
      
    
  
    | @2uasimojo: This pull request references HIVE-2302 which is a valid jira issue. In response to this: 
 Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. | 
d09e43e    to
    13ad458      
    Compare
  
    13ad458    to
    ace243e      
    Compare
  
    | @2uasimojo: This pull request references HIVE-2302 which is a valid jira issue. This pull request references HIVE-2644 which is a valid jira issue. In response to this: 
 Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. | 
| @2uasimojo: This pull request references HIVE-2302 which is a valid jira issue. This pull request references HIVE-2644 which is a valid jira issue. In response to this: 
 Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. | 
ace243e    to
    cf8610a      
    Compare
  
    | /test e2e e2e-azure e2e-gcp e2e-vsphere e2e-openstack 🤞 | 
cf8610a    to
    268d7cc      
    Compare
  
    | /test e2e-azure e2e-vsphere | 
268d7cc    to
    bce1629      
    Compare
  
    | 
 /test e2e-vsphere | 
bce1629    to
    cb1b5d5      
    Compare
  
    | /hold for moar testings | 
| 
 ...so I think I'll just rebase to pick up that last one anyway. ...but I might as well wait until #2754 lands. | 
cb1b5d5    to
    f6f2464      
    Compare
  
    | /assign @dlom | 
| Legacy granny switch validated via #2759 ✓ | 
| /retest hive-on-pull-request | 
| /retest hive-mce-26-on-pull-request | 
f6f2464    to
    8158d73      
    Compare
  
    | /retest hive-on-pull-request | 
| /retest hive-mce-26-on-pull-request | 
| @dlom: The  Use  In response to this: 
 Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. | 
| /test e2e-openstack Pre-actual-test IPI setup flake | 
| /test e2e-openstack Same again | 
| /test e2e-openstack Looks like provision timed out? | 
| } | ||
|  | ||
| // TODO: Make a registry or interface for this | ||
| var ConfigureCreds func(client.Client) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine, it's readable and makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack. But it would jell with how we're getting the correct platform-specific destroyer from the installer -- see LL132 & 137 below. I don't mind leaving it for now though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most I would consider would be something like this:
type ConfigureCredsFn func(client.Client)
...
var configureCreds ConfigureCredsFnThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this gain? I still have to define the funcs with the full param spec, right?
| // If env vars are unset, the destroyer will fail organically. | ||
| ConfigureCreds = func(c client.Client) { | ||
| nutanixutil.ConfigureCreds(c) | ||
| metadata.Nutanix.Username = os.Getenv(constants.NutanixUsernameEnvVar) | ||
| metadata.Nutanix.Password = os.Getenv(constants.NutanixPasswordEnvVar) | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // If env vars are unset, the destroyer will fail organically. | |
| ConfigureCreds = func(c client.Client) { | |
| nutanixutil.ConfigureCreds(c) | |
| metadata.Nutanix.Username = os.Getenv(constants.NutanixUsernameEnvVar) | |
| metadata.Nutanix.Password = os.Getenv(constants.NutanixPasswordEnvVar) | |
| } | |
| // If env vars are unset, the destroyer will fail organically. | |
| ConfigureCreds = nutanixutil.ConfigureCreds | |
| metadata.Nutanix.Username = os.Getenv(constants.NutanixUsernameEnvVar) | |
| metadata.Nutanix.Password = os.Getenv(constants.NutanixPasswordEnvVar) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can just write it like this. The function is immediately invoked right below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That won't work without some refactoring, because ConfigureCreds is where those env vars get set, for both nutanix and vsphere.
Looking at it again, I think what makes sense is to pass the metadata into ConfigureCreds and set up the user/pass there. They are, after all, Creds that need to be Configured :)
My only objection to that is that I was trying to leave the legacy code path untouched as much as possible. Ultimately -- when we kill the legacy code path -- it'll be clean to do so; but in the meantime it would require me to either pass a dummy metadata is through or add a nil check condition in ConfigureCreds. The latter seems like the lesser evil. But I'm not sure it's worthwhile -- WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding a metadata parameter and using a nil check to all of the ConfigureCreds implementations is a good compromise, especially if the line count is low (probably no more than +10 per file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, via a separate commit in case we hate it :)
8158d73    to
    c4a6689      
    Compare
  
    | } | ||
|  | ||
| ConfigureCreds(c) | ||
| ConfigureCreds[platform](c, metadata) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love it
| } | ||
| if mdjsr := cd.Spec.ClusterMetadata.MetadataJSONSecretRef; mdjsr != nil && mdjsr.Name != "" { | ||
| if err := r.addOwnershipToSecret(cd, cdLog, mdjsr.Name); err != nil { | ||
| return reconcile.Result{}, err | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @2uasimojo I have a question regarding the cleanup of the metadata.json Secret. Please correct me if I missed anything, thank you!
Currently, the OwnerReference of the metadata.json Secret is only set when the cluster installation succeeds. If the installation fails and the Secret has already been created, it has no OwnerReference and will remain in the cluster when the ClusterDeployment is deleted - it won’t be removed automatically.
Regarding the leftover Secret, there is a potential issue:
If an invalid metadata.json Secret already exists before the upgrade, the legacy retrofit path calls ensureMetadataJSONSecret() with forceUpdate=false. In this case, if the Secret already exists, its contents will not be updated even if the metadata in the retrofit has changed. This can cause the cluster deletion to fail. (I think it works with the annotation.)
Would it be possible to set the OwnerReference when creating the Secret? This way, the Secret would be automatically cleaned up regardless of whether the installation succeeds.
As additional context:
I did some tests switching repeatedly between old and new hive versions on the same hub.
When a ClusterDeployment installation failed on the new version, the metadata.json Secret was left behind without cleanup.
After reverting to the old version and creating a successful ClusterDeployment in the same namespace, upgrading hive again caused the existing secret to remain outdated (since forceUpdate=false), resulting in a mismatch between clusterdeprovision data and metadata.json, which led to cluster deletion failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great find @huangmingxia!
I'm really not worried about supporting "downgrades". (I think we could also simulate this behavior by creating a dummy Secret of the appropriate name before kicking off the installation with the new build. I'm also not worried about supporting that :) )
However, I agree we should be setting that OwnerReference immediately to avoid the scenario where the Secret is "orphaned" if the installation fails. PR incoming...
Well, mostly. Previously any time installer added a field to metadata.json, we would need to evaluate and possibly add a bespoke field and code path for it to make sure it was supplied to the destroyer at deprovision time. With this change, we're offloading metadata.json verbatim (except in some cases we have to scrub/replace credentials fields -- see HIVE-2804 / openshift#2612) to a new Secret in the ClusterDeployment's namespace, referenced from a new field: ClusterDeployment.Spec.ClusterMetadata.MetadataJSONSecretRef. For legacy clusters -- those created before this change -- we attempt to retrofit the new Secret based on the legacy fields. This is best effort and may not always work. In the future (but not here!) instead of building the installer's ClusterMetadata structure for the destroyer with individual fields from the CD's ClusterMetadata, we'll unmarshal it directly from the contents of this Secret.
An earlier commit ensures that ClusterDeployments have an associated Secret containing the metadata.json emitted by the installer. This change adds a new generic destroyer via the (existing) `hiveutil deprovision` command that consumes this metadata.json to deprovision the cluster. This new behavior is the default, but we also include an escape hatch to run the platform-specific legacy destroyer by setting the following annotation on the ClusterDeployment: `hive.openshift.io/legacy-deprovision: "true"`
A couple of providers (nutanix, vsphere) need bespoke code to populate credentials in the metadata.json object for destroying a cluster. In a prior commit this was being done in the deprovisioner (the new one, that uses metadata.json directly, per HIVE-2302) after ConfigureCreds. Since ConfigureCreds is where we (stay with me) configure creds, and is already platform-specific, it makes more sense to do this work there. This commit refactors to do so. Legacy code paths pass in a `nil` metadata object, which is coded to result in no change from the previous functionality. (In particular, ConfigureCreds is also used when provisioning, where no metadata object is present/necessary.)
c951181    to
    de21622      
    Compare
  
    | @2uasimojo: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. | 
Previously any time installer added a field to metadata.json, we would need to evaluate and possibly add a bespoke field and code path for it to make sure it was supplied to the destroyer at deprovision time.
With this change, we're offloading metadata.json verbatim (except in some cases we have to scrub/replace credentials fields -- see HIVE-2804 / #2612) to a new Secret in the ClusterDeployment's namespace, referenced from a new field: ClusterDeployment.Spec.ClusterMetadata.MetadataJSONSecretRef.
For legacy clusters -- those created before this change -- we attempt to retrofit the new Secret based on the legacy fields. This is best effort and may not always work.
This change then adds a new generic destroyer via the (existing)
hiveutil deprovisioncommand that consumes this metadata.json to deprovision the cluster.This new behavior is the default, but we also include an escape hatch to run the platform-specific legacy destroyer by setting the following annotation on the ClusterDeployment:
hive.openshift.io/legacy-deprovision: "true"