OCPBUGS-58198: Fix MCP updated machine count for image mode disabling case #5271
Skipping CI for Draft Pull Request.
a62ba6f to 7cc52a4
61b5b1b to 7709548
@isabella-janssen: This pull request references Jira Issue OCPBUGS-58198, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@isabella-janssen: This pull request references Jira Issue OCPBUGS-58198, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
/retest-required
/lgtm
  		return l.IsNodeDone() && l.IsDesiredMachineConfigEqualToPool(mcp) && l.IsDesiredEqualToBuild(mosc, mosb)
  	}
- 	return l.IsNodeDone() && l.IsDesiredMachineConfigEqualToPool(mcp)
+ 	return l.IsNodeDone() && l.IsDesiredMachineConfigEqualToPool(mcp) && !l.IsDesiredImageAnnotationPresentOnNode()
Hmm, I thought IsNodeDone() did take image annotations into account regardless of layering? Wonder when that broke 😓
The current state of the IsNodeDone function unfortunately does not seem to account for nodes that have not yet started updating when layering is disabled in a pool. When image mode is disabled, the first node to update removes the desired image annotation, but the remaining nodes keep the annotations as is, which leads the path through the function to be:
if !desiredOK && !currentOK { return true } <-- we pass over this since the annotations still exist in the node
if desired == "" { return false } <-- we pass over this since the annotation is still populated
if current == "" { return false } <-- we pass over this since the annotation is still populated
return desired == current <-- this returns true: desired equals current since the update hasn't started yet, so the annotations have not changed yet either
I'm not quite sure when exactly this flow broke, but it seems like sometime in 4.19. 🙁
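The path traced above can be reproduced with a small standalone sketch. This is not the MCO source: the annotation keys, the image reference, and the function name `imageAnnotationsDone` are hypothetical, and only the four checks are taken from the comment above. It shows that a node that has not started reverting is reported as done, just like a node that has finished.

```go
package main

import "fmt"

// Hypothetical annotation keys, used only for this sketch.
const (
	currentImageKey = "example.io/currentImage"
	desiredImageKey = "example.io/desiredImage"
)

// imageAnnotationsDone mirrors the four checks quoted in the comment above.
func imageAnnotationsDone(annotations map[string]string) bool {
	desired, desiredOK := annotations[desiredImageKey]
	current, currentOK := annotations[currentImageKey]

	// Neither annotation exists: nothing image-related is pending.
	if !desiredOK && !currentOK {
		return true
	}
	// One side is missing or empty: an update is in flight.
	if desired == "" {
		return false
	}
	if current == "" {
		return false
	}
	// Both are populated: done only if they agree.
	return desired == current
}

func main() {
	image := "registry.example/os@sha256:abc" // hypothetical image reference

	// Node that has not started reverting: both annotations still hold the old image.
	pending := map[string]string{currentImageKey: image, desiredImageKey: image}
	// Node mid-revert: the desired image annotation was removed, current remains.
	updating := map[string]string{currentImageKey: image}
	// Node that finished reverting: neither annotation remains.
	reverted := map[string]string{}

	fmt.Println("pending: ", imageAnnotationsDone(pending))
	fmt.Println("updating:", imageAnnotationsDone(updating))
	fmt.Println("reverted:", imageAnnotationsDone(reverted))
}
```

Under these assumptions the pending and reverted nodes both evaluate to true, so a check built only on this function cannot tell them apart once layering is off.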
Ahh that makes sense! I must've overlooked this when I tried to re-implement LayeredNodeState after the API move. 🤔 Or we've just never considered this 😅 Would you mind adding a unit test for this scenario in status_test.go? I feel like that's a good place for it, but I'm open to others too. There are probably a few functions there that utilize this call.
I added two cases to that test here. Please let me know how you feel that coverage is, and I can adjust as needed.
looks good to me, thanks!
e394fb0 to 619ae36
…case & add unit test cases for scenario
619ae36 to 7c64bec
@isabella-janssen: This pull request references Jira Issue OCPBUGS-58198, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: djoshy, isabella-janssen, umohnani8 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest-required
Pre-merge verified: Verified using IPI based AWS cluster:
MOSC template
oc create -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineOSConfig
metadata:
  name: worker
spec:
  machineConfigPool:
    name: worker
  imageBuilder:
    imageBuilderType: Job
  renderedImagePushSecret:
    name: $(oc get -n openshift-machine-config-operator sa builder -ojsonpath='{.secrets[0].name}')
  renderedImagePushSpec: "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image:latest"
EOF
machineosconfig.machineconfiguration.openshift.io/worker created
$ oc get machineosbuilds
NAME                                      PREPARED   BUILDING   SUCCEEDED   INTERRUPTED   FAILED   AGE
worker-ad7f100edb9ca029c500f1f8e3fc2920   False      False      True        False         False    7m48s
/label qe-approved
/verified by @ptalgulk01
@ptalgulk01: This PR has been marked as verified by In response to this:
@isabella-janssen: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
/override ci/prow/e2e-aws-ovn
Known CI issues
@yuqi-zhang: Overrode contexts on behalf of yuqi-zhang: ci/prow/e2e-aws-ovn, ci/prow/e2e-aws-ovn-upgrade In response to this:
905b7e5 into openshift:main
@isabella-janssen: Jira Issue Verification Checks: Jira Issue OCPBUGS-58198 Jira Issue OCPBUGS-58198 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓 In response to this:
/cherrypick release-4.20
@isabella-janssen: new pull request created: #5307 In response to this:
Closes: OCPBUGS-58198
- What I did
This updates the layered node state IsDone function to properly handle the image mode disabling case. When image mode is disabling, the layered boolean flips to false and, in that case, we need to make sure the node does not have a desired image annotation value. It also adds two unit test cases to cover the image mode disabling scenario.
- How to verify it
- Description for the changelog
OCPBUGS-58198: Update the node done check to properly calculate the updated machine count when image mode is being disabled
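As a rough illustration of the change described in this PR, the sketch below models the done check with plain booleans. It is not the MCO implementation: `nodeState`, its fields, `isDone`, and `updatedMachineCount` are hypothetical names, each field stands in for the result of one of the method calls in the diff, and the assumption that the first return is guarded by the `layered` flag comes from the description rather than from the visible diff.

```go
package main

import "fmt"

// nodeState is a hypothetical stand-in for the layered node state.
// Each field represents the result of one check named in the diff.
type nodeState struct {
	nodeDone            bool // stands in for IsNodeDone()
	configMatchesPool   bool // stands in for IsDesiredMachineConfigEqualToPool(mcp)
	buildMatches        bool // stands in for IsDesiredEqualToBuild(mosc, mosb)
	desiredImagePresent bool // stands in for IsDesiredImageAnnotationPresentOnNode()
}

// isDone sketches the fixed check. The layered guard is an assumption.
func isDone(n nodeState, layered bool) bool {
	if layered {
		return n.nodeDone && n.configMatchesPool && n.buildMatches
	}
	// Image mode is off or being disabled: a node that still carries a
	// desired image annotation has not reverted yet, so it is not done.
	return n.nodeDone && n.configMatchesPool && !n.desiredImagePresent
}

// updatedMachineCount counts the nodes that pass the done check.
func updatedMachineCount(nodes []nodeState, layered bool) int {
	count := 0
	for _, n := range nodes {
		if isDone(n, layered) {
			count++
		}
	}
	return count
}

func main() {
	nodes := []nodeState{
		// Already reverted: the desired image annotation is gone.
		{nodeDone: true, configMatchesPool: true},
		// Not started yet: the annotation is still present.
		{nodeDone: true, configMatchesPool: true, desiredImagePresent: true},
		{nodeDone: true, configMatchesPool: true, desiredImagePresent: true},
	}
	fmt.Println("updated machines:", updatedMachineCount(nodes, false))
}
```

In this model only the reverted node is counted while image mode is being disabled. Without the extra annotation check, all three nodes would satisfy the non-layered branch, which is the inflated updated machine count the bug describes.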