Address stuck virtual media #679
base: main
Skipping CI for Draft Pull Request. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test ? |
@akrzos: The following commands are available to trigger required jobs:
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/test deploy-sno-self-sched |
Seems it failed on timeout:
Would be nice if it didn't obscure what the timeout is. |
/test deploy-sno-self-sched |
cc @josecastillolema What is the prow timeout for the self-schedule job? |
/test deploy-sno |
/test deploy-sno |
/test deploy-mno |
I'm back! Let's see:
@akrzos I think the default timeout per step is 2 hours |
/test deploy-sno |
Again:
Maybe we should flip the order? |
The error in this specific case looks like a race condition: nothing retries while waiting for the iDRAC to become ready, so the reboot hadn't even started when the ready check ran. I sent openshift/release#68061 to try to address this issue. |
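The missing "retry while waiting for ready" behaviour mentioned above can be sketched as a small polling helper. This is only an illustration of the general pattern, not what openshift/release#68061 actually changes; the function name `wait_until_ready`, its parameters, and the default retry counts are all hypothetical.

```python
import time
from typing import Callable


def wait_until_ready(
    check: Callable[[], bool],
    retries: int = 30,
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() until it returns True or the attempts run out.

    `check` is a hypothetical callable that asks the BMC whether it is
    ready; `sleep` is injectable so the loop can be exercised without
    actually waiting.
    """
    for attempt in range(1, retries + 1):
        try:
            if check():
                return True
        except OSError:
            # A refused or reset connection while the BMC restarts is
            # treated as "not ready yet" rather than as a hard failure.
            pass
        if attempt < retries:
            sleep(delay)
    return False
```

A single ready check with no retries corresponds to `retries=1`: it only reports whatever state the controller happens to be in at that instant.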
/test deploy-sno |
@akrzos: The following test failed, say
Any ideas? |
I'd say this means the URL didn't return JSON at the time the task was executed. Let's try a re-run to confirm it's a race; then we can make the task more robust, e.g. by extending the until condition with cluster.json is defined |
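The idea of guarding the wait condition against a response that is not JSON yet can be sketched as follows. This is a rough Python analogue of adding `cluster.json is defined` to an `until` condition, not the actual task; the `status` field and the `"ready"` value are hypothetical placeholders for whatever the real condition inspects.

```python
import json
from typing import Any, Optional


def parse_json_or_none(body: str) -> Optional[Any]:
    """Return the decoded JSON document, or None if the body is not JSON."""
    try:
        return json.loads(body)
    except (TypeError, ValueError):
        return None


def cluster_ready(body: str, wanted_status: str = "ready") -> bool:
    """Treat a missing or non-JSON body as "keep waiting", not as an error.

    The "status" key and the "ready" value are assumptions made for this
    sketch only.
    """
    data = parse_json_or_none(body)
    if not isinstance(data, dict):
        return False
    return data.get("status") == wanted_status
```

With this shape, an HTML error page or an empty body simply makes the poll try again instead of failing on a missing attribute.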
/test deploy-sno |
Looks like the issue didn't reproduce this time. I sent #681 to extend the wait condition. |
/test deploy-mno |
I was thinking: to test both paths of this PR, once the "normal" deploy-sno and deploy-mno jobs work (happy path), we could just mount a virtual media in the jetlag CI cluster. WDYT @akrzos? |
3969d7d to 1a768ed
1a768ed to 838ac45
@josecastillolema any further progress in testing and validating this? |
838ac45 to 90c1c71
Testing it from: openshift/release#67885 |
d7eb2f9 to f7eec43
Hi @josecastillolema What is the status of your testing of this PR? |
In the future, feel free not to wait for me to determine whether the patch ran. I looked at the output, searched for the newly added tasks, and found this:
TASK [boot-iso : Dell - Eject any CD Virtual Media] ****************************
Thursday X1 August X0X5 1X:13:XX +0000 (0:00:15.X81) 0:05:35.878 *******
fatal: [fXX-h11-000-rX30.rduX.XXXXXXXX.redhat.com]: FAILED! => {"accept_ranges": "bytes", "cache_control": "no-cache", "changed": false, "connection": "close", "content": "{\"error\":{\"@Message.ExtendedInfo\":[{\"Message\":\"No Virtual Media devices are currently connected.\",\"MessageArgs\":[],\"[email protected]\":0,\"MessageId\":\"IDRAC.1.X.VRM0009\",\"RelatedProperties\":[],\"[email protected]\":0,\"Resolution\":\"No response action is required.\",\"Severity\":\"Critical\"},{\"Message\":\"The request failed due to an internal service error. The service is still operational.\",\"MessageArgs\":[],\"[email protected]\":0,\"MessageId\":\"Base.1.X.InternalError\",\"RelatedProperties\":[],\"[email protected]\":0,\"Resolution\":\"Resubmit the request. If the problem persists, consider resetting the service.\",\"Severity\":\"Critical\"}],\"code\":\"Base.1.X.GeneralError\",\"message\":\"A general error has occurred. See ExtendedInfo for more information\"}}\n", "content_length": "77X", "content_type": "application/json;odata.metadata=minimal;charset=utf-8", "date": "Thu, X1 Aug X0X5 19:13:50 GMT", "elapsed": 8, "json": {"error": {"@Message.ExtendedInfo": [{"Message": "No Virtual Media devices are currently connected.", "MessageArgs": [], "[email protected]": 0, "MessageId": "IDRAC.1.X.VRM0009", "RelatedProperties": [], "[email protected]": 0, "Resolution": "No response action is required.", "Severity": "Critical"}, {"Message": "The request failed due to an internal service error. The service is still operational.", "MessageArgs": [], "[email protected]": 0, "MessageId": "Base.1.X.InternalError", "RelatedProperties": [], "[email protected]": 0, "Resolution": "Resubmit the request. If the problem persists, consider resetting the service.", "Severity": "Critical"}], "code": "Base.1.X.GeneralError", "message": "A general error has occurred. 
See ExtendedInfo for more information"}}, "msg": "Status code was 500 and not [X0X]: HTTP Error 500: Internal Server Error", "odata_version": "X.0", "redirected": false, "server": "iDRAC/8", "status": 500, "strict_transport_security": "max-age=X307X000", "url": "https://mgmt-fXX-h1X-000-rX30.rduX.XXXXXXXX.redhat.com/redfish/v1/Managers/iDRAC.Embedded.1/VirtualMedia/CD/Actions/VirtualMedia.EjectMedia", "vary": "Accept-Encoding", "x_frame_options": "SAMEORIGIN"}
TASK [boot-iso : Force mount of a existing image] ******************************
Thursday X1 August X0X5 1X:13:50 +0000 (0:00:08.X78) 0:05:XX.557 *******
changed: [fXX-h11-000-rX30.rduX.XXXXXXXX.redhat.com -> mgmt-fXX-h1X-000-rX30.rduX.XXXXXXXX.redhat.com] => {"changed": true, "rc": 0, "stderr": "Warning: Permanently added 'mgmt-fXX-h1X-000-rX30.rduX.XXXXXXXX.redhat.com' (ECDSA) to the list of known hosts.\r\nShared connection to mgmt-fXX-h1X-000-rX30.rduX.XXXXXXXX.redhat.com closed.\r\n", "stderr_lines": ["Warning: Permanently added 'mgmt-fXX-h1X-000-rX30.rduX.XXXXXXXX.redhat.com' (ECDSA) to the list of known hosts.", "Shared connection to mgmt-fXX-h1X-000-rX30.rduX.XXXXXXXX.redhat.com closed."], "stdout": "Remote Image is now Configured\r\n", "stdout_lines": ["Remote Image is now Configured"]}
TASK [boot-iso : Force unmount of the existing image] **************************
Thursday X1 August X0X5 1X:13:5X +0000 (0:00:05.89X) 0:05:50.XX9 *******
changed: [fXX-h11-000-rX30.rduX.XXXXXXXX.redhat.com -> mgmt-fXX-h1X-000-rX30.rduX.XXXXXXXX.redhat.com] => {"changed": true, "rc": 0, "stderr": "Shared connection to mgmt-fXX-h1X-000-rX30.rduX.XXXXXXXX.redhat.com closed.\r\n", "stderr_lines": ["Shared connection to mgmt-fXX-h1X-000-rX30.rduX.XXXXXXXX.redhat.com closed."], "stdout": "Disable Remote File Started. Please check status using -s\r\noption to know Remote File Share is ENABLED or DISABLED.\r\n", "stdout_lines": ["Disable Remote File Started. Please check status using -s", "option to know Remote File Share is ENABLED or DISABLED."]}
I see the initial failure for task |
Then this PR lgtm! I think we are good to merge |
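One way to read the log output above: the eject request came back as HTTP 500, but the extended info says "No Virtual Media devices are currently connected." and "No response action is required." A minimal sketch of classifying such a response is shown below. The function name `eject_result` and its return labels are hypothetical, and treating 204 as the success code is an assumption taken from the `status_code: 204` line in the reviewed snippet.

```python
from typing import Any, Mapping

# Message text copied from the iDRAC response in the log output above.
NO_MEDIA_TEXT = "No Virtual Media devices are currently connected."


def eject_result(status: int, payload: Mapping[str, Any]) -> str:
    """Classify an EjectMedia response as ejected, nothing-mounted, or failed."""
    if status == 204:  # assumed success code, see lead-in
        return "ejected"
    error = payload.get("error") or {}
    infos = error.get("@Message.ExtendedInfo") or []
    if any(info.get("Message") == NO_MEDIA_TEXT for info in infos):
        # The controller rejected the eject only because nothing was mounted.
        return "nothing-mounted"
    return "failed"
```

Matching on the message text rather than the `MessageId` is a choice made here because the IDs in the log are partially masked.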
status_code: 204
return_content: yes

# # Eject just the found image
Should we delete this commented block?
Sure. I only left it in there in case we needed it, but years later we still haven't used it.
f7eec43 to 23b93e0
23b93e0 to e3b8a5c
No description provided.