-
Notifications
You must be signed in to change notification settings - Fork 63
postStart hook commands timeout #1440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
e90b773
to
c342798
Compare
I tried the abovementioned steps and I was able to see probelematic workspace failing with
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: akurinnoy, rohanKanojia The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: Oleksii Kurinnyi <[email protected]>
…rate_all Signed-off-by: Oleksii Kurinnyi <[email protected]>
Signed-off-by: Oleksii Kurinnyi <[email protected]>
…ds generate_all Signed-off-by: Oleksii Kurinnyi <[email protected]>
Signed-off-by: Oleksii Kurinnyi <[email protected]>
Signed-off-by: Oleksii Kurinnyi <[email protected]>
Signed-off-by: Oleksii Kurinnyi <[email protected]>
Signed-off-by: Oleksii Kurinnyi <[email protected]>
… commands Signed-off-by: Oleksii Kurinnyi <[email protected]>
Signed-off-by: Oleksii Kurinnyi <[email protected]>
…rt hook commands Signed-off-by: Oleksii Kurinnyi <[email protected]>
Signed-off-by: Oleksii Kurinnyi <[email protected]>
Signed-off-by: Oleksii Kurinnyi <[email protected]>
61d8918
to
85046e5
Compare
New changes are detected. LGTM label has been removed. |
Signed-off-by: Oleksii Kurinnyi <[email protected]>
85046e5
to
3b0e379
Compare
Signed-off-by: Oleksii Kurinnyi <[email protected]>
Signed-off-by: Oleksii Kurinnyi <[email protected]>
cf174a0
to
2813408
Compare
…disabled Signed-off-by: Oleksii Kurinnyi <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1440 +/- ##
==========================================
+ Coverage 39.57% 40.00% +0.42%
==========================================
Files 160 160
Lines 13186 13333 +147
==========================================
+ Hits 5219 5334 +115
- Misses 7590 7622 +32
Partials 377 377 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
@akurinnoy thank you for the update, but I get the fallback status message when starting
instead of the expected message:
Is it expected? |
pkg/library/lifecycle/poststart.go
Outdated
} | ||
return handler, nil | ||
} | ||
|
||
// processCommandsForPostStart builds a lifecycle handler that runs the provided command(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// processCommandsForPostStart builds a lifecycle handler that runs the provided command(s) | |
// processCommandsWithoutTimeoutFallback builds a lifecycle handler that runs the provided command(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/library/lifecycle/poststart.go
Outdated
|
||
// processCommandsForPostStart processes a list of DevWorkspace commands | ||
// and generates a corev1.LifecycleHandler for the PostStart lifecycle hook. | ||
func processCommandsForPostStart(commands []dw.Command, postStartTimeout *int32) (*corev1.LifecycleHandler, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, could we define the caller functions above the helper functions? ie,
processCommandsForPostStart(...)
processCommandsWithoutTimeoutFallback(...)
buildUserScripot(...)
generateScriptWithTimeout(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
…out is disabled Signed-off-by: Oleksii Kurinnyi <[email protected]>
@dkwon17 Hi,
I couldn't find any proof that this is expected behavior in the Kubernetes docs, but it seems to be the case. I also encountered this behavior while I was testing this PR. I ran the |
/retest |
1 similar comment
/retest |
pkg/library/lifecycle/poststart.go
Outdated
return "", fmt.Errorf("exec command is nil for command ID %s", command.Id) | ||
} | ||
if len(execCmd.Env) > 0 { | ||
return "", fmt.Errorf("env vars in postStart command %s are unsupported", command.Id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something obvious, but why are env vars in the postStart command not supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, there's nothing here that prevents us from passing env variables to the postStart command. I fixed this issue.
// +kubebuilder:validation:Optional | ||
// +kubebuilder:validation:Type=integer | ||
// +kubebuilder:validation:Format=int32 | ||
PostStartTimeout *int32 `json:"postStartTimeout,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, could we change the type from *int32
to string
so that it is consistent with the other timeouts?
devworkspace-operator/apis/controller/v1alpha1/devworkspaceoperatorconfig_types.go
Lines 144 to 153 in cf4c38d
// IdleTimeout determines how long a workspace should sit idle before being | |
// automatically scaled down. Proper functionality of this configuration property | |
// requires support in the workspace being started. If not specified, the default | |
// value of "15m" is used. | |
IdleTimeout string `json:"idleTimeout,omitempty"` | |
// ProgressTimeout determines the maximum duration a DevWorkspace can be in | |
// a "Starting" or "Failing" phase without progressing before it is automatically failed. | |
// Duration should be specified in a format parseable by Go's time package, e.g. | |
// "15m", "20s", "1h30m", etc. If not specified, the default value of "5m" is used. | |
ProgressTimeout string `json:"progressTimeout,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense; I have changed the timeout type.
Signed-off-by: Oleksii Kurinnyi <[email protected]>
Signed-off-by: Oleksii Kurinnyi <[email protected]>
Signed-off-by: Oleksii Kurinnyi <[email protected]>
434a115
to
2d2ad37
Compare
Signed-off-by: Oleksii Kurinnyi <[email protected]>
/retest |
@akurinnoy: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
After more testing, I noticed that this DW fails when the postStartTimeout is set, but succeeds when it is not set:
@akurinnoy are you able to reproduce the problem? |
What does this PR do?
This PR addresses the issue of
postStart
hook failures in DevWorkspaces when hook commands not exiting within the timeout period, so that the workspace pod gets stuck inTerminating
state and never gets deleted.This PR resolves the issue by:
timeout
forpostStart
hook. User-provided commands are now wrapped with thetimeout
utility. This ensures thatpostStart
hook commands are terminated if they exceed a configurable duration. The timeout duration can be set in theDevWorkspaceOperatorConfig
(a value of 0 means no timeout):What issues does this PR fix or reference?
https://issues.redhat.com/browse/CRW-8329
Is it tested? How?
DevWorkspaceOperatorConfig
with thepostStart
hook timeout duration (in seconds):DevWorkspace
designed to have itspostStart
hook time out:Failed
phase.status.message
of the DevWorkspace should provide a reason for the failure, indicating a timeout. For example:Error creating DevWorkspace deployment: Container tools has state [postStart hook] Commands terminated by SIGTERM (likely timed out after 30s). Exit code 143.
PR Checklist
/test v8-devworkspace-operator-e2e, v8-che-happy-path
to trigger)v8-devworkspace-operator-e2e
: DevWorkspace e2e testv8-che-happy-path
: Happy path for verification integration with Che