-
Notifications
You must be signed in to change notification settings - Fork 623
[Feature][APIServer v2] Support Compute Template in APIServer v2 #3959
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature][APIServer v2] Support Compute Template in APIServer v2 #3959
Conversation
Signed-off-by: machichima <[email protected]>
…r-v2-compute-template Signed-off-by: machichima <[email protected]>
Signed-off-by: machichima <[email protected]>
Signed-off-by: machichima <[email protected]>
…r-v2-compute-template Signed-off-by: machichima <[email protected]>
Signed-off-by: machichima <[email protected]>
Signed-off-by: machichima <[email protected]>
Signed-off-by: machichima <[email protected]>
Signed-off-by: machichima <[email protected]>
Signed-off-by: machichima <[email protected]>
Signed-off-by: machichima <[email protected]>
Signed-off-by: machichima <[email protected]>
Just a structure, test does not work now Signed-off-by: machichima <[email protected]>
Signed-off-by: machichima <[email protected]>
…r-v2-compute-template Signed-off-by: machichima <[email protected]>
Signed-off-by: machichima <[email protected]>
Signed-off-by: machichima <[email protected]>
Signed-off-by: machichima <[email protected]>
Signed-off-by: machichima <[email protected]>
// | ||
// Store http request handling function for unit test purpose. | ||
executeHttpRequest func(httpRequest *http.Request, URL string) ([]byte, *rpcStatus.Status, error) | ||
ExecuteHttpRequest func(httpRequest *http.Request, URL string) ([]byte, *rpcStatus.Status, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this public so that we can use in apiserversdk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@machichima, can we revert this now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we still need this to be public as we will use it in:
|
||
// Compute template | ||
func (r *ResourceManager) populateComputeTemplate(ctx context.Context, clusterSpec *api.ClusterSpec, nameSpace string) (map[string]*api.ComputeTemplate, error) { | ||
func (r *ResourceManager) PopulateComputeTemplate(ctx context.Context, clusterSpec *api.ClusterSpec, nameSpace string) (map[string]*api.ComputeTemplate, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this public so that we can use in apiserversdk
} | ||
|
||
// execCommandWithCurlInPod executes a curl command inside the specified pod's container by `kubectl exec` | ||
func (rec *RemoteExecuteClient) execCommandWithCurlInPod(pod *corev1.Pod, url string, method string, body string, contentType string) ([]byte, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take this function from apiserver/test/e2e/exec.go
and addcontentType
args
return e2etc.ctx | ||
} | ||
|
||
func (e2etc *End2EndTestingContext) GetK8sHttpClient() *kubernetes.Clientset { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally there's duplicate k8sHttpClient
and k8sClient
that are both of type *kubernetes.Clientset
. We removed k8sHttpClient
and only keep k8sClient
) | ||
|
||
// compute_template_middleware.go | ||
func NewComputeTemplateMiddleware(clientManager manager.ClientManagerInterface) func(http.Handler) http.Handler { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make clientManager
as the args so that we can mock this out in unit test
Signed-off-by: machichima <[email protected]>
Signed-off-by: machichima <[email protected]>
Or the flaky may be related to #4061 |
} | ||
|
||
// verifyPodSpecResources verifies that the PodSpec has the expected resources from the compute template | ||
func verifyPodSpecResources(t *testing.T, podSpec *corev1.PodSpec, _, groupType string, computeTemplate *api.ComputeTemplate) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is any specific reason to put a _
in the function signature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not needed, I think I forgot to remove it. Thank you for pointing this out!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in edc99ad
apiserversdk/util/template.go
Outdated
func convertRequestBodyToMap(requestBody []byte, contentType string) (map[string]any, error) { | ||
var requestMap map[string]interface{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: choose one between any
and interface{}
to make it consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I changed interface{}
to any
as suggested in my linter in 8ad0573
apiserversdk/util/template.go
Outdated
} else { | ||
tolerations = make([]interface{}, 0) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The else branch might not be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Removed the block in 559f2ae
apiserversdk/util/template.go
Outdated
computeTemplate, err := getComputeTemplate(context.Background(), resourceManager, headGroupMap, namespace) | ||
if err != nil { | ||
klog.Errorf("ComputeTemplate middleware: Failed to get compute template for head group: %v", err) | ||
http.Error(w, err.Error(), http.StatusInternalServerError) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this should be a client error. Returning HTTP 422 will be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix in 967c314
apiserversdk/util/template.go
Outdated
func applyComputeTemplateToRequest(computeTemplate *api.ComputeTemplate, clusterSpecMap *map[string]any, group string) { | ||
// calculate resources | ||
cpu := fmt.Sprint(computeTemplate.GetCpu()) | ||
memory := fmt.Sprintf("%d%s", computeTemplate.GetMemory(), "Gi") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to support the new memory unit field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in b56867d
|
||
// RemoteExecuteClient allows executing HTTP requests against a service running inside a Kubernetes pod | ||
// by using `kubectl exec`-style command execution, without requiring a NodePort for external access. | ||
type RemoteExecuteClient struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use the existing ProxyRoundTripper instead of introducing a RemoteExecuteClient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in 9561d29, thank you!
expectedMemory := fmt.Sprintf("%dGi", computeTemplate.GetMemory()) | ||
memoryLimit := rayContainer.Resources.Limits[corev1.ResourceMemory] | ||
memoryRequest := rayContainer.Resources.Requests[corev1.ResourceMemory] | ||
require.Equal(t, expectedMemory, memoryLimit.String(), "Expected memory limit to be 4Gi") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you remove the hard-coded values in these assertion messages in a follow-up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! no problem
|
||
klog.Infof("ComputeTemplate middleware: Successfully processed request, sending to next handler") | ||
// Update Content-Type to application/json and Content-Length header to match the new body size | ||
r.Header.Set("Content-Type", "application/json") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you keep the original content type in a follow-up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because after processing, we convert the map to JSON. Therefor I change the content-type here to be json
Why are these changes needed?
APIServer v1 introduced a unique and mandatory feature, Compute Template, which requires users to predefine the amount of CPU and memory resources they need in templates.
APIServer v2 doesn’t require users to predefine templates anymore, but we still want to continue supporting the feature to help v1 users migrate to v2 more easily.
Implementation
We need 3 new proxy handlers for RayCluster, RayJob, and RayService, respectively. They may be able to share some parts of the implementation.
Added e2e test in
compute_template_e2e_test.go
Related issue number
Checks