Process apiGroup in capi provider #8410
Changes from all commits:
```diff
@@ -375,22 +375,44 @@ func (r unstructuredScalableResource) InstanceDRADriver() string {
 }
 
 func (r unstructuredScalableResource) readInfrastructureReferenceResource() (*unstructured.Unstructured, error) {
+	obKind := r.unstructured.GetKind()
+	obName := r.unstructured.GetName()
+
 	infraref, found, err := unstructured.NestedStringMap(r.unstructured.Object, "spec", "template", "spec", "infrastructureRef")
 	if !found || err != nil {
 		return nil, nil
 	}
 
-	apiversion, ok := infraref["apiVersion"]
-	if !ok {
-		return nil, nil
+	var apiversion string
+
+	apiGroup, ok := infraref["apiGroup"]
+	if ok {
+		if apiversion, err = getAPIGroupPreferredVersion(r.controller.managementDiscoveryClient, apiGroup); err != nil {
+			klog.V(4).Infof("Unable to read preferred version from api group %s, error: %v", apiGroup, err)
+			return nil, err
+		}
+		apiversion = fmt.Sprintf("%s/%s", apiGroup, apiversion)
+	} else {
+		// Fall back to ObjectReference in capi v1beta1
+		apiversion, ok = infraref["apiVersion"]
+		if !ok {
+			info := fmt.Sprintf("Missing apiVersion from %s %s's InfrastructureReference", obKind, obName)
+			klog.V(4).Info(info)
+			return nil, errors.New(info)
+		}
 	}
 
 	kind, ok := infraref["kind"]
 	if !ok {
-		return nil, nil
+		info := fmt.Sprintf("Missing kind from %s %s's InfrastructureReference", obKind, obName)
+		klog.V(4).Info(info)
+		return nil, errors.New(info)
 	}
 	name, ok := infraref["name"]
 	if !ok {
-		return nil, nil
+		info := fmt.Sprintf("Missing name from %s %s's InfrastructureReference", obKind, obName)
+		klog.V(4).Info(info)
+		return nil, errors.New(info)
 	}
 	// kind needs to be lower case and plural
 	kind = fmt.Sprintf("%ss", strings.ToLower(kind))
```

Review thread on the `getAPIGroupPreferredVersion` call:

- If I see correctly, this is doing a live call against the apiserver. I'm wondering whether one live call for every call of `readInfrastructureReferenceResource` is too much. Should we use a cache with a TTL for the apiGroup => version mapping (TTL: 1m or 10m)?
- Good point. Initially, I think this API is invoked only during scale up/down. @elmiko, any advice on where to put the cache?
- If it's okay to always do a live call here because this isn't called too often, that's absolutely fine for me of course (I just don't know :)).
- These calls will only happen when the core autoscaler wants to construct a node template. If the autoscaler has a ready node in the node group, it will use that node as the template instead of asking the provider to generate a new one (which is where this function is called). In the worst case, this function will get called once per node group per scan interval, which defaults to 10 seconds. In a large cluster it could be called several times for the same template, depending on how the cluster-api resources are organized. I think it's worth investigating a cache for the infrastructure templates, as they probably won't change that frequently and it could save us some API calls.
- Sounds like we don't necessarily need caching. If I see correctly, `getInfrastructureResource` below is also not cached, so this won't add much on top.
- `getInfrastructureResource` uses the informer's cache.
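To make the TTL idea from the thread concrete, here is a minimal sketch of such a cache. The type and names are hypothetical, and it assumes `getAPIGroupPreferredVersion` resolves the preferred version through the discovery client's `ServerGroups()` call; none of this is part of the PR itself.

```go
package clusterapi

import (
	"fmt"
	"sync"
	"time"

	"k8s.io/client-go/discovery"
)

// apiGroupVersionCache is a hypothetical TTL cache for the
// apiGroup => preferred-version mapping discussed in the thread above.
type apiGroupVersionCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]cacheEntry
}

type cacheEntry struct {
	version string
	expires time.Time
}

func newAPIGroupVersionCache(ttl time.Duration) *apiGroupVersionCache {
	return &apiGroupVersionCache{ttl: ttl, entries: map[string]cacheEntry{}}
}

// preferredVersion returns the cached preferred version for apiGroup,
// falling back to a single live discovery call when the entry is
// missing or expired.
func (c *apiGroupVersionCache) preferredVersion(client discovery.DiscoveryInterface, apiGroup string) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	if e, ok := c.entries[apiGroup]; ok && time.Now().Before(e.expires) {
		return e.version, nil
	}

	// ServerGroups reports every API group together with the version
	// the server prefers for it.
	groups, err := client.ServerGroups()
	if err != nil {
		return "", err
	}
	for _, g := range groups.Groups {
		if g.Name == apiGroup {
			c.entries[apiGroup] = cacheEntry{
				version: g.PreferredVersion.Version,
				expires: time.Now().Add(c.ttl),
			}
			return g.PreferredVersion.Version, nil
		}
	}
	return "", fmt.Errorf("api group %q not found in discovery", apiGroup)
}
```

With a 10-minute TTL, the worst case described above (one discovery call per node group per 10-second scan interval) collapses to one call per group per TTL window.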
Review thread on the test fixture's MachineSet apiVersion:

- I think we have to adjust the infrastructureRef here to use the new format of v1beta2 (i.e. apiGroup instead of apiVersion). Please also check whether there are other cases below.
- The current test structure is not very flexible. For this case, it covers the MachineSet with the apiVersion case, i.e. the previous behavior.
- Maybe then we should use v1beta1 here. As it is, it is not a valid v1beta2 object.
- +1
- I have some thoughts about how to make these tests easier to work with, but it's best if we get this review done first; then perhaps I can propose some cleanups.
- Not sure if I got it right, but v1beta2 MachineDeployments, MachineSets and MachinePools always have apiGroup. How could they have apiVersion?
- This line is the apiVersion for the kind that is being created; for this clause it's a MachineSet. It isn't about the infrastructure ref. Edit: I misread your comment, @sbueringer. I agree with your suggestion that we use v1beta1 here.
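For readers following the thread, the two reference shapes under discussion look roughly like this (field names follow the v1beta1 ObjectReference and the v1beta2 contract reference; the concrete kind and names are illustrative only):

```go
package clusterapi

// Illustrative infrastructureRef payloads, as they would appear under
// spec.template.spec.infrastructureRef in a MachineSet.

// v1beta1 ObjectReference style: carries a full apiVersion that
// readInfrastructureReferenceResource can use directly.
var infraRefV1Beta1 = map[string]string{
	"apiVersion": "infrastructure.cluster.x-k8s.io/v1beta1",
	"kind":       "DockerMachineTemplate",
	"name":       "worker-template",
}

// v1beta2 contract-reference style: carries only the apiGroup, so the
// provider must resolve the preferred version via discovery, as the new
// code in this PR does.
var infraRefV1Beta2 = map[string]string{
	"apiGroup": "infrastructure.cluster.x-k8s.io",
	"kind":     "DockerMachineTemplate",
	"name":     "worker-template",
}
```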
- Just chatted with Jun. While it looks trivial to change this apiVersion to v1beta1 here, doing so breaks a huge number of tests and requires refactoring (57 tests failed and 143 passed when changing it to v1beta1). So from my side it would be okay to defer the test refactoring if we feel the change in this PR is sufficiently unit tested. But I leave this to the autoscaler reviewers and maintainers, of course.
- I assume we're talking about test refactoring under the cloudprovider/clusterapi directory, which is in fact maintained independently from the core autoscaler, so I'll defer to @elmiko for the final call on merging with the v1beta2 change. Overall lgtm.
- Yeah, I was talking about refactoring the tests for clusterapi. One of the things I'd like to do is replace the brittle test config functions with a better interface modeled around a builder/fluent pattern for creating the capi objects.
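As a rough sketch of the builder/fluent pattern suggested in that last comment (all names are hypothetical; this API does not exist in the codebase today):

```go
package clusterapi

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// machineSetBuilder sketches a fluent builder for test MachineSet objects
// that could replace the current brittle test config functions.
type machineSetBuilder struct {
	obj *unstructured.Unstructured
}

func newMachineSetBuilder(apiVersion, namespace, name string) *machineSetBuilder {
	u := &unstructured.Unstructured{Object: map[string]interface{}{}}
	u.SetAPIVersion(apiVersion)
	u.SetKind("MachineSet")
	u.SetNamespace(namespace)
	u.SetName(name)
	return &machineSetBuilder{obj: u}
}

// WithInfrastructureRef sets a v1beta2-style contract reference
// (apiGroup instead of apiVersion) under spec.template.spec.
func (b *machineSetBuilder) WithInfrastructureRef(apiGroup, kind, name string) *machineSetBuilder {
	// Errors are ignored here for brevity; a real builder would collect them.
	_ = unstructured.SetNestedStringMap(b.obj.Object, map[string]string{
		"apiGroup": apiGroup,
		"kind":     kind,
		"name":     name,
	}, "spec", "template", "spec", "infrastructureRef")
	return b
}

func (b *machineSetBuilder) Build() *unstructured.Unstructured {
	return b.obj
}
```

A test could then assemble a fixture in one expression:

```go
ms := newMachineSetBuilder("cluster.x-k8s.io/v1beta2", "default", "ms-1").
	WithInfrastructureRef("infrastructure.cluster.x-k8s.io", "DockerMachineTemplate", "tmpl-1").
	Build()
```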