KEP-5328: Node Capabilities #5347
59e7e54 to 4719180
@dom4ha @sanposhiho @macsko - FYI
4c11e06 to 9254f9b
/cc @tallclair @yujuhong
9254f9b to f8291a4
/sig scheduling
Those are all examples of feature gate (FG) related capabilities, not the generic long-term capabilities.
5fb093d to a3e1436
It seems like most of the concerns with this are around the specific capabilities being added, but this KEP doesn't actually propose adding any capabilities. The examples given are hypothetical examples based on features currently in development, but no new features will be able to depend on capabilities until it goes to beta. This creates a bit of a chicken-and-egg situation, where it's hard to point to exactly how capabilities will be used until we have users lined up, but we can't line up users yet.
We kind of need to know what the expected use cases will be: maybe past examples, or hypothetical examples thought through end-to-end. Right now this KEP is limited to just a set of name/value pairs and a scenario of FG discoverability. But we are already thinking there MAY be a need to support capabilities for node selection, the ability to declare tolerations for capabilities, and the ability to have node-restricted capabilities. Knowing the scope would help us understand whether the proposed API is needed (among alternatives, if the set of use cases is limited) and, if it is needed, what shape it should have.
a3e1436 to f069f62
8d6230d to cd6d67e
I have tried to address these in the Case Study section.
I feel like we've discussed these options in depth already. Yes, these are all somewhat hypothetical because we've had to work around them in other ways. I'm sure we can dig up more examples from past KEPs, but is that necessary? Capabilities that are not limited to just feature gates:
Feature gate capabilities:
Not sure what node selection means, but we've explicitly said tolerations are out of scope.
Where did this come in? Capabilities are just added by the node, so I'm not sure what this would even mean.
We discussed this KEP today and decided to reconsider it for the 1.35 release cycle. The primary reason is to get input from
A few more things were discussed that could be refined in the proposal:
This proposal was discussed in the SIG-Arch community meeting on June 26th (recording). It was generally seen as a beneficial strategy for managing version skew. The key takeaways and action items from the discussion are as follows:
I will incorporate this feedback into the KEP and reach out when it's ready for review.
cd6d67e to eff4d75
eff4d75 to 1313dd7
1313dd7 to fc48bfb
I've updated the KEP based on the SIG Architecture feedback (#5347 (comment)). The new version focuses more on capabilities tied to the feature lifecycle and expands on the deprecation strategy. @tallclair @SergeyKanzhelev @haircommander @wojtek-t PTAL when you get a chance.
- '@wojtek-t'
- '@dom4ha'
- '@macsko'
approvers:
I'd suggest adding somebody from SIG Scheduling as an approver here.
2. Introduce a shared library to encapsulate the logic for inferring a pod's requirements and matching them against node capabilities, ensuring consistency between control plane components that depend on capabilities.
3. Enhance the kube-scheduler to filter nodes based on the pod's requirements.
4. Enable API admission controllers to validate requests for operations against a node's actual feature support.
5. Enable the kubelet admission plugin to check that the Pod is compatible with the Node's features.
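To make the shared-library idea in items 2 to 5 concrete, here is a minimal sketch of what the common matching logic might look like. Everything in it is an assumption for illustration: the `CapabilitySet` and `PodSpec` types, the `InferRequirements` and `Missing` helpers, and the `FeatureA`/`FeatureB` capability names are hypothetical and are not taken from the KEP.

```go
package main

import (
	"fmt"
	"sort"
)

// CapabilitySet is a hypothetical view of the capabilities a node reports.
// The KEP does not fix this shape; a set of names is assumed here.
type CapabilitySet map[string]bool

// PodSpec is a stand-in for the real pod type, reduced to the fields this
// sketch needs. Both fields are invented for illustration.
type PodSpec struct {
	UsesFeatureA bool
	UsesFeatureB bool
}

// InferRequirements derives the capabilities a pod needs from its spec.
// The mapping from spec fields to capability names is hypothetical.
func InferRequirements(p PodSpec) []string {
	var reqs []string
	if p.UsesFeatureA {
		reqs = append(reqs, "FeatureA")
	}
	if p.UsesFeatureB {
		reqs = append(reqs, "FeatureB")
	}
	sort.Strings(reqs)
	return reqs
}

// Missing returns the required capabilities that the node does not report.
// An empty result means the pod is compatible with the node.
func Missing(reqs []string, node CapabilitySet) []string {
	var missing []string
	for _, r := range reqs {
		if !node[r] {
			missing = append(missing, r)
		}
	}
	return missing
}

func main() {
	pod := PodSpec{UsesFeatureA: true}
	oldNode := CapabilitySet{}
	newNode := CapabilitySet{"FeatureA": true}

	fmt.Println(Missing(InferRequirements(pod), oldNode)) // prints [FeatureA]
	fmt.Println(Missing(InferRequirements(pod), newNode)) // prints []
}
```

The point of keeping `InferRequirements` and `Missing` in one place is that a scheduler filter, an API admission check, and a kubelet admission check could all call the same two functions and so cannot disagree about whether a pod fits a node.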
Considered approaches:

1. Have the autoscaler inspect a running node in the target node pool and assume all new nodes will be identical. This works only if a running node exists, so it fails in "scale-from-zero" conditions.
2. This problem is fundamentally the same as what [kubernetes/autoscaler#7799](https://github.com/kubernetes/autoscaler/issues/7799) is tracking to support DRA use cases. The cluster-autoscaler currently does not consider DRA resources while scaling up, and the long-term solution would likely involve a new API surface to specify and/or modify autoscaler predictions.
Is it the same? I thought DRA is unique because DRA is not part of the Node Status, so it is harder to add those resources to the templates.
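For reference, approach 1 and its scale-from-zero failure mode can be sketched as below. This is only an illustration under stated assumptions: `NodeInfo`, `PredictCapabilities`, and `ErrScaleFromZero` are hypothetical names and do not correspond to cluster-autoscaler APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeInfo is a hypothetical, reduced view of a running node in a node pool.
type NodeInfo struct {
	Name         string
	Capabilities []string
}

// ErrScaleFromZero signals that the pool has no running node to copy from.
var ErrScaleFromZero = errors.New("no running node in pool to copy capabilities from")

// PredictCapabilities assumes every new node will match the first running
// node in the pool. It cannot produce an answer for an empty pool.
func PredictCapabilities(pool []NodeInfo) ([]string, error) {
	if len(pool) == 0 {
		return nil, ErrScaleFromZero
	}
	// Copy the slice so callers cannot mutate the running node's data.
	return append([]string(nil), pool[0].Capabilities...), nil
}

func main() {
	pool := []NodeInfo{{Name: "node-1", Capabilities: []string{"FeatureA"}}}
	caps, err := PredictCapabilities(pool)
	fmt.Println(caps, err)

	_, err = PredictCapabilities(nil)
	fmt.Println(err)
}
```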
**Node Capabilities Requirements:**

1. Every capability must be associated with a Kubernetes feature graduating through the Alpha/Beta/GA process. This ensures capabilities are not used as permanent node attributes and are automatically removed after the feature is stable (after the supported version skew period).
2. Must be derived from the node's static configuration, which the Kubelet evaluates during bootstrap. Reporting new or changed capabilities requires a Kubelet restart to take effect.
Also, capabilities must be calculated BEFORE pod admission. Otherwise, pod admission will fail on node restart.
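A rough sketch of the ordering constraint, assuming capabilities are derived once from static configuration during bootstrap and that admission refuses to run until that has happened. The `StaticConfig`, `Node`, `Bootstrap`, and `Admit` names are hypothetical and only illustrate the sequencing, not the kubelet's actual code paths.

```go
package main

import (
	"errors"
	"fmt"
)

// StaticConfig stands in for the kubelet's static configuration.
// Using feature gates as the only input is an assumption of this sketch.
type StaticConfig struct {
	FeatureGates map[string]bool
}

// Node holds the capabilities computed at bootstrap. A nil map means
// bootstrap has not run yet.
type Node struct {
	capabilities map[string]bool
}

// ErrNotReady is returned when admission runs before capabilities exist.
var ErrNotReady = errors.New("capabilities not computed yet")

// Bootstrap computes the capability set once; changing it later would
// require constructing a new Node, mirroring the restart requirement.
func Bootstrap(cfg StaticConfig) *Node {
	caps := make(map[string]bool, len(cfg.FeatureGates))
	for name, enabled := range cfg.FeatureGates {
		if enabled {
			caps[name] = true
		}
	}
	return &Node{capabilities: caps}
}

// Admit rejects a pod whose required capabilities are not all present.
func (n *Node) Admit(required []string) error {
	if n.capabilities == nil {
		return ErrNotReady
	}
	for _, r := range required {
		if !n.capabilities[r] {
			return fmt.Errorf("node lacks capability %q", r)
		}
	}
	return nil
}

func main() {
	node := Bootstrap(StaticConfig{FeatureGates: map[string]bool{"FeatureA": true}})
	fmt.Println(node.Admit([]string{"FeatureA"})) // prints <nil>
	fmt.Println(node.Admit([]string{"FeatureB"})) // prints node lacks capability "FeatureB"
}
```

In this sketch a zero-value `Node` returns `ErrNotReady`, which is the restart failure described above: pods already bound to the node would be re-admitted before the capability set exists.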
* Graduation (GA): When the feature graduates to GA, the Kubelet continues to report the capability. This is necessary to manage version skew, allowing the control plane to correctly identify older nodes that do not yet have the GA feature.
* Automated Deprecation (Post-GA): The Kubelet automatically stops reporting the capability after the feature has been GA for a duration that exceeds the cluster's supported version skew. The capability check is bypassed in the shared library based on the consumer component's (e.g., kube-scheduler) version and the feature gate's graduation version.
This requires clarification, perhaps with specific versions showing how the supported version skew is calculated. Does this statement suggest that the capability will be removed only after GA + 3 versions? And after that, is the logic removed from both the control plane and the kubelet at the same time?
Clarification is mostly needed on when it is removed from the control plane.
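One possible reading of the deprecation rule, worked through with concrete numbers. This is a sketch of an interpretation, not what the KEP specifies: `CheckStillNeeded` is a hypothetical helper, and the GA release (1.34) and the skew of three minor versions are illustrative assumptions passed in as parameters.

```go
package main

import "fmt"

// CheckStillNeeded reports whether a consumer (e.g., kube-scheduler) at minor
// version consumerMinor must still consult the capability for a feature that
// went GA in minor version gaMinor.
//
// Assumption: a node may be at most maxSkew minor versions older than the
// consumer. The oldest possible node is then consumerMinor - maxSkew, and the
// check is needed only while that node could predate the GA release.
func CheckStillNeeded(consumerMinor, gaMinor, maxSkew int) bool {
	oldestNode := consumerMinor - maxSkew
	return oldestNode < gaMinor
}

func main() {
	// Illustrative values only: GA in 1.34, skew of three minor versions.
	const gaMinor, maxSkew = 34, 3
	for v := gaMinor; v <= gaMinor+maxSkew; v++ {
		fmt.Printf("1.%d: check needed = %t\n", v, CheckStillNeeded(v, gaMinor, maxSkew))
	}
}
```

Under these assumptions the check is needed for 1.34 through 1.36 and can be bypassed from 1.37, which matches the "GA + 3 versions" reading in the question above. Whether the kubelet stops reporting at the same release is exactly the part that still needs to be spelled out.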
1. Replace Taints/Tolerations or Node Labels/Selectors/Affinity.
2. Serve as a reporting mechanism for permanent static node attributes (like architecture, or specific hardware).
3. Define the exact mapping of a feature to a capability. This KEP proposes the framework that establishes the mechanism; specific mappings will be defined with the features that use them.
4. Include full Cluster Autoscaler integration in the initial Alpha stage. The autoscaler makes scaling decisions based on node templates, which lack the capability information. Defining an integration strategy is deferred as a [future enhancement](#cluster-autoscaler-integration).
This will delay adoption. Perhaps it can be solved in alpha.