KEP-5616: Cluster Autoscaler Pod Condition #5619
base: master
Conversation
MartynaGrotek commented on Oct 6, 2025
- One-line PR description: Cluster Autoscaler Pod Condition
- Issue link: Cluster Autoscaler pod conditions #5616
- Other comments:
Welcome @MartynaGrotek!
Hi @MartynaGrotek. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
The branch was force-pushed from ba2c173 to fe05733, and then from fe05733 to 7fe2b7b.
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by: MartynaGrotek.
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
I'd personally like this KEP to be generalized to all autoscalers and provide generic guidance for when/how this condition type gets set, so that current or new autoscalers outside of CAS can also take the guidance to implement the condition type.
As a user, I want to have an easy and reliable way to investigate why my pods are stuck in the Pending phase.

#### Story 2
I think we should consider expanding these user stories out with use-cases from scheduling. I'd imagine there is more that can be had here beyond just observability. For instance, I wonder if there are any integrations with the current Workload Scheduling work that's happening, or interactions that we could consider here with `nominatedNodeName` to make it so that the scheduler is aware that a node is actively launching for this pod and should wait to schedule it.
We definitely need to include #4150 as one of the scheduling use-cases.
I might not be up to date on the full scope of the workload scheduling work, but when it comes to `nominatedNodeName`, this proposal seems pretty orthogonal. It seems like both would work together well - the Node autoscaler would just set `nominatedNodeName` as part of provisioning attempts, in addition to the proposed condition.
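(Illustrative aside, not from the KEP: a minimal Go sketch of how the two signals could coexist on a Pod's status, assuming the condition type is named `NodeProvisioningInProgress` as discussed in this PR; the node name and the condition reason below are hypothetical.)

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "pending-pod", Namespace: "default"},
	}

	// An autoscaler that already knows which upcoming node the pod should
	// land on could set nominatedNodeName (hypothetical node name)...
	pod.Status.NominatedNodeName = "scale-up-node-0"

	// ...while the proposed condition independently records that a
	// provisioning attempt is in progress for this pod.
	pod.Status.Conditions = append(pod.Status.Conditions, corev1.PodCondition{
		Type:               "NodeProvisioningInProgress", // proposed type name
		Status:             corev1.ConditionTrue,
		Reason:             "ScaleUpTriggered", // hypothetical reason
		Message:            "a node autoscaler triggered provisioning for this pod",
		LastTransitionTime: metav1.Now(),
	})

	fmt.Printf("%+v\n", pod.Status)
}
```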
How will security be reviewed, and by whom?

How will UX be reviewed, and by whom?
Seems like we should expand on the UX review a bit -- this is particularly thorny since it has to do with pod conditions, which are going to be everywhere.
proposal will be implemented, this is the place to discuss them.
-->

We would emit them from the same place as the corresponding k8s events, which is `EventingScaleUpStatusProcessor`.
This strikes me as pretty specific to CA -- I think we mentioned that this would be aligned across all autoscalers, not just CAS. We can probably leave the implementation details as an exercise for the implementer and just talk about the API and UX.
Definitely, sorry for that.
ScaleUpResult | Pod condition type & status | Pod condition reason
:---------------------------- | :-------------------------------- | :-------------------
ScaleUpSuccessful | NodeProvisioningInProgress: True |
ScaleUpError | NodeProvisioningInProgress: True | ...Error
In the world of CA, what are some examples of errors?
For CA these are mostly a pass-through from the underlying cloud-provider infrastructure. The most common one would be a stockout for a given instance type.
ScaleUpError | NodeProvisioningInProgress: True | ...Error
ScaleUpNoOptionsAvailable | NodeProvisioningInProgress: False | NoOptionsAvailable
ScaleUpNotTried | NodeProvisioningInProgress: False | NotTried
ScaleUpInCooldown | NodeProvisioningInProgress: False | InCooldown
Cooldown isn't a concept for Karpenter -- I do think that we can be flexible on the reason that different autoscalers provide, though. I'd be more interested in the type here and the overall meaning of the condition type being false or true.
Yeah, I fully agree here - all of these are both specific to CA and implementation details that users shouldn't care about.
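(Illustrative aside, not from the KEP: a rough Go sketch of the mapping in the table quoted above, assuming the `NodeProvisioningInProgress` type name; the `ScaleUpResult` values and reasons are CA-specific and, per the discussion, not part of what would be standardized.)

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ScaleUpResult mirrors the CA-internal results from the table quoted above.
type ScaleUpResult string

const (
	ScaleUpSuccessful         ScaleUpResult = "ScaleUpSuccessful"
	ScaleUpError              ScaleUpResult = "ScaleUpError"
	ScaleUpNoOptionsAvailable ScaleUpResult = "ScaleUpNoOptionsAvailable"
	ScaleUpNotTried           ScaleUpResult = "ScaleUpNotTried"
	ScaleUpInCooldown         ScaleUpResult = "ScaleUpInCooldown"
)

// conditionFor maps a scale-up result onto the proposed pod condition. Only
// the condition type and the meaning of True/False would be standardized;
// the reasons below are CA-specific and shown just to mirror the table.
func conditionFor(result ScaleUpResult, errReason string) corev1.PodCondition {
	cond := corev1.PodCondition{
		Type:               "NodeProvisioningInProgress",
		LastTransitionTime: metav1.Now(),
	}
	switch result {
	case ScaleUpSuccessful:
		cond.Status = corev1.ConditionTrue
	case ScaleUpError:
		// Mostly a pass-through from the cloud provider, e.g. a stockout
		// for the requested instance type.
		cond.Status = corev1.ConditionTrue
		cond.Reason = errReason
	case ScaleUpNoOptionsAvailable:
		cond.Status = corev1.ConditionFalse
		cond.Reason = "NoOptionsAvailable"
	case ScaleUpNotTried:
		cond.Status = corev1.ConditionFalse
		cond.Reason = "NotTried"
	case ScaleUpInCooldown:
		cond.Status = corev1.ConditionFalse
		cond.Reason = "InCooldown"
	}
	return cond
}

func main() {
	fmt.Printf("%+v\n", conditionFor(ScaleUpNoOptionsAvailable, ""))
}
```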
-->

* Provide information regarding scale-up for a particular pod, which can be consumed by automations and other components (e.g. the scheduler: https://github.com/kubernetes/enhancements/issues/3990).
* Improve observability and debuggability for human operators.
I'd be curious to know the specifics that you are looking to add here -- in general, we've seen that events have been enough for most people, so I'd be curious for you to explain what's been unreliable about the current mechanism and why you need this.
I think (at least for me) status provides a concrete "why", whereas events often only show that the autoscaler attempted to do X but don't necessarily provide the final state of the pod being stuck (for example).
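(Illustrative aside, not from the KEP: a small Go sketch of how a consumer could read the condition as a concrete, final-state signal, assuming the `NodeProvisioningInProgress` type name from this thread.)

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// provisioningState reports whether a node autoscaler has claimed the pending
// pod (condition True), has decided not to provision for it (condition False),
// or has not reported on it at all (condition absent).
func provisioningState(pod *corev1.Pod) string {
	for _, cond := range pod.Status.Conditions {
		if cond.Type != "NodeProvisioningInProgress" {
			continue
		}
		if cond.Status == corev1.ConditionTrue {
			return "node provisioning in progress: " + cond.Reason
		}
		return "autoscaler won't provision a node: " + cond.Reason
	}
	return "no autoscaler has reported on this pod"
}

func main() {
	pod := &corev1.Pod{}
	pod.Status.Conditions = []corev1.PodCondition{{
		Type:   "NodeProvisioningInProgress",
		Status: corev1.ConditionFalse,
		Reason: "NoOptionsAvailable", // example reason from the table in this PR
	}}
	fmt.Println(provisioningState(pod))
}
```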
ScaleUpResult | Pod condition type & status | Pod condition reason
:---------------------------- | :-------------------------------- | :-------------------
ScaleUpSuccessful | NodeProvisioningInProgress: True |
In Karpenter we could set the pod condition reason to the NodeClaim we launched for the pod; that might be nice observability.
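(Illustrative aside, not from the KEP or Karpenter's code: a hypothetical Go sketch of the suggestion above, with a made-up NodeClaim name and reason.)

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Hypothetical: point the condition at the NodeClaim launched for the pod,
	// so an operator can jump straight from the pod to the provisioning object.
	nodeClaimName := "default-abc12" // made-up NodeClaim name
	cond := corev1.PodCondition{
		Type:               "NodeProvisioningInProgress",
		Status:             corev1.ConditionTrue,
		Reason:             "NodeClaimCreated", // hypothetical reason
		Message:            "waiting for NodeClaim " + nodeClaimName + " to become ready",
		LastTransitionTime: metav1.Now(),
	}
	fmt.Printf("%+v\n", cond)
}
```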
- `EventsOnly` - current state,
- `EventsAndConditions` - produce both - we could turn it on in scalability tests and assess the impact on overall performance,
- `ConditionsOnly` - produce only the new pod conditions.
- Components depending on the feature gate: `cluster-autoscaler`.
+karpenter
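(Illustrative aside, not from the KEP or either autoscaler: a Go sketch of the three emission modes listed above as a command-line option; the flag name and wiring are assumptions.)

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// EmissionMode mirrors the options listed above: report scale-up status via
// events only, via both events and pod conditions, or via conditions only.
type EmissionMode string

const (
	EventsOnly          EmissionMode = "EventsOnly"
	EventsAndConditions EmissionMode = "EventsAndConditions"
	ConditionsOnly      EmissionMode = "ConditionsOnly"
)

func main() {
	// Hypothetical flag name; each autoscaler would wire this up in its own way.
	mode := flag.String("scale-up-status-emission", string(EventsOnly),
		"one of EventsOnly, EventsAndConditions, ConditionsOnly")
	flag.Parse()

	switch EmissionMode(*mode) {
	case EventsOnly, EventsAndConditions, ConditionsOnly:
		fmt.Println("emission mode:", *mode)
	default:
		fmt.Fprintln(os.Stderr, "unsupported emission mode:", *mode)
		os.Exit(1)
	}
}
```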
ScaleUpSuccessful | NodeProvisioningInProgress: True |
ScaleUpError | NodeProvisioningInProgress: True | ...Error
ScaleUpNoOptionsAvailable | NodeProvisioningInProgress: False | NoOptionsAvailable
ScaleUpNotTried | NodeProvisioningInProgress: False | NotTried
qq: In CAS, what's the difference between NotTried and NoOptionsAvailable?
- NotTried - the scale-up wasn't even attempted, e.g. an autoscaling iteration was skipped or an error occurred before the scale-up logic ran,
- NoOptionsAvailable - there were no node groups that could be considered for the scale-up.

But this will be removed from this KEP, because it is something specific to CA.
Sorry folks, there has been a bit of internal miscommunication on our side and the proposal is definitely CA-specific as it stands right now. I synced with @MartynaGrotek offline, she should be able to remove the CA-specific parts today.

IMO the "reason" part of the condition will be quite difficult to align on, because CA and Karpenter process pending Pods in very different ways. There are probably some common reasons we could agree on and standardize, but it seems to me that we'd still want to have autoscaler-specific reasons on top of that.

At the same time, the proposal is useful even without defining the reasons. Just defining the semantics of the condition true/false values would unblock the autoscaler side of KEP-3990. And this part seems much easier to align on. WDYT about proceeding like this:
As for the proposed true/false semantics, here's my proposal:
I'm not sure how well this fits the Karpenter model, but for CA it would work something like this:
@MartynaGrotek We've missed the PRR deadline for this, so if we want this to be included in the 1.35 release cycle we would need to file an exception following https://github.com/kubernetes/sig-release/blob/master/releases/EXCEPTIONS.md#exceptions-to-milestone-enhancement-complete-dates, as soon as possible. Reading the criteria for an exception, it seems like this proposal should meet them (low risk, no dependencies, etc.). The length of the exception should be on the order of days, so we should only file one if we're reasonably confident we can get the alignment ~this week.

@jonathan-innis @DerekFrank Do you think alignment on the true/false semantics I propose in the comment above is reasonable this week?