
Conversation

MartynaGrotek

  • One-line PR description: Cluster Autoscaler Pod Condition
  • Other comments:

@k8s-ci-robot added the cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA) label on Oct 6, 2025
@k8s-ci-robot added the kind/kep (Categorizes KEP tracking issues and PRs modifying the KEP directory) and sig/autoscaling (Categorizes an issue or PR as relevant to SIG Autoscaling) labels on Oct 6, 2025
@k8s-ci-robot
Contributor

Welcome @MartynaGrotek!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot added the needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test) label on Oct 6, 2025
@k8s-ci-robot
Contributor

Hi @MartynaGrotek. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the size/XL (Denotes a PR that changes 500-999 lines, ignoring generated files) label on Oct 6, 2025
@MartynaGrotek force-pushed the ca_pod_conditions branch 2 times, most recently from ba2c173 to fe05733 on October 8, 2025 08:42
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: MartynaGrotek
Once this PR has been reviewed and has the lgtm label, please assign johnbelamaric for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jonathan-innis

cc: @jmdeal @engedaam @DerekFrank


@jonathan-innis left a comment


I'd personally like this KEP to be generalized to all autoscalers and to provide generic guidance for when/how this condition type gets set, so that current or new autoscalers outside of CAS can also take the guidance to implement the condition type.


As a user, I want to have an easy and reliable way to investigate why my pods are stuck in the Pending phase.

#### Story 2


I think we should consider expanding these user stories with use-cases from scheduling. I'd imagine there is more to be had here beyond just observability. For instance, I wonder if there are any integrations with the current Workload Scheduling work that's happening, or interactions we could consider here with nominatedNodeName, so that the scheduler is aware that a node is actively launching for this pod and waits to schedule it.


We definitely need to include #4150 as one of the scheduling use-cases.

I might not be up to date on the full scope of the workload scheduling work, but when it comes to nominatedNodeName this proposal seems pretty orthogonal. It seems like both would work well together: a Node autoscaler would just set nominatedNodeName as part of provisioning attempts, in addition to the proposed condition.
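
For illustration, a minimal Go sketch of how the two could appear together on a pending Pod's status, assuming the proposed NodeProvisioningInProgress condition type; the node name is a hypothetical placeholder, none of this is settled API:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// A pending Pod whose Node autoscaler has both nominated the node being
	// launched for it and set the proposed condition.
	status := corev1.PodStatus{
		Phase: corev1.PodPending,
		// Hint to the scheduler that a node is on its way (hypothetical name).
		NominatedNodeName: "nodepool-a-new-node",
		Conditions: []corev1.PodCondition{{
			Type:               "NodeProvisioningInProgress", // proposed, not settled API
			Status:             corev1.ConditionTrue,
			LastTransitionTime: metav1.Now(),
		}},
	}
	fmt.Printf("%+v\n", status)
}
```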


How will security be reviewed, and by whom?

How will UX be reviewed, and by whom?


Seems like we should expand on the UX review a bit -- this is particularly thorny since it has to do with pod conditions, which are going to be everywhere.

proposal will be implemented, this is the place to discuss them.
-->

We would emit them from the same place as the corresponding k8s events, which is `EventingScaleUpStatusProcessor`.


This strikes me as pretty specific to CA -- I think we mentioned that this would be aligned across all autoscalers, not just CAS. We can probably leave the implementation details as an exercise to the implementer and just talk about the API and UX.


Definitely, sorry for that.

ScaleUpResult | Pod condition type & status | Pod condition reason
:---------------------------- | :-------------------------------- | :-------------------
ScaleUpSuccessful | NodeProvisioningInProgress: True |
ScaleUpError | NodeProvisioningInProgress: True | ...Error


In the world of CA, what are some examples of errors?


For CA these are mostly a pass-through from the underlying cloud-provider infrastructure. The most common one would be a stockout for a given instance type.

ScaleUpError | NodeProvisioningInProgress: True | ...Error
ScaleUpNoOptionsAvailable | NodeProvisioningInProgress: False | NoOptionsAvailable
ScaleUpNotTried | NodeProvisioningInProgress: False | NotTried
ScaleUpInCooldown | NodeProvisioningInProgress: False | InCooldown


Cooldown isn't a concept for Karpenter -- I do think that we can be flexible on the reasons that different autoscalers provide, though. I'd be more interested in the type here and the overall meaning of the condition being True or False.


Yeah, I fully agree here - all of these are both specific to CA and implementation details that users shouldn't care about.

-->

* Provide information regarding scale-up for a particular pod, which can be consumed by automations and other components (e.g. the scheduler: https://github.com/kubernetes/enhancements/issues/3990).
* Improve observability and debuggability for human operators.


I'd be curious to know the specifics that you are looking to add here -- in general, we've seen that events have been enough for most people, so I'd be curious for you to explain what's been unreliable about the current mechanism and why you need this.


@GnatorX Oct 9, 2025


I think (at least for me) a status provides a concrete "why", whereas events often only show that the autoscaler attempted to do X but don't necessarily capture the final state of the pod being stuck (for example).


ScaleUpResult | Pod condition type & status | Pod condition reason
:---------------------------- | :-------------------------------- | :-------------------
ScaleUpSuccessful | NodeProvisioningInProgress: True |

@DerekFrank Oct 9, 2025


In Karpenter we could set the pod condition reason to the NodeClaim we launched for the pod; that might be nice observability.

- `EventsOnly` - current state,
- `EventsAndConditions` - produce both; we could turn it on in scalability tests and assess the impact on overall performance,
- `ConditionsOnly` - produce only new pod conditions.
- Components depending on the feature gate: `cluster-autoscaler`.


+karpenter
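
Purely as illustration, the three modes quoted above could be wired up as a single string flag; the flag name and default here are hypothetical, not part of the proposal:

```go
package main

import (
	"flag"
	"fmt"
)

func main() {
	// Hypothetical flag name; the proposal only names the three modes.
	mode := flag.String("pod-update-mode", "EventsOnly",
		"one of EventsOnly, EventsAndConditions, ConditionsOnly")
	flag.Parse()

	emitEvents := *mode == "EventsOnly" || *mode == "EventsAndConditions"
	emitConditions := *mode == "EventsAndConditions" || *mode == "ConditionsOnly"
	fmt.Printf("emit events: %v, emit pod conditions: %v\n", emitEvents, emitConditions)
}
```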

ScaleUpSuccessful | NodeProvisioningInProgress: True |
ScaleUpError | NodeProvisioningInProgress: True | ...Error
ScaleUpNoOptionsAvailable | NodeProvisioningInProgress: False | NoOptionsAvailable
ScaleUpNotTried | NodeProvisioningInProgress: False | NotTried


qq: In CAS, what's the difference between NotTried and NoOptionsAvailable?

@MartynaGrotek (Author)


  • NotTried - the scale-up wasn't even attempted, e.g. an autoscaling iteration was skipped, or an error occurred before the scale-up logic ran,
  • NoOptionsAvailable - there were no node groups that could be considered for the scale-up.

But this will be removed from this KEP, because it is something specific to CA.

@towca

towca commented Oct 13, 2025

Sorry folks, there has been a bit of internal miscommunication on our side and the proposal is definitely CA-specific as it stands right now. I synced with @MartynaGrotek offline; she should be able to remove the CA-specific parts today.

IMO the "reason" part of the condition will be quite difficult to align on, because CA and Karpenter process pending Pods in very different ways. There are probably some common reasons we could agree on and standardize, but it seems to me that we'd still want to have autoscaler-specific reasons on top of that.

At the same time, the proposal is useful even without defining the reasons. Just defining the semantics of the condition true/false values would unblock the autoscaler side of KEP-3990. And this part seems much easier to align on.

WDYT about proceeding like this:

  1. We remove the proposed "reasons" from the proposal, and instead propose something like: "the specifics of the reason field of the condition are left to Node autoscaler implementations".
  2. We focus the review on how Node autoscalers interact with the true/false value of the proposed condition.
  3. We revisit aligning on common "reasons" when the KEP transitions to beta (we can add that as graduation criteria in the proposal).

As for the proposed true/false semantics, here's my proposal:

  • Node autoscalers should set NodeProvisioningInProgress=True when they process a pending Pod for provisioning and there are still provisioning options that could help the Pod.
  • Node autoscalers should set NodeProvisioningInProgress=False when they know that a pending Pod can't be helped by provisioning at a given point in time.
  • Node autoscalers aren't expected to set NodeProvisioningInProgress=False after a provisioning attempt is successful.
  • [optional] Node autoscalers should set NodeProvisioningInProgress=True with reason ProvisioningAttemptFailed if a provisioning attempt for a pending Pod fails. This seems like the only "reason" that could be easy to agree on quickly.

I'm not sure how well this fits the Karpenter model, but for CA it would work something like this:

  • CA operates on "NodeGroups" of homogeneous Nodes.
  • For each NodeGroup, CA can tell if provisioning a new Node from it could help a pending Pod by running kube-scheduler filters on a simulated "empty" Node.
  • Each NodeGroup can be fully "skipped" for all pending Pods in CA simulations in some cases - e.g. if a previous provisioning attempt failed because of a stockout, or if adding one more Node would violate max scaling limits.
  • When CA processes a pending Pod, it goes over all the available, non-"skipped" NodeGroups and checks if they could help the Pod.
  • If there is at least 1 NodeGroup that could help, CA sets NodeProvisioningInProgress=True on the Pod.
  • [optional] When CA attempts provisioning a new Node from a NodeGroup that could help a given pending Pod and the attempt fails, CA sets NodeProvisioningInProgress=True with reason ProvisioningAttemptFailed on it.
  • If for a given pending Pod there are no NodeGroups that could help, CA sets NodeProvisioningInProgress=False. This could be because none of the NodeGroups can actually schedule the Pod (e.g. a misconfigured nodeSelector), or because all of the ones that can are now "skipped" because of issues like stockouts.
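
For concreteness, here is a minimal Go sketch of the per-Pod decision described above, assuming the NodeProvisioningInProgress type name and the optional ProvisioningAttemptFailed reason from this thread; none of this is settled API:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// NodeProvisioningInProgress is the condition type proposed above.
const NodeProvisioningInProgress corev1.PodConditionType = "NodeProvisioningInProgress"

// newProvisioningCondition maps an autoscaler's view of a pending Pod onto
// the proposed true/false semantics:
//   - canHelp: at least one provisioning option could still help the Pod.
//   - attemptFailed: the latest provisioning attempt for the Pod failed.
func newProvisioningCondition(canHelp, attemptFailed bool) corev1.PodCondition {
	cond := corev1.PodCondition{
		Type:               NodeProvisioningInProgress,
		LastTransitionTime: metav1.Now(),
	}
	switch {
	case canHelp && attemptFailed:
		// [optional] Provisioning can still help, but the last attempt
		// failed (e.g. a cloud-provider stockout).
		cond.Status = corev1.ConditionTrue
		cond.Reason = "ProvisioningAttemptFailed"
	case canHelp:
		cond.Status = corev1.ConditionTrue
	default:
		// No provisioning option can help the Pod at this point in time;
		// the specific reason is left to each autoscaler implementation.
		cond.Status = corev1.ConditionFalse
	}
	return cond
}

func main() {
	fmt.Printf("%+v\n", newProvisioningCondition(true, false))
	fmt.Printf("%+v\n", newProvisioningCondition(false, false))
}
```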

@towca

towca commented Oct 13, 2025

@MartynaGrotek We've missed the PRR deadline for this, so if we want this to be included in the 1.35 release cycle we need to file an exception following https://github.com/kubernetes/sig-release/blob/master/releases/EXCEPTIONS.md#exceptions-to-milestone-enhancement-complete-dates as soon as possible. Reading the exception criteria, it seems like this proposal should meet them (low risk, no dependencies, etc.). The length of the exception should be on the order of days, so we should only file one if we're reasonably confident we can get alignment ~this week.

@jonathan-innis @DerekFrank Do you think alignment on the true/false semantics I propose in the comment above is reasonable this week?
