Add v1.34 Blog Post for Resource Health Status in Pod Status for Device Plugin and DRA KEP-4680 #51556
Welcome @Jpsassine!
Hi @Jpsassine, did you mean to open this for kubernetes/enhancements#4680?

@aibarbetta, after speaking with @SergeyKanzhelev, we see value in having an independent blog post for this specific DRA feature as well as having it included in the main DRA blog post written by @mortent. What exactly is the deadline for this? I made this placeholder just in case.
/retitle [WIP] Add v1.34 Blog Post for Resource Health Status in Pod Status for Device Plugin and DRA KEP-4680

Hi @Jpsassine @SergeyKanzhelev, this PR should have front matter and blog content ready to review by Friday 8th August 2025.

/sig release
content/en/blog/_posts/2025-08-XX-pods-report-dra-resource-health.md
If there is a general DRA blog post this can be integrated into, that would also be OK. I just worry that a general DRA blog post that also recommends onboarding to this alpha feature may not be understood correctly. Separate messages, "DRA is GA" and "use the new alpha features for DRA", may be easier to digest.
Updated from 7834e4e to 56a57a6
Thanks.
Make sure to mark this as draft in the front matter (the PR should not be marked as draft, though). Release Comms will assign a publication date and un-draft it in a follow-up PR.
/lgtm
LGTM label has been added. Git tree hash: e207d5a6ed71250c7a534de97bf0f7a7d1c378bb
@lmktfy, is this still on track for release, or is there something I still need to do? Just want to make sure, thanks.
Hi @Jpsassine, yes, this is still being tracked as a v1.34 post-release communication.
This commit adds the feature blog post for the v1.34 release, covering the extension of device health monitoring to Dynamic Resource Allocation (DRA). This feature, which is part of KEP-4680, allows DRA plugins to report device health directly in the Pod's status, improving observability for workloads that use specialized hardware. Refs: https://github.com/kubernetes/enhancements/issues/4680
/lgtm
with one comment
The rise of AI/ML and other high-performance workloads has made specialized hardware such as GPUs, TPUs, and FPGAs a critical component of many Kubernetes clusters. However, as discussed in a [previous blog post about navigating failures in Pods with devices](/blog/2025/07/03/navigating-failures-in-pods-with-devices/), when this hardware fails, it can be difficult to diagnose, leading to significant downtime. With the release of Kubernetes v1.34, we are excited to announce a new alpha feature that brings much-needed visibility into the health of these devices.
This work extends the functionality of [KEP-4680](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4680-add-resource-health-to-pod-status), which first introduced a mechanism for reporting the health of devices managed by Device Plugins. Now, this capability is being extended to *Dynamic Resource Allocation (DRA)*. Controlled by the `ResourceHealthStatus` feature gate, this enhancement allows DRA drivers to report device health directly into a Pod's `.status` field, providing crucial insights for operators and developers.
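To illustrate what consuming this might look like, here is a minimal Python sketch that scans a Pod object (for example, the JSON printed by `kubectl get pod <name> -o json`) for devices whose reported health is anything other than `Healthy`. It assumes the per-container `allocatedResourcesStatus` field from KEP-4680, where each entry has a `name` and a list of `resources`, each carrying a `resourceID` and a `health` value of `Healthy`, `Unhealthy`, or `Unknown`. The helper names (`unhealthy_devices`, `unhealthy_devices_from_json`) and the sample values (container `trainer`, resource `gpu-claim`, devices `gpu-0`/`gpu-1`) are hypothetical and chosen only for illustration; the exact resource names a DRA driver reports are not shown here.

```python
import json
from typing import Any

# Lists in a Pod's .status whose entries can carry per-container resource health.
CONTAINER_STATUS_KEYS = (
    "initContainerStatuses",
    "containerStatuses",
    "ephemeralContainerStatuses",
)


def unhealthy_devices(pod: dict[str, Any]) -> list[tuple[str, str, str, str]]:
    """Return (container, resource name, resource ID, health) for each device not reported as Healthy."""
    findings = []
    status = pod.get("status") or {}
    for key in CONTAINER_STATUS_KEYS:
        for container in status.get(key) or []:
            # Assumption: allocatedResourcesStatus is expected to be populated only when
            # the ResourceHealthStatus feature gate is enabled and health is reported.
            for resource in container.get("allocatedResourcesStatus") or []:
                for device in resource.get("resources") or []:
                    # Assumption: a missing health value is treated as Unknown.
                    health = device.get("health") or "Unknown"
                    if health != "Healthy":
                        findings.append(
                            (
                                container.get("name", ""),
                                resource.get("name", ""),
                                device.get("resourceID", ""),
                                health,
                            )
                        )
    return findings


def unhealthy_devices_from_json(text: str) -> list[tuple[str, str, str, str]]:
    """Parse a Pod serialized as JSON (e.g. from kubectl) and scan it."""
    return unhealthy_devices(json.loads(text))


# Hypothetical Pod fragment; all names and IDs are made up for illustration.
sample = {
    "status": {
        "containerStatuses": [
            {
                "name": "trainer",
                "allocatedResourcesStatus": [
                    {
                        "name": "gpu-claim",
                        "resources": [
                            {"resourceID": "gpu-0", "health": "Healthy"},
                            {"resourceID": "gpu-1", "health": "Unhealthy"},
                        ],
                    }
                ],
            }
        ]
    }
}

if __name__ == "__main__":
    for finding in unhealthy_devices(sample):
        print(finding)  # prints ('trainer', 'gpu-claim', 'gpu-1', 'Unhealthy')
```

Filtering on `!= "Healthy"` rather than `== "Unhealthy"` is a deliberate choice in this sketch, so that devices in an `Unknown` state are surfaced as well.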
Suggested change:

This work extends the functionality of [KEP-4680](kep.k8s.io/4680), which first introduced a mechanism for reporting the health of devices managed by Device Plugins. Now, this capability is being extended to *Dynamic Resource Allocation (DRA)*. Controlled by the `ResourceHealthStatus` feature gate, this enhancement allows DRA drivers to report device health directly into a Pod's `.status` field, providing crucial insights for operators and developers.
@graz-dev I think that suggestion isn't quite right
LGTM label has been added. Git tree hash: f05f7d88b93c63aed0bda1e79f088343060cc9a6
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lmktfy, SergeyKanzhelev
Description
Add a new blog post about the expansion of KEP-4680, which now covers DRA in v1.34. The blog post explains the context and the nature of the feature for DRA.
Issue
KEP: kubernetes/enhancements#4680
Closes: #