Commit c41eebb

Merge pull request #49814 from KobayashiD27/dev-1.33-dra-device-binding-conditions
DRA Device Binding Conditions #5007 document
2 parents 59d333c + 4ddbd18 commit c41eebb

2 files changed: +97 -0 lines changed

content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md

Lines changed: 83 additions & 0 deletions
@@ -770,6 +770,89 @@ spec:
effect: NoExecute
```

### Device Binding Conditions {#device-binding-conditions}

{{< feature-state feature_gate_name="DRADeviceBindingConditions" >}}

Device Binding Conditions allow the Kubernetes scheduler to delay Pod binding until
external resources, such as fabric-attached GPUs or reprogrammable FPGAs, are confirmed
to be ready.

This waiting behavior is implemented in the
[PreBind phase](/docs/concepts/scheduling-eviction/scheduling-framework/#pre-bind)
of the scheduling framework.
During this phase, the scheduler checks whether all required device conditions are
satisfied before proceeding with binding.

By avoiding premature binding, this improves scheduling reliability and enables coordination
with external device controllers.

To use this feature, device drivers (typically managed by driver owners) must publish the
following fields in the `Device` section of a `ResourceSlice`. Cluster administrators
must enable the `DRADeviceBindingConditions` and `DRAResourceClaimDeviceStatus` feature
gates for the scheduler to honor these fields.

- `bindingConditions`: A list of condition types that must be set to `True` in the
  per-device status conditions (`status.devices[].conditions`) of the associated
  ResourceClaim before the Pod can be bound.
  These typically represent readiness signals such as `DeviceAttached` or `DeviceInitialized`.
- `bindingFailureConditions`: A list of condition types that, if set to `True` in the
  per-device status conditions of the associated ResourceClaim, indicate a failure state.
  If any of these conditions is `True`, the scheduler aborts binding and reschedules the Pod.
- `bindsToNode`: If set to `true`, the scheduler records the selected node name in the
  `status.allocation.nodeSelector` field of the ResourceClaim.
  This does not affect the Pod's `spec.nodeSelector`. Instead, it sets a node selector
  inside the ResourceClaim, which external controllers can use to perform node-specific
  operations such as device attachment or preparation.

All condition types listed in `bindingConditions` and `bindingFailureConditions` are evaluated
from the per-device entries in the `status.devices` field of the ResourceClaim, which is why
the `DRAResourceClaimDeviceStatus` feature gate is also required.
External controllers are responsible for updating these conditions using standard Kubernetes
condition semantics (`type`, `status`, `reason`, `message`, `lastTransitionTime`).

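The following fragment sketches what a ResourceClaim might look like after the scheduler has allocated a device that has `bindsToNode: true`, while an external controller is still preparing it. It is illustrative only: the claim name `gpu-claim`, the request name `gpu`, the node `node-3`, and the condition reason and message are hypothetical, and the `metadata.name` field match shown under `status.allocation.nodeSelector` is an assumption about how the scheduler expresses the selected node.

```yaml
# Illustrative ResourceClaim status; all names and values are hypothetical.
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: gpu-claim
# spec omitted for brevity
status:
  allocation:
    devices:
      results:
      - request: gpu
        driver: dra.example.com
        pool: gpu-pool
        device: gpu-1
    # Set by the scheduler because gpu-1 has bindsToNode: true.
    # The matchFields form is an assumption about the exact selector shape.
    nodeSelector:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name
          operator: In
          values:
          - node-3
  devices:
  # Per-device status maintained by the external controller.
  - driver: dra.example.com
    pool: gpu-pool
    device: gpu-1
    conditions:
    # Binding condition not yet True, so the scheduler keeps waiting in PreBind.
    - type: dra.example.com/is-prepared
      status: "False"
      reason: AttachInProgress
      message: "Attaching gpu-1 to node-3"
      lastTransitionTime: "2025-01-01T00:00:00Z"
```

Once the controller changes this condition to `status: "True"`, the scheduler can proceed with binding the Pod to `node-3`. Note that `status` is quoted so that YAML does not parse it as a boolean.
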
The scheduler waits up to **600 seconds** for all `bindingConditions` to become `True`.
If the timeout is reached or any `bindingFailureConditions` are `True`, the scheduler
clears the allocation and reschedules the Pod.

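For contrast, here is a sketch of the per-device status a controller might report when preparation fails, reusing the hypothetical names `gpu-claim`, `gpu-1`, and `node-3`; the reason and message strings are invented for illustration. Because a condition listed in `bindingFailureConditions` is `True`, the scheduler does not need to wait for the 600-second timeout: it clears the allocation and reschedules the Pod.

```yaml
# Illustrative per-device status after a failed preparation; values are hypothetical.
status:
  devices:
  - driver: dra.example.com
    pool: gpu-pool
    device: gpu-1
    conditions:
    - type: dra.example.com/is-prepared
      status: "False"
      reason: AttachFailed
      message: "Device was not attached"
      lastTransitionTime: "2025-01-01T00:02:00Z"
    # Listed in bindingFailureConditions; True means the scheduler aborts binding.
    - type: dra.example.com/preparing-failed
      status: "True"
      reason: AttachFailed
      message: "Fabric manager could not attach gpu-1 to node-3"
      lastTransitionTime: "2025-01-01T00:02:00Z"
```
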
```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceSlice
metadata:
  name: gpu-slice
spec:
  driver: dra.example.com
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: accelerator-type
        operator: In
        values:
        - high-performance
  pool:
    name: gpu-pool
    generation: 1
    resourceSliceCount: 1
  devices:
  - name: gpu-1
    attributes:
      vendor:
        string: "example"
      model:
        string: "example-gpu"
    bindsToNode: true
    bindingConditions:
    - dra.example.com/is-prepared
    bindingFailureConditions:
    - dra.example.com/preparing-failed
```

This example ResourceSlice has the following properties:

- The ResourceSlice targets nodes labeled with `accelerator-type=high-performance`,
  so that the scheduler considers only a specific set of eligible nodes.
- The scheduler selects one node from that set (for example, `node-3`) and, because
  `bindsToNode` is `true`, sets the `status.allocation.nodeSelector` field in the
  ResourceClaim to select that node.
- The `dra.example.com/is-prepared` binding condition indicates that the device `gpu-1`
  must be prepared (the `is-prepared` condition has a status of `True`) before binding.
- If the `gpu-1` device preparation fails (the `preparing-failed` condition has a
  status of `True`), the scheduler aborts binding.
- The scheduler waits up to 600 seconds for the device to become ready.
- External controllers can use the node selector in the ResourceClaim to perform
  node-specific setup on the selected node.

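A workload consumes `gpu-1` through an ordinary ResourceClaim; neither the claim nor the Pod mentions binding conditions, because the scheduler takes them from the device in the ResourceSlice. The sketch below assumes a hypothetical DeviceClass named `gpu.example.com` that selects every device from the `dra.example.com` driver; the class, claim, and Pod names are placeholders.

```yaml
# Hypothetical DeviceClass matching devices published by dra.example.com.
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: gpu.example.com
spec:
  selectors:
  - cel:
      expression: device.driver == "dra.example.com"
---
# Hypothetical claim requesting one device from that class.
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: gpu-claim
spec:
  devices:
    requests:
    - name: gpu
      exactly:
        deviceClassName: gpu.example.com
---
# Hypothetical Pod using the claim; binding waits on the device's binding conditions.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.10
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    resourceClaimName: gpu-claim
```

While `dra.example.com/is-prepared` is not yet `True` for `gpu-1`, the Pod is expected to remain unbound (no `spec.nodeName`), even though a node has already been chosen for the allocation.
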
## {{% heading "whatsnext" %}}

- [Set Up DRA in a Cluster](/docs/tasks/configure-pod-container/assign-resources/set-up-dra-cluster/)
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
---
title: DRADeviceBindingConditions
content_type: feature_gate
_build:
  list: never
  render: false

stages:
  - stage: alpha
    defaultValue: false
    fromVersion: "1.34"
---
Enables support for DeviceBindingConditions in DRA-related fields.
This allows thorough device readiness checks and attachment processes to complete before the Bind phase.
