@@ -770,6 +770,89 @@ spec:
770770 effect: NoExecute
771771 ```
772772
773+ ### Device Binding Conditions {#device-binding-conditions}
774+
775+ {{< feature-state feature_gate_name="DRADeviceBindingConditions" >}}
776+
777+ Device Binding Conditions allow the Kubernetes scheduler to delay Pod binding until
778+ external resources, such as fabric-attached GPUs or reprogrammable FPGAs, are confirmed
779+ to be ready.
780+
781+ This waiting behavior is implemented in the
782+ [PreBind phase](/docs/concepts/scheduling-eviction/scheduling-framework/#pre-bind)
783+ of the scheduling framework.
784+ During this phase, the scheduler checks whether all required device conditions are
785+ satisfied before proceeding with binding.
786+
787+ This improves scheduling reliability by avoiding premature binding and enables coordination
788+ with external device controllers.
789+
790+ To use this feature, device drivers (typically managed by driver owners) must publish the
791+ following fields in the `Device` section of a `ResourceSlice`. Cluster administrators
792+ must enable the `DRADeviceBindingConditions` and `DRAResourceClaimDeviceStatus` feature
793+ gates for the scheduler to honor these fields.
794+
795+ - `bindingConditions`: a list of condition types that must be set to `True` in the
796+ `status.conditions` field of the associated ResourceClaim before the Pod can be bound.
797+ These typically represent readiness signals such as "DeviceAttached" or "DeviceInitialized".
798+ - `bindingFailureConditions`: a list of condition types that, if set to `True` in the
799+ `status.conditions` field of the associated ResourceClaim, indicate a failure state.
800+ If any of these conditions is `True`, the scheduler aborts binding and reschedules the Pod.
801+ - `bindsToNode`: if set to `true`, the scheduler records the selected node name in the
802+ `status.allocation.nodeSelector` field of the ResourceClaim.
803+ This does not affect the Pod's `spec.nodeSelector`. Instead, it sets a node selector
804+ inside the ResourceClaim, which external controllers can use to perform node-specific
805+ operations such as device attachment or preparation.
806+
807+ All condition types listed in `bindingConditions` and `bindingFailureConditions` are evaluated
808+ from the `status.conditions` field of the ResourceClaim.
809+ External controllers are responsible for updating these conditions using standard Kubernetes
810+ condition semantics (`type`, `status`, `reason`, `message`, `lastTransitionTime`).
811+
812+ The scheduler waits up to **600 seconds** for all `bindingConditions` to become `True`.
813+ If the timeout is reached or any of the `bindingFailureConditions` is `True`, the scheduler
814+ clears the allocation and reschedules the Pod.
815+
816+
817+ ```yaml
818+ apiVersion: resource.k8s.io/v1
819+ kind: ResourceSlice
820+ metadata:
821+   name: gpu-slice
822+ spec:
823+   driver: dra.example.com
824+   nodeSelector:
825+     accelerator-type: high-performance
826+   pool:
827+     name: gpu-pool
828+     generation: 1
829+     resourceSliceCount: 1
830+   devices:
831+     - name: gpu-1
832+       attributes:
833+         vendor:
834+           string: "example"
835+         model:
836+           string: "example-gpu"
837+       bindsToNode: true
838+       bindingConditions:
839+         - dra.example.com/is-prepared
840+       bindingFailureConditions:
841+         - dra.example.com/preparing-failed
842+ ```
843+ This example ResourceSlice has the following properties:
844+
845+ - The ResourceSlice targets nodes labeled with `accelerator-type=high-performance`,
846+ so that the scheduler uses only a specific set of eligible nodes.
847+ - The scheduler selects one node from that eligible set (for example, `node-3`) and sets
848+ the `status.allocation.nodeSelector` field in the ResourceClaim to that node name.
849+ - The `dra.example.com/is-prepared` binding condition indicates that the device `gpu-1`
850+ must be prepared (the `is-prepared` condition has a status of `True`) before binding.
851+ - If the `gpu-1` device preparation fails (the `preparing-failed` condition has a status of `True`), the scheduler aborts binding.
852+ - The scheduler waits up to 600 seconds for the device to become ready.
853+ - External controllers can use the node selector in the ResourceClaim to perform
854+ node-specific setup on the selected node.
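
To make the readiness signal concrete, the fragment below sketches what an external controller might write to the ResourceClaim once `gpu-1` is prepared. It is a minimal illustration, not output from a real cluster: it assumes the condition is reported per device under `status.devices[].conditions` (the device status enabled by `DRAResourceClaimDeviceStatus`), and the `reason`, `message`, and `lastTransitionTime` values are hypothetical.

```yaml
# Hypothetical ResourceClaim status fragment written by an external controller.
# Assumption: per-device conditions live under status.devices[].conditions.
status:
  devices:
    - driver: dra.example.com
      pool: gpu-pool
      device: gpu-1
      conditions:
        # Matches the entry listed in bindingConditions of the ResourceSlice.
        - type: dra.example.com/is-prepared
          status: "True"
          reason: DevicePrepared               # illustrative value
          message: gpu-1 is attached and initialized  # illustrative value
          lastTransitionTime: "2025-01-01T00:00:00Z"  # illustrative value
```

Under the same assumption, a controller reporting a failure would instead set a condition of type `dra.example.com/preparing-failed` with `status: "True"`, which causes the scheduler to abort binding.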
855+
773856 ## {{% heading "whatsnext" %}}
774857
775858 - [Set Up DRA in a Cluster](/docs/tasks/configure-pod-container/assign-resources/set-up-dra-cluster/)