Commit 838aed3

DRA: core update for 1.34
The feature gate and API examples get updated. Enabling DRA is now simpler: configuration changes are only needed for backward compatibility. One particular troubleshooting step fits into the existing user-facing "allocate-devices-dra.md". Admin-facing troubleshooting and documentation of metrics which might be of interest can follow separately.
1 parent ab5c2db commit 838aed3

7 files changed: 74 additions & 67 deletions

content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md

Lines changed: 12 additions & 13 deletions
@@ -219,7 +219,7 @@ creating or modifying ResourceSlices.
 Consider the following example ResourceSlice:
 
 ```yaml
-apiVersion: resource.k8s.io/v1beta1
+apiVersion: resource.k8s.io/v1
 kind: ResourceSlice
 metadata:
   name: cat-slice
@@ -233,14 +233,13 @@ spec:
   allNodes: true
   devices:
   - name: "large-black-cat"
-    basic:
-      attributes:
-        color:
-          string: "black"
-        size:
-          string: "large"
-        cat:
-          boolean: true
+    attributes:
+      color:
+        string: "black"
+      size:
+        string: "large"
+      cat:
+        boolean: true
 ```
 This ResourceSlice is managed by the `resource-driver.example.com` driver in the
 `black-cat-pool` pool. The `allNodes: true` field indicates that any node in the
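
The hunk above drops the `basic` wrapper, so device `attributes` sit directly under the device entry in `resource.k8s.io/v1`. As a rough sketch of that same manifest edit, assuming the manifest has already been loaded into plain Python dictionaries, the hypothetical helper `flatten_basic_device` (not part of any Kubernetes tooling) moves the keys nested under `basic` up one level:

```python
from copy import deepcopy


def flatten_basic_device(device: dict) -> dict:
    """Return a copy of a device entry with the keys under ``basic`` moved up.

    Hypothetical helper for illustration only: it mirrors the manifest
    change in the hunk above and does nothing else.
    """
    converted = deepcopy(device)
    basic = converted.pop("basic", None)
    if basic is None:
        return converted  # no wrapper present, nothing to move

    # Refuse to silently overwrite a key that exists at both levels.
    overlap = set(basic) & set(converted)
    if overlap:
        raise ValueError(f"keys set both inside and outside 'basic': {sorted(overlap)}")

    converted.update(basic)
    return converted


# Device entry as written in the removed (v1beta1) lines of the hunk.
old_device = {
    "name": "large-black-cat",
    "basic": {
        "attributes": {
            "color": {"string": "black"},
            "size": {"string": "large"},
            "cat": {"boolean": True},
        }
    },
}

new_device = flatten_basic_device(old_device)
print(new_device)
```

The input dictionary is left untouched, which makes it easy to compare both shapes while reviewing a manifest by hand.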
@@ -399,7 +398,7 @@ admin access grants access to in-use devices and may enable additional
 permissions when making the device available in a container:
 
 ```yaml
-apiVersion: resource.k8s.io/v1beta2
+apiVersion: resource.k8s.io/v1
 kind: ResourceClaimTemplate
 metadata:
   name: large-black-cat-claim-template
@@ -441,7 +440,7 @@ allocated if it is available. But if it is not and two small white devices are a
 the pod will still be able to run.
 
 ```yaml
-apiVersion: resource.k8s.io/v1beta2
+apiVersion: resource.k8s.io/v1
 kind: ResourceClaimTemplate
 metadata:
   name: prioritized-list-claim-template
@@ -495,7 +494,7 @@ handles this and it is transparent to the consumer as the ResourceClaim API is n
 
 ```yaml
 kind: ResourceSlice
-apiVersion: resource.k8s.io/v1beta2
+apiVersion: resource.k8s.io/v1
 metadata:
   name: resourceslice
 spec:
@@ -632,4 +631,4 @@ spec:
 - [Allocate devices to workloads using DRA](/docs/tasks/configure-pod-container/assign-resources/allocate-devices-dra/)
 - For more information on the design, see the
 [Dynamic Resource Allocation with Structured Parameters](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4381-dra-structured-parameters)
-KEP.
+KEP.

content/en/docs/reference/command-line-tools-reference/feature-gates/DynamicResourceAllocation.md

Lines changed: 6 additions & 1 deletion
@@ -13,8 +13,13 @@ stages:
   - stage: beta
     defaultValue: false
     fromVersion: "1.32"
+    toVersion: "1.33"
+  - stage: stable
+    defaultValue: true
+    locked: false
+    fromVersion: "1.34"
 
-# TODO: as soon as this is locked to "true" (= GA), comments about other DRA
+# TODO: as soon as this is locked to "true" (= some time after GA, *not* yet in 1.34), comments about other DRA
 # feature gate(s) like "unless you also enable the `DynamicResourceAllocation` feature gate"
 # can be removed (for example, in dra-admin-access.md).

content/en/docs/tasks/configure-pod-container/assign-resources/allocate-devices-dra.md

Lines changed: 16 additions & 2 deletions
@@ -1,7 +1,7 @@
 ---
 title: Allocate Devices to Workloads with DRA
 content_type: task
-min-kubernetes-server-version: v1.32
+min-kubernetes-server-version: v1.34
 weight: 20
 ---
 {{< feature-state feature_gate_name="DynamicResourceAllocation" >}}
@@ -157,6 +157,20 @@ claims in different containers.
 kubectl apply -f https://k8s.io/examples/dra/dra-example-job.yaml
 ```
 
+Try the following troubleshooting steps:
+
+1. When the workload does not start as expected, drill down from Job
+to Pods to ResourceClaims and check the objects
+at each level with `kubectl describe` to see whether there are any
+status fields or events which might explain why the workload is
+not starting.
+1. When creating a Pod fails with `must specify one of: resourceClaimName,
+resourceClaimTemplateName`, check that all entries in `pod.spec.resourceClaims`
+have exactly one of those fields set. If they do, then it is possible
+that the cluster has a mutating Pod webhook installed which was built
+against APIs from Kubernetes < 1.32. Work with your cluster administrator
+to check this.
+
 ## Clean up {#clean-up}
 
 To delete the Kubernetes objects that you created in this task, follow these
@@ -183,4 +197,4 @@ steps:
 
 ## {{% heading "whatsnext" %}}
 
-* [Learn more about DRA](/docs/concepts/scheduling-eviction/dynamic-resource-allocation)
+* [Learn more about DRA](/docs/concepts/scheduling-eviction/dynamic-resource-allocation)
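
The second troubleshooting step added in this file asks you to check that every entry in `pod.spec.resourceClaims` sets exactly one of `resourceClaimName` and `resourceClaimTemplateName`. A minimal sketch of that check follows, assuming the Pod manifest is already available as a Python dictionary (for example, parsed from `kubectl get pod -o json`). The function name `invalid_resource_claims` and the sample Pod are made up for illustration.

```python
# The two mutually exclusive fields named in the error message.
CLAIM_SOURCE_FIELDS = ("resourceClaimName", "resourceClaimTemplateName")


def invalid_resource_claims(pod: dict) -> list[str]:
    """Return the names of resourceClaims entries that do not set exactly one source field."""
    entries = pod.get("spec", {}).get("resourceClaims") or []
    bad = []
    for index, entry in enumerate(entries):
        set_fields = [field for field in CLAIM_SOURCE_FIELDS if entry.get(field)]
        if len(set_fields) != 1:
            # Fall back to the list position when the entry has no name.
            bad.append(entry.get("name", f"<index {index}>"))
    return bad


# Made-up Pod fragment: one entry per case.
sample_pod = {
    "spec": {
        "resourceClaims": [
            {"name": "by-name", "resourceClaimName": "example-resource-claim"},
            {"name": "by-template", "resourceClaimTemplateName": "example-resource-claim-template"},
            {"name": "neither"},
            {"name": "both", "resourceClaimName": "a", "resourceClaimTemplateName": "b"},
        ]
    }
}

print(invalid_resource_claims(sample_pod))
```

If this check passes on the manifest you submitted but the API server still rejects the Pod, the manifest was probably altered after submission, which is the webhook case the step describes.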

content/en/docs/tasks/configure-pod-container/assign-resources/set-up-dra-cluster.md

Lines changed: 37 additions & 48 deletions
@@ -1,7 +1,7 @@
 ---
 title: "Set Up DRA in a Cluster"
 content_type: task
-min-kubernetes-server-version: v1.32
+min-kubernetes-server-version: v1.34
 weight: 10
 ---
 {{< feature-state feature_gate_name="DynamicResourceAllocation" >}}
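
The `feature-state` shortcode above is keyed on the `DynamicResourceAllocation` feature gate, whose `stages` list this commit extends with a `stable` entry starting at 1.34. As an illustration of how such a list maps a release to a stage and default value, here is a small sketch. The lookup logic and the names `STAGES`, `parse_version` and `stage_for` are assumptions for this example, not the website's actual implementation, and only the two stages visible in the diff are included.

```python
from typing import Optional

# Transcribed from the hunk for DynamicResourceAllocation.md.
# Earlier stages are outside the hunk and therefore not listed here.
STAGES = [
    {"stage": "beta", "defaultValue": False, "fromVersion": "1.32", "toVersion": "1.33"},
    {"stage": "stable", "defaultValue": True, "locked": False, "fromVersion": "1.34"},
]


def parse_version(text: str) -> tuple[int, int]:
    """Turn a "major.minor" string into a tuple that compares numerically."""
    major, minor = text.split(".")
    return int(major), int(minor)


def stage_for(release: str, stages: list = STAGES) -> Optional[dict]:
    """Return the stage entry covering ``release``, or None if no entry matches."""
    wanted = parse_version(release)
    for entry in stages:
        start = parse_version(entry["fromVersion"])
        # An entry without toVersion is treated as open-ended.
        end = parse_version(entry["toVersion"]) if "toVersion" in entry else None
        if wanted >= start and (end is None or wanted <= end):
            return entry
    return None


for release in ("1.33", "1.34"):
    entry = stage_for(release)
    print(release, entry["stage"], entry["defaultValue"])
```

Note that `locked: false` on the stable entry means the gate can still be turned off in 1.34, which is what the updated TODO comment in the same file points out.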
@@ -37,30 +37,20 @@ For details, see
 
 <!-- steps -->
 
-## Enable the DRA API groups {#enable-dra}
+## Optional: enable additional DRA API groups {#enable-dra}
 
-To let Kubernetes allocate resources to your Pods with DRA, complete the
-following configuration steps:
+DRA reached GA in Kubernetes 1.34 and is enabled by default.
+Some older DRA drivers or workloads might still need the
+v1beta1 API from Kubernetes 1.30 or v1beta2 from Kubernetes 1.32.
+If and only if support for those is desired, then enable the following
+{{< glossary_tooltip text="API groups" term_id="api-group" >}}:
+
+* `resource.k8s.io/v1beta1`
+* `resource.k8s.io/v1beta2`
+
+For more information, see
+[Enabling or disabling API groups](/docs/reference/using-api/#enabling-or-disabling).
 
-1. Enable the `DynamicResourceAllocation`
-[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
-on all of the following components:
-
-* `kube-apiserver`
-* `kube-controller-manager`
-* `kube-scheduler`
-* `kubelet`
-
-1. Enable the following
-{{< glossary_tooltip text="API groups" term_id="api-group" >}}:
-
-* `resource.k8s.io/v1beta1`: required for DRA to function.
-* `resource.k8s.io/v1beta2`: optional, recommended improvements to the user
-experience.
-
-For more information, see
-[Enabling or disabling API groups](/docs/reference/using-api/#enabling-or-disabling).
-
 ## Verify that DRA is enabled {#verify}
 
 To verify that the cluster is configured correctly, try to list DeviceClasses:
@@ -81,15 +71,10 @@ similar to the following:
 ```
 error: the server doesn't have a resource type "deviceclasses"
 ```
+
 Try the following troubleshooting steps:
 
-1. Ensure that the `kube-scheduler` component has the `DynamicResourceAllocation`
-feature gate enabled *and* uses the
-[v1 configuration API](/docs/reference/config-api/kube-scheduler-config.v1/).
-If you use a custom configuration, you might need to perform additional steps
-to enable the `DynamicResource` plugin.
-1. Restart the `kube-apiserver` component and the `kube-controller-manager`
-component to propagate the API group changes.
+1. Reconfigure and restart the `kube-apiserver` component.
 
 ## Install device drivers {#install-drivers}
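
The changes to this file reduce setup to optionally enabling the two beta API versions and, when verification fails, reconfiguring the `kube-apiserver`. As a hedged sketch of both sides, the code below builds the value for the API server's `--runtime-config` flag and filters the DRA versions out of `kubectl api-versions` output. The helper names and the sample output are made up for illustration. How the flag reaches the API server depends on how your cluster is deployed.

```python
# Beta versions listed in the "Optional: enable additional DRA API groups" section.
LEGACY_DRA_VERSIONS = ("resource.k8s.io/v1beta1", "resource.k8s.io/v1beta2")


def runtime_config_value(group_versions=LEGACY_DRA_VERSIONS) -> str:
    """Build the comma-separated value for kube-apiserver's --runtime-config flag."""
    return ",".join(f"{group_version}=true" for group_version in group_versions)


def served_dra_versions(api_versions_output: str) -> list[str]:
    """Pick the resource.k8s.io entries out of `kubectl api-versions` output.

    The command prints one group/version per line.
    """
    lines = (line.strip() for line in api_versions_output.splitlines())
    return sorted(line for line in lines if line.startswith("resource.k8s.io/"))


# Made-up sample of what `kubectl api-versions` might print on a cluster
# that serves v1 and v1beta1 but not v1beta2.
sample_output = """\
apps/v1
resource.k8s.io/v1
resource.k8s.io/v1beta1
storage.k8s.io/v1
v1
"""

print("--runtime-config=" + runtime_config_value())
print(served_dra_versions(sample_output))
```

Comparing the served versions against what a driver requires shows whether the optional step is needed at all. If `resource.k8s.io/v1` itself is missing, the cluster is likely older than 1.34 or has DRA explicitly disabled.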

@@ -112,6 +97,12 @@ cluster-1-device-pool-1-driver.example.com-lqx8x cluster-1-node-1 driver
 cluster-1-device-pool-2-driver.example.com-29t7b cluster-1-node-2 driver.example.com cluster-1-device-pool-2-446z 8s
 ```
 
+Try the following troubleshooting steps:
+
+1. Check the health of the DRA driver and look for error messages about
+publishing ResourceSlices in its log output. The vendor of the driver
+may have further instructions about installation and troubleshooting.
+
 ## Create DeviceClasses {#create-deviceclasses}
 
 You can define categories of devices that your application operators can
@@ -135,27 +126,25 @@ operators.
 The output is similar to the following:
 
 ```yaml
-apiVersion: resource.k8s.io/v1beta1
+apiVersion: resource.k8s.io/v1
 kind: ResourceSlice
 # lines omitted for clarity
 spec:
   devices:
-  - basic:
-      attributes:
-        type:
-          string: gpu
-      capacity:
-        memory:
-          value: 64Gi
-    name: gpu-0
-  - basic:
-      attributes:
-        type:
-          string: gpu
-      capacity:
-        memory:
-          value: 64Gi
-    name: gpu-1
+  - attributes:
+      type:
+        string: gpu
+    capacity:
+      memory:
+        value: 64Gi
+    name: gpu-0
+  - attributes:
+      type:
+        string: gpu
+    capacity:
+      memory:
+        value: 64Gi
+    name: gpu-1
   driver: driver.example.com
   nodeName: cluster-1-node-1
 # lines omitted for clarity
@@ -186,4 +175,4 @@ kubectl delete -f https://k8s.io/examples/dra/deviceclass.yaml
 ## {{% heading "whatsnext" %}}
 
 * [Learn more about DRA](/docs/concepts/scheduling-eviction/dynamic-resource-allocation)
-* [Allocate Devices to Workloads with DRA](/docs/tasks/configure-pod-container/assign-resources/allocate-devices-dra)
+* [Allocate Devices to Workloads with DRA](/docs/tasks/configure-pod-container/assign-resources/allocate-devices-dra)

content/en/examples/dra/deviceclass.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-apiVersion: resource.k8s.io/v1beta2
+apiVersion: resource.k8s.io/v1
 kind: DeviceClass
 metadata:
   name: example-device-class

content/en/examples/dra/resourceclaim.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-apiVersion: resource.k8s.io/v1beta2
+apiVersion: resource.k8s.io/v1
 kind: ResourceClaim
 metadata:
   name: example-resource-claim

content/en/examples/dra/resourceclaimtemplate.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-apiVersion: resource.k8s.io/v1beta2
+apiVersion: resource.k8s.io/v1
 kind: ResourceClaimTemplate
 metadata:
   name: example-resource-claim-template
