@@ -348,24 +348,24 @@ list, it will not add a reference to the pod.
 
 ##### Deallocation
 The resourceclaim controller will remove Pod references from the `ReservedFor` list just
-like it does now using the same logic. For non-Pod references, the controller will recognize
-a small number of built-in types, starting with `Deployment`, `StatefulSet` and `Job`, and will
-remove the reference from the list when those resources are removed. For other types,
-it will be the responsibility of the workload controller/user that created the `ResourceClaim`
-to remove the reference to the non-Pod resource from the `ReservedFor` list when no pods
+like it does now using the same logic. For non-Pod references, however, it
+will be the responsibility of the controller/user that created the `ResourceClaim` to
+remove the reference to the non-Pod resource from the `ReservedFor` list when no pods
 are consuming the `ResourceClaim` and no new pods will be created that reference
 the `ResourceClaim`.
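+
+As a rough illustration of the workload controller's side of this contract (the
+type and function below are simplified stand-ins, not part of this KEP's API):
+
+```go
+package main
+
+import "fmt"
+
+// ConsumerRef is a simplified stand-in for an entry in the `ReservedFor` list.
+type ConsumerRef struct {
+	APIGroup string
+	Resource string
+	Name     string
+	UID      string
+}
+
+// dropWorkloadRef removes the workload's own entry from the ReservedFor list.
+// A workload controller would compute this once no pods consume the claim and
+// no new pods will be created that reference it, then write the result back
+// through the claim's status subresource.
+func dropWorkloadRef(reservedFor []ConsumerRef, workloadUID string) []ConsumerRef {
+	kept := make([]ConsumerRef, 0, len(reservedFor))
+	for _, ref := range reservedFor {
+		if ref.UID != workloadUID {
+			kept = append(kept, ref)
+		}
+	}
+	return kept
+}
+
+func main() {
+	reserved := []ConsumerRef{
+		{APIGroup: "apps", Resource: "deployments", Name: "inference-server", UID: "d0a7"},
+	}
+	reserved = dropWorkloadRef(reserved, "d0a7")
+	fmt.Println(len(reserved)) // 0: the resourceclaim controller can now deallocate
+}
+```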
 
 The resourceclaim controller will then discover that the `ReservedFor` list is empty
 and therefore know that it is safe to deallocate the `ResourceClaim`.
 
-This requires that the resourceclaim controller watches the workload types that will
-be supported. For other types of workloads, there will be a requirement that the workload
-controller has permissions to update the status subresource of the `ResourceClaim`. The
-resourceclaim controller will also try to detect if an unknown resource referenced in the
-`ReservedFor` list has been deleted from the cluster, but that requires that the controller
-has permissions to get or list resources of the type. If the resourceclaim controller is
-not able to check, it will just wait until the reference in the `ReservedFor` list is removed.
+This requires that the controller/user has permissions to update the status
+subresource of the `ResourceClaim`. The resourceclaim controller will also try to detect if
+the resource referenced in the `ReservedFor` list has been deleted from the cluster, but
+that requires that the controller has permissions to get or list resources of the type. If the
+resourceclaim controller is not able to check, it will just wait until the reference in
+the `ReservedFor` list is removed. The resourceclaim controller will not have a watch
+on the workload resource, so there is no guarantee that it will notice that the resource
+has been deleted. This is only an extra, best-effort check, since it remains the
+responsibility of the workload controller to update the claim.
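+
+A minimal sketch of this best-effort existence check, assuming the resourceclaim
+controller uses a dynamic client (the function name is illustrative, not actual
+controller code):
+
+```go
+package claimcontroller
+
+import (
+	"context"
+
+	apierrors "k8s.io/apimachinery/pkg/api/errors"
+	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+	"k8s.io/apimachinery/pkg/runtime/schema"
+	"k8s.io/client-go/dynamic"
+)
+
+// referencedResourceDeleted reports whether the workload referenced in the
+// `ReservedFor` list has been deleted. The second result is false when the
+// controller cannot tell (e.g. it lacks get permission for the type), in
+// which case it simply waits for the reference to be removed.
+func referencedResourceDeleted(ctx context.Context, client dynamic.Interface,
+	gvr schema.GroupVersionResource, namespace, name string) (deleted, known bool) {
+	_, err := client.Resource(gvr).Namespace(namespace).Get(ctx, name, metav1.GetOptions{})
+	switch {
+	case err == nil:
+		return false, true // workload still exists
+	case apierrors.IsNotFound(err):
+		return true, true // workload is gone; safe to drop the reference
+	case apierrors.IsForbidden(err):
+		return false, false // no permission to check
+	default:
+		return false, false // transient error; treat as unknown
+	}
+}
+```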
 
 ##### Finding pods using a ResourceClaim
 If the reference in the `ReservedFor` list is to a non-Pod resource, controllers can no longer
@@ -473,6 +473,22 @@ The API server will no longer accept the new fields and the other components will
 not know what to do with them. So the result is that the `ReservedFor` list will only
 have references to pod resources like today.
 
+Any ResourceClaims that have already been allocated when the feature was active will
+have non-pod references in the `ReservedFor` list after a downgrade, but the controllers
+will not know how to handle them. There are two problems that will arise as a result of
+this:
+- The workload controller will also have been downgraded if it is in-tree, meaning that
+it will not remove the reference to the workload resource from the `ReservedFor` list, thus
+leading to a situation where the claim will never be deallocated.
+- For new pods that get scheduled, the scheduler will add pod references to the
+`ReservedFor` list despite there already being a non-pod reference there. So it ends up
+with both pod and non-pod references, which breaks the assumptions of the design. We need
+to make sure the system can handle this (see the sketch below), as it might also happen
+as a result of disablement and re-enablement of the feature.
+
+We can address this by adding the logic for handling non-pod references in 1.34 and then
+adding the actual feature in 1.35.
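+
+As a rough sketch of how a component could tolerate such a mixed list (the type and
+function names below are illustrative, not part of this KEP):
+
+```go
+package main
+
+import "fmt"
+
+// ConsumerRef is a simplified stand-in for an entry in the `ReservedFor` list.
+type ConsumerRef struct {
+	APIGroup string
+	Resource string
+	Name     string
+	UID      string
+}
+
+// splitReservedFor partitions a ReservedFor list into pod and non-pod
+// references so that a component can handle a mixed list defensively:
+// pod entries keep today's semantics, while non-pod entries are left for
+// the owning workload controller to remove.
+func splitReservedFor(refs []ConsumerRef) (pods, others []ConsumerRef) {
+	for _, ref := range refs {
+		if ref.APIGroup == "" && ref.Resource == "pods" {
+			pods = append(pods, ref)
+		} else {
+			others = append(others, ref)
+		}
+	}
+	return pods, others
+}
+
+func main() {
+	mixed := []ConsumerRef{
+		{Resource: "pods", Name: "web-0", UID: "p1"},
+		{APIGroup: "apps", Resource: "statefulsets", Name: "web", UID: "s1"},
+	}
+	pods, others := splitReservedFor(mixed)
+	fmt.Println(len(pods), len(others)) // 1 1
+}
+```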
491+
476492### Version Skew Strategy
477493
478494If the kubelet is on a version that doesn't support the feature but the rest of the
@@ -482,10 +498,9 @@ since it will still check whether the `Pod` is referenced in the `ReservedFor` list
 If the API server is on a version that supports the feature, but the scheduler
 is not, the scheduler will not know about the new fields added, so it will
 put the reference to the `Pod` in the `ReservedFor` list rather than the reference
-in the `spec.ReservedFor` list. As a result, the workload will get scheduled, but
-it will be subject to the 256 limit on the size of the `ReservedFor` list and the
-controller creating the `ResourceClaim` will not find the reference it expects
-in the `ReservedFor` list when it tries to remove it.
+in the `spec.ReservedFor` list. It will do this even if there is already a non-pod
+reference in the `spec.ReservedFor` list. This leads to the challenge described
+in the previous section.
 
 ## Production Readiness Review Questionnaire
 
@@ -543,18 +558,21 @@
 
 ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
 
-Applications that were already running will continue to run and the allocated
-devices will remain so.
-For the resource types supported directly, the resource claim controller will not remove the
-reference in the `ReservedFor` list, meaning the devices will not be deallocated. If the workload
-controller is responsible for removing the reference, deallocation will work as long as the
-feature isn't also disabled in the controllers. If they are, deallocation will not happen in this
-situation either.
+Applications that were already running will continue to run. But if a pod has to be
+re-admitted by a kubelet where the feature has been disabled, it will not be able to, since
+the kubelet will not find a reference to the pod in the `ReservedFor` list.
+
+The feature will also be disabled for in-tree workload controllers, meaning that they will
+not remove the reference to the workload resource from the `ReservedFor` list. This means the
+list will never be empty and the resourceclaim controller will never deallocate the claim.
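+
+Roughly, the kubelet's admission-time lookup amounts to the following check (a
+simplified sketch; the type and function names are illustrative):
+
+```go
+package main
+
+import "fmt"
+
+// ConsumerRef is a simplified stand-in for an entry in the `ReservedFor` list.
+type ConsumerRef struct {
+	Resource string
+	UID      string
+}
+
+// podHasReservation reports whether the pod's own UID appears as a pod
+// reference in ReservedFor. A kubelet without the feature only accepts pod
+// references, so a claim reserved through a non-pod reference fails
+// re-admission even though the pod legitimately uses the claim.
+func podHasReservation(reservedFor []ConsumerRef, podUID string) bool {
+	for _, ref := range reservedFor {
+		if ref.Resource == "pods" && ref.UID == podUID {
+			return true
+		}
+	}
+	return false
+}
+
+func main() {
+	reserved := []ConsumerRef{{Resource: "deployments", UID: "d0a7"}}
+	fmt.Println(podHasReservation(reserved, "p1")) // false -> re-admission fails
+}
+```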
 
 ###### What happens if we reenable the feature if it was previously rolled back?
 
 It will take effect again and will impact how the `ReservedFor` field is used during allocation
-and deallocation.
+and deallocation. Since this scenario allows a ResourceClaim to have the `spec.ReservedFor`
+field set and then have the scheduler populate the `ReservedFor` list with pods while the
+feature is disabled, we will end up in a situation where the `ReservedFor` list can contain
+both non-pod and pod references. We need to make sure all components can handle that.
 
 ###### Are there any tests for feature enablement/disablement?
 
@@ -723,7 +741,10 @@
 
 ###### Will enabling / using this feature result in increasing size or count of the existing API objects?
 
-No
+Yes and no. We are adding two new fields to the ResourceClaim type, but neither is of a collection
+type, so they should have limited impact on the total size of the objects. However, this feature
+means that we no longer need to keep a complete list of all pods using a ResourceClaim, which can
+significantly reduce the size of ResourceClaim objects shared by many pods.
 
 ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
 