
Commit fb23e41

rewrite "When to update PVCs" and other updates
1 parent e2dddad commit fb23e41

1 file changed, +45 -26 lines changed

keps/sig-storage/4650-stateful-set-update-claim-template/README.md

@@ -293,36 +293,52 @@ How to update PVCs:

  2. If it is not possible to make the PVC [compatible](#what-pvc-is-compatible),
  do nothing. But when recreating a Pod and the corresponding PVC is being deleted,
- wait for the deletion then create a new PVC with the current template
- together with the new Pod (already implemented).
+ wait for the deletion, then create a new PVC together with the new Pod (already implemented).
  <!--
  Tested on Kubernetes v1.28, and I can see this event:
  Warning FailedCreate 3m58s (x7 over 3m58s) statefulset-controller create Pod test-rwop-0 in StatefulSet test-rwop failed error: pvc data-test-rwop-0 is being deleted
  -->

+ 3. Use either the current or the updated revision of the `volumeClaimTemplate` to create/update the PVC,
+ just like the Pod template.
+
  When to update PVCs:
- 1. Before advancing `status.updatedReplicas` to the next replica,
+ 1. If `volumeClaimSyncStrategy` is `LockStep`,
+ before advancing `status.updatedReplicas` to the next replica,
  additionally check that the PVCs of the next replica are
  [compatible](#what-pvc-is-compatible) with the new `volumeClaimTemplate`.
- If not, update the PVC after old Pod deleted, before creating new pod,
- or if update is not possible:
- <!-- TODO: what if the update to PVC failed? -->
- - If `volumeClaimSyncStrategy` is `LockStep`,
- wait for the user to delete/update the old PVC manually.
- - If `volumeClaimSyncStrategy` is `Async`,
- the diff is ignored and the normal rolling update proceeds.
-
- 2. If Pod spec does not change, only mutable fields in `volumeClaimTemplate` differ,
- The PVCs should be updated just like Pods would. A replica is considered ready
- if all its volumes are compatible with the new `volumeClaimTemplate`.
- `spec.ordinals` and `spec.updateStrategy.rollingUpdate.partition` are also respected.
+ If not, and we are not going to update it in-place automatically,
+ wait for the user to delete/update the old PVC manually.
+
+ 2. During a rolling update, a replica is considered ready if its Pod is ready
+ and none of its volumes is being updated in-place.
+ Wait for a replica to be ready for at least `minReadySeconds` before proceeding to the next replica.
+
+ 3. Whenever we check whether a Pod needs to be updated, also check whether its PVCs need to be updated.
  e.g.:
  - If `spec.updateStrategy.type` is `RollingUpdate`,
  update the PVCs in the order from the largest ordinal to the smallest.
- Only proceed to the next ordinal when all the PVCs of the previous ordinal
- are compatible with the new `volumeClaimTemplate`.
  - If `spec.updateStrategy.type` is `OnDelete`,
  only update the PVC when the Pod is deleted.
+
+ 4. When updating the PVC in-place, if we also re-create the Pod,
+ update the PVC after the old Pod is deleted, together with creating the new Pod.
+ Otherwise, if the Pod is not changed, update only the PVC.
+
+ Failure cases: don't leave too many PVCs being updated in-place. We expect to update the PVCs in order.
+
+ - If the PVC update fails, we should block the update process.
+ If the Pod is also deleted (by the controller or manually), don't block the creation of the new Pod.
+ We should retry and report events for this.
+ The events and status should look like those when the Pod creation fails.
+
+ - While waiting for the PVC to reach the compatible state,
+ we should update the status, just like we do when waiting for the Pod to be ready.
+ We should block the update process if the PVC never becomes compatible.
+
+ - If the `volumeClaimTemplate` is updated again while the previous rollout is blocked,
+ similar to [Pods](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#forced-rollback),
+ the user may need to manually deal with the blocking PVCs (update or delete them).


  ### What PVC is compatible
@@ -348,7 +364,7 @@ bogged down.

  We're running a CI/CD system and end-to-end automation is desired.
  To expand the volumes managed by a StatefulSet,
- we can just use the same pipeline that we are already using to updating the Pod.
+ we can just use the same pipeline that we are already using to update the Pod.
  All the test, review, approval, and rollback processes can be reused.

  #### Story 2: Migrating Between Storage Providers
@@ -377,8 +393,8 @@ The same process as Story 2 can be used.

  #### Story 5: Asymmetric Replicas

- The replicas of our StatefulSet are not identical, so we still want to update
- each PVC manually and separately.
+ The storage requirements of different replicas are not identical,
+ so we still want to update each PVC manually and separately.
  Possibly we also update the `volumeClaimTemplate` for new replicas,
  but we don't want the controller to interfere with the existing replicas.

@@ -391,23 +407,25 @@ Go in to as much detail as necessary here.
  This might be a good place to talk about core concepts and how they relate.
  -->

+ When designing the `InPlace` update strategy, we update the PVC the same way we re-create the Pod.
+ That is, we update the PVC whenever we would re-create the Pod,
+ and we wait for the PVC to be compatible whenever we would wait for the Pod to be ready.
+
  `volumeClaimSyncStrategy` is introduced to preserve the behavior of currently deployed workloads.
  StatefulSet currently accepts and uses existing PVCs that are not created by the controller,
  so the `volumeClaimTemplate` and PVC can differ even before this enhancement.
  Some users may choose to keep the PVCs of different replicas different.
  We should not block the Pod updates for them.

  If `volumeClaimSyncStrategy` is `Async`,
- then if the template and PVC differs, and the PVC is not being deleted,
- the PVC is not considered as managed by the StatefulSet.
+ we just ignore the PVCs that cannot be updated to be compatible with the new `volumeClaimTemplate`,
+ as we do currently.
+ Of course, we report this in the status of the StatefulSet.

  However, a workload may rely on some features provided by a specific PVC,
  so we should provide a way to coordinate the update.
  That's why we also need `LockStep`.

- We consider a StatefulSet in stable state if all the managed PVCs are compatible with the current template.
- In a stable state, most operations are possible, and we are not actively fixing something.
-
  The StatefulSet controller should also keep the current and updated revisions of the `volumeClaimTemplate`,
  so that a `LockStep` StatefulSet can still re-create Pods and PVCs that are yet-to-be-updated.
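
The `Async`/`LockStep` split and the need to keep both the current and the updated `volumeClaimTemplate` revisions can be illustrated together. The Go sketch below is illustrative only: `ClaimTemplate`, `PVC`, `CanUpdate`, `templateFor`, and `classify` are hypothetical names, only the storage request is modeled, and the caller supplies the compatibility predicate (the toy `atLeast` rule in `main` is an assumption, not the KEP's definition of compatibility). Revision selection follows the partitioned rolling update rule for Pods, where ordinals below the partition keep the current revision.

```go
package main

import "fmt"

// ClaimTemplate is a hypothetical stand-in for one stored revision of the
// volumeClaimTemplate; only the storage request is modeled.
type ClaimTemplate struct {
	Revision string
	Storage  int64 // requested size in bytes
}

// PVC is a hypothetical, simplified existing claim of one replica.
type PVC struct {
	Ordinal   int
	Storage   int64
	CanUpdate bool // hypothetical: the controller could make it compatible in-place
}

// templateFor picks the revision used for an ordinal, like Pod revisions in a
// partitioned rolling update: ordinals below the partition keep the current
// revision, the rest use the updated one.
func templateFor(ord, partition int, current, updated ClaimTemplate) ClaimTemplate {
	if ord < partition {
		return current
	}
	return updated
}

// classify groups PVC ordinals by what the controller would do with them.
// Under Async, PVCs that cannot be made compatible are ignored (only reported
// in status); under LockStep they block the rollout until the user acts.
func classify(pvcs []PVC, partition int, current, updated ClaimTemplate, lockStep bool,
	compatible func(PVC, ClaimTemplate) bool) (update, ignored, blocked []int) {
	for _, p := range pvcs {
		t := templateFor(p.Ordinal, partition, current, updated)
		switch {
		case compatible(p, t):
			// already matches the revision this ordinal should use
		case p.CanUpdate:
			update = append(update, p.Ordinal)
		case lockStep:
			blocked = append(blocked, p.Ordinal)
		default:
			ignored = append(ignored, p.Ordinal)
		}
	}
	return update, ignored, blocked
}

func main() {
	const gi = int64(1) << 30
	current := ClaimTemplate{Revision: "rev-1", Storage: 10 * gi}
	updated := ClaimTemplate{Revision: "rev-2", Storage: 20 * gi}
	// Toy compatibility rule, for illustration only: the PVC requests at least
	// as much storage as the template.
	atLeast := func(p PVC, t ClaimTemplate) bool { return p.Storage >= t.Storage }

	pvcs := []PVC{
		{Ordinal: 0, Storage: 10 * gi, CanUpdate: true}, // below the partition: stays on rev-1
		{Ordinal: 1, Storage: 10 * gi, CanUpdate: true}, // can be expanded in-place
		{Ordinal: 2, Storage: 10 * gi},                  // e.g. a pre-existing, hand-made PVC
	}
	for _, lockStep := range []bool{false, true} {
		update, ignored, blocked := classify(pvcs, 1, current, updated, lockStep, atLeast)
		fmt.Printf("lockStep=%v update=%v ignored=%v blocked=%v\n", lockStep, update, ignored, blocked)
	}
}
```

With a partition of 1, ordinal 0 is checked against `rev-1` and left alone, ordinal 1 is queued for an in-place update under both strategies, and ordinal 2 is reported as ignored under `Async` but blocks the rollout under `LockStep`.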

@@ -994,7 +1012,8 @@ information to express the idea and why it was not acceptable.
  e.g., prevent decreasing the storage size, or prevent expansion if the storage class does not support it.
  However, this has several drawbacks:
  * Not reverting the `volumeClaimTemplate` when rolling back the StatefulSet is confusing.
- * This can be a barrier when recovering from a failed update.
+ * The validation can be a barrier when recovering from a failed update.
+ If the `RecoverVolumeExpansionFailure` feature gate is enabled, we can recover from a failed expansion by decreasing the size.
  * The validation is racy, especially when recovering from a failed expansion.
  We still need to consider most abnormal cases even if we do those validations.
  * This does not match the pattern of existing behaviors.
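To make the recovery drawback concrete, here is a hypothetical size-only version of the rejected validation (`validateTemplateUpdate` is an illustrative name, not an existing API). The sequence in `main` shows that the change a user would make to recover from a failed expansion is exactly the change this validation refuses.

```go
package main

import "fmt"

// validateTemplateUpdate is a hypothetical version of the rejected validation:
// it refuses any volumeClaimTemplate update that shrinks the storage request.
func validateTemplateUpdate(oldBytes, newBytes int64) error {
	if newBytes < oldBytes {
		return fmt.Errorf("storage request may not decrease: %d -> %d", oldBytes, newBytes)
	}
	return nil
}

func main() {
	const gi = int64(1) << 30
	// 1. Expanding the template from 100Gi to 200Gi passes the validation;
	//    suppose the actual volume expansion then fails.
	fmt.Println(validateTemplateUpdate(100*gi, 200*gi))
	// 2. Recovering by asking for less (150Gi), which the text above says is
	//    possible with RecoverVolumeExpansionFailure, is rejected here.
	fmt.Println(validateTemplateUpdate(200*gi, 150*gi))
}
```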
