keps/sig-storage/4650-stateful-set-update-claim-template/README.md
How to update PVCs:

[...]

2. If it is not possible to make the PVC [compatible](#what-pvc-is-compatible),
   do nothing. But when re-creating a Pod whose corresponding PVC is being deleted,
   wait for the deletion, then create a new PVC together with the new Pod (already implemented).
   <!--
   Tested on Kubernetes v1.28, and I can see this event:
   Warning  FailedCreate  3m58s (x7 over 3m58s)  statefulset-controller  create Pod test-rwop-0 in StatefulSet test-rwop failed error: pvc data-test-rwop-0 is being deleted
   -->
3. Use either the current or the updated revision of the `volumeClaimTemplate` to create/update the PVC,
   just like the Pod template.
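Point 3 above can be sketched as follows. This is only an illustration of the idea, not the controller's actual code; the function and parameter names are hypothetical, assuming the controller tracks which ordinals have already been moved to the updated revision:

```python
def template_for_ordinal(ordinal, updated_ordinals, current_tmpl, updated_tmpl):
    """Pick which volumeClaimTemplate revision to use when creating or
    updating the PVCs of a replica, mirroring Pod template selection.
    `updated_ordinals` is the set of replicas already on the new revision."""
    return updated_tmpl if ordinal in updated_ordinals else current_tmpl

# A replica that has not been rolled yet is re-created from the current
# revision, so yet-to-be-updated Pods and PVCs can still be replaced.
current = {"storage": "10Gi"}
updated = {"storage": "20Gi"}
print(template_for_ordinal(4, {3, 4}, current, updated))  # already updated replica
print(template_for_ordinal(0, {3, 4}, current, updated))  # replica still on old revision
```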
When to update PVCs:

1. If `volumeClaimSyncStrategy` is `LockStep`,
   before advancing `status.updatedReplicas` to the next replica,
   additionally check that the PVCs of the next replica are
   [compatible](#what-pvc-is-compatible) with the new `volumeClaimTemplate`.
   If they are not, and we are not going to update them in-place automatically,
   wait for the user to delete/update the old PVCs manually.

2. When doing a rolling update, a replica is considered ready if its Pod is ready
   and none of its volumes are being updated in-place.
   Wait for a replica to be ready for at least `minReadySeconds` before proceeding to the next replica.

3. Whenever we check for a Pod update, also check for PVC updates.
   e.g.:
   - If `spec.updateStrategy.type` is `RollingUpdate`,
     update the PVCs in the order from the largest ordinal to the smallest.
   - If `spec.updateStrategy.type` is `OnDelete`,
     only update the PVCs when the Pod is deleted.

4. When updating the PVC in-place, if we also re-create the Pod,
   update the PVC after the old Pod is deleted, together with creating the new Pod.
   Otherwise, if the Pod is unchanged, update only the PVC.

Failure cases: don't leave too many PVCs being updated in-place at once. We expect to update the PVCs in order.

- If the PVC update fails, we should block the update process.
  If the Pod is also deleted (by the controller or manually), don't block the creation of the new Pod.
  We should retry, and report events for the failure.
  The events and status should look like those emitted when Pod creation fails.

- While waiting for a PVC to reach the compatible state,
  we should update the status, just like what we do when waiting for a Pod to become ready.
  We should block the update process if the PVC never becomes compatible.

- If the `volumeClaimTemplate` is updated again while the previous rollout is blocked,
  similar to [Pods](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#forced-rollback),
  the user may need to manually deal with the blocking PVCs (update or delete them).
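The ordering and gating rules above can be sketched as a single reconciliation pass. This is a simplified simulation with hypothetical names, not controller code; it walks ordinals from largest to smallest, treats a replica as ready only when its Pod is ready and no volume is still being updated in-place, and stops at the first replica that needs work:

```python
def next_action(replicas, partition=0):
    """One reconciliation pass over a StatefulSet's replicas.
    Each replica is a dict with 'ordinal', 'pod_ready',
    'pvc_compatible', and 'pvc_updating' keys."""
    for r in sorted(replicas, key=lambda r: r["ordinal"], reverse=True):
        if r["ordinal"] < partition:
            break  # partition: leave smaller ordinals on the old revision
        ready = r["pod_ready"] and not r["pvc_updating"]
        if not r["pvc_compatible"]:
            return ("update-pvc", r["ordinal"])  # may also re-create the Pod
        if not ready:
            return ("wait", r["ordinal"])  # block; don't advance to the next ordinal
    return ("done", None)

replicas = [
    {"ordinal": 0, "pod_ready": True, "pvc_compatible": False, "pvc_updating": False},
    {"ordinal": 1, "pod_ready": True, "pvc_compatible": True, "pvc_updating": True},
    {"ordinal": 2, "pod_ready": True, "pvc_compatible": True, "pvc_updating": False},
]
# Ordinal 1 is still updating its PVC in-place, so the rollout waits there
# and does not touch ordinal 0 yet.
print(next_action(replicas))
```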
### What PVC is compatible
[...]
We're running a CI/CD system, and end-to-end automation is desired.
To expand the volumes managed by a StatefulSet,
we can just use the same pipeline that we are already using to update the Pods.
All the test, review, approval, and rollback processes can be reused.
#### Story 2: Migrating Between Storage Providers
[...]

The same process as Story 2 can be used.
#### Story 5: Asymmetric Replicas

The storage requirements of different replicas are not identical,
so we still want to update each PVC manually and separately.
Possibly we also update the `volumeClaimTemplate` for new replicas,
but we don't want the controller to interfere with the existing replicas.

[...]

<!--
Go in to as much detail as necessary here.
This might be a good place to talk about core concepts and how they relate.
-->

When designing the `InPlace` update strategy, we update the PVC like how we re-create the Pod,
i.e. we update the PVC whenever we would re-create the Pod,
and we wait for the PVC to be compatible whenever we would wait for the Pod to be ready.
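For illustration, a minimal check for whether a PVC can be brought to the desired spec in-place at all. This sketch assumes that growing `resources.requests.storage` (volume expansion) is the only in-place mutation, which is a simplification; the precise rule is what the compatibility section of this KEP defines:

```python
def parse_gi(size):
    """Parse sizes like '10Gi' to an integer number of GiB (sketch only)."""
    assert size.endswith("Gi")
    return int(size[:-2])

def can_update_in_place(current, desired):
    """True if `desired` differs from `current` only by a storage increase.
    Sketch assumption: volume expansion is the only mutable field here."""
    cur, des = dict(current), dict(desired)
    cur_size, des_size = parse_gi(cur.pop("storage")), parse_gi(des.pop("storage"))
    return cur == des and des_size >= cur_size

print(can_update_in_place({"storageClassName": "ssd", "storage": "10Gi"},
                          {"storageClassName": "ssd", "storage": "20Gi"}))  # True
print(can_update_in_place({"storageClassName": "ssd", "storage": "10Gi"},
                          {"storageClassName": "hdd", "storage": "20Gi"}))  # False
```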
`volumeClaimSyncStrategy` is introduced to keep compatibility with currently deployed workloads.
StatefulSet currently accepts and uses existing PVCs that were not created by the controller,
so the `volumeClaimTemplate` and the PVCs can differ even before this enhancement.
Some users may choose to keep the PVCs of different replicas different.
We should not block the Pod updates for them.
If `volumeClaimSyncStrategy` is `Async`,
we just ignore the PVCs that cannot be updated to be compatible with the new `volumeClaimTemplate`,
as we do currently.
Of course, we report this in the status of the StatefulSet.
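The difference between the two strategies can be sketched as follows. This is illustrative only (the tuple shape and names are assumptions): `Async` skips PVCs it cannot make compatible and surfaces them in status, while `LockStep` blocks on them:

```python
def reconcile_pvcs(pvcs, strategy):
    """pvcs: list of (name, compatible, can_update) tuples.
    Returns (to_update, blocking, ignored_in_status)."""
    to_update, blocking, ignored = [], [], []
    for name, compatible, can_update in pvcs:
        if compatible:
            continue  # nothing to do for this PVC
        if can_update:
            to_update.append(name)
        elif strategy == "Async":
            ignored.append(name)  # proceed, but report it in StatefulSet status
        else:  # LockStep: wait for the user to update/delete it manually
            blocking.append(name)
    return to_update, blocking, ignored

pvcs = [("data-web-0", True, False), ("data-web-1", False, True), ("data-web-2", False, False)]
print(reconcile_pvcs(pvcs, "Async"))     # (['data-web-1'], [], ['data-web-2'])
print(reconcile_pvcs(pvcs, "LockStep"))  # (['data-web-1'], ['data-web-2'], [])
```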
However, a workload may rely on some features provided by a specific PVC,
so we should provide a way to coordinate the update.
That's why we also need `LockStep`.
The StatefulSet controller should also keep the current and updated revisions of the `volumeClaimTemplate`,
so that a `LockStep` StatefulSet can still re-create Pods and PVCs that are yet to be updated.
[...]

e.g., prevent decreasing the storage size, or prevent expansion if the storage class does not support it.
However, this has several drawbacks:
* Not reverting the `volumeClaimTemplate` when rolling back the StatefulSet is confusing.
* The validation can be a barrier when recovering from a failed update.
  If the RecoverVolumeExpansionFailure feature gate is enabled, we can recover from a failed expansion by decreasing the size.
* The validation is racy, especially when recovering from a failed expansion.
  We still need to consider most abnormal cases even if we do those validations.
* This does not match the pattern of existing behaviors.