Introduce `RetryOnFailure` lifecycle management strategy #1278

This is a proposal for a new way of performing lifecycle management, akin to the operational mode of the Flux kustomize-controller.

The `RetryOnFailure` strategy is suitable for StatefulSets and other workloads that cannot tolerate rollbacks and whose long rollout durations make them susceptible to health check timeouts and transient capacity errors.

The `RetryOnFailure` strategy will ensure that:

- An installation failure will not be retried immediately. Instead, the controller will retry with an upgrade at a fixed interval defined in the HelmRelease.
- An upgrade failure will leave the Helm release in a failed state without performing any remediations.
- An upgrade failure will never result in a rollback or uninstall.
- Upgrade failures are always retried at a fixed interval defined in the HelmRelease.

## API Changes

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
spec:
  install:
    strategy:
      name: RetryOnFailure # defaults to RemediateOnFailure
      retryInterval: 5m # only used for RetryOnFailure (defaults to 5m)
    remediation: # ignored for RetryOnFailure
      retries: 2
  upgrade:
    strategy:
      name: RetryOnFailure # defaults to RemediateOnFailure
      retryInterval: 5m # only used for RetryOnFailure (defaults to 5m)
    remediation: # ignored for RetryOnFailure
      retries: 2
      strategy: rollback
```

When the strategy is not specified, or when it is set to `RemediateOnFailure`, lifecycle management works as before.
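As a sketch of what "works as before" could look like when written out explicitly, the fragment below sets the default strategy by name. It assumes the proposed `strategy` field is accepted as shown in the API above; the remediation values are only illustrative.

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
spec:
  install:
    strategy:
      name: RemediateOnFailure # proposed default; same as omitting `strategy`
    remediation: # honored, as in the current behavior
      retries: 2 # illustrative value
  upgrade:
    strategy:
      name: RemediateOnFailure # proposed default; same as omitting `strategy`
    remediation: # honored, as in the current behavior
      retries: 2 # illustrative value
      strategy: rollback
```

Under this reading, existing HelmRelease objects that do not set `strategy` would need no changes.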

For installations, the `RetryOnFailure` strategy will perform an uninstall on failure, then rerun the installation after the specified retry interval.

For upgrades, the `RetryOnFailure` strategy will behave like `flux reconcile hr --force` when the remediation retries are set to 0.
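For illustration, a complete manifest for a slow-rolling stateful workload might look like the sketch below. The release name, namespace, chart name, and `HelmRepository` name are hypothetical, and the `interval`, `timeout`, and `retryInterval` values are assumptions chosen for the example, not recommendations from this proposal.

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: my-database # hypothetical release name
  namespace: databases # hypothetical namespace
spec:
  interval: 30m # illustrative reconciliation interval
  timeout: 15m # illustrative; long rollouts may need a generous timeout
  chart:
    spec:
      chart: my-database # hypothetical chart name
      sourceRef:
        kind: HelmRepository
        name: my-charts # hypothetical repository
  install:
    strategy:
      name: RetryOnFailure # proposed field
      retryInterval: 5m # proposed field; retry a failed install every 5 minutes
  upgrade:
    strategy:
      name: RetryOnFailure # proposed field; no rollback or uninstall on failure
      retryInterval: 5m # proposed field; retry a failed upgrade every 5 minutes
```

The `remediation` blocks are omitted here because the proposal states they are ignored under `RetryOnFailure`.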
