Collaborator

@toelke toelke commented Oct 29, 2025

Add update throttling to prevent rapid deployment churn

Implements a minimum interval between updates (default: 10s, configurable) to prevent Wave from updating deployments too frequently when secrets or configmaps change rapidly.

This prevents scenarios where a buggy controller rapidly updating secrets causes Wave to rapidly update deployments, which can overwhelm the Kubernetes API server.

Key features:

  • Fixed minimum interval between updates (default: 10s)
  • Configurable via --min-update-interval flag
  • Configurable via Helm chart (minUpdateInterval value)
  • State tracked in-memory within the operator
  • Thread-safe implementation with mutex protection
  • Applies to ALL updates, even when config hashes change

Fixes #182

🤖 Generated with Claude Code

Co-Authored-By: Claude [email protected]

state.backoffLevel++
// Cap at level 6 (64s)
if state.backoffLevel > 6 {
state.backoffLevel = 6
Contributor

This is redundant to the MaxBackoff check above. Either way works but both together do not make it better.

level = 6
}
backoff := MinBackoff * (1 << level) // 2^level seconds
if backoff > MaxBackoff {
Contributor

Either this check or the level check. Claude tried to be safe and added 3 checks ;-)

// Record successful update
h.backoffTracker.RecordUpdate(instanceName)
} else {
// No changes detected - system is stable
Contributor

I am not sure this is sound. If Wave is triggered without a hash change (e.g. due to an annotation on the Deployment), that would reset the backoff. I guess we would have to check the backoff here as well.

Collaborator Author

Even after reading this code and asking the LLM to add a clarifying comment, this is the part where I wanted to do the most testing :-D

I will have time to test this probably only next week.

Contributor

Those models do not have a notion of concurrency, so don't expect them to produce race-free solutions. They rarely do. I have seen too many LLM implementations with races, wrong ordering, or deadlocks ;-). Most of the time those models are not even able to fix it if you explain the issue to them.

Collaborator Author

I saw a Mutex somewhere, but that also requires thorough review.

Contributor

That should be good. It's isolated to one method with a deferred release.

The race here is really that reconciles can (and will) happen for multiple unrelated reasons. Basically what needs to be changed is that the backoff check needs to happen for both if branches. I would move it up and simply bail out (maybe with two different log messages).

This definitely needs a test. The tests Cursor generated look fine but arbitrary to me (their focus is a bit unclear). However, the tests do not test the e2e behaviour at all.

@toelke toelke force-pushed the feature/exponential-backoff-issue-182 branch from c16f915 to 3dc7cc5 December 4, 2025 12:10
@toelke toelke changed the title from "Add exponential backoff to prevent rapid deployment updates" to "Add backoff period to prevent rapid deployment updates" Dec 4, 2025
@toelke toelke closed this Dec 4, 2025
@toelke toelke deleted the feature/exponential-backoff-issue-182 branch December 4, 2025 12:11
Collaborator Author

toelke commented Dec 4, 2025

I did not want to close this, I just wanted to rename the branch :-(

Successfully merging this pull request may close these issues.

Wave is very fast to update Deployments and can DDoS the kubernetes API with a lot of ReplicaSets
