
fix: Add startupProbe to prevent Superset startup problems #654


Open. sbernauer wants to merge 3 commits into base: main


sbernauer (Member)

Description

Superset 4.0 and 4.1 both failed to start quickly enough on a customer setup on Rhoencloud and went into a crash loop.
We worked around it there using podOverrides; this PR adds the probe to the operator itself.
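To make the timing concrete, here is a minimal Python sketch of how a startup probe changes the window Superset gets before the kubelet restarts it. It assumes the approximation quoted later in this thread (initial delay + failure threshold × period) and relies on the Kubernetes behavior that liveness and readiness probes are held back until the startup probe succeeds. The `Probe` class and `max_startup_seconds` helper are hypothetical, and the startup probe values are illustrative only, not the ones this PR sets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Probe:
    """Hypothetical mirror of a Kubernetes probe's timing fields (seconds)."""
    initial_delay_seconds: int
    period_seconds: int
    failure_threshold: int

    def budget_seconds(self) -> int:
        # Approximate time until the kubelet restarts the container if every
        # probe fails (same approximation as quoted in the PR discussion).
        return self.initial_delay_seconds + self.failure_threshold * self.period_seconds


def max_startup_seconds(liveness: Probe, startup: Optional[Probe]) -> int:
    """Roughly how long the app may take to start before being restarted."""
    if startup is None:
        # No startup probe: the liveness probe alone bounds startup time.
        return liveness.budget_seconds()
    # With a startup probe, liveness checks only begin after it succeeds,
    # so the startup probe's budget bounds startup time instead.
    return startup.budget_seconds()


# Liveness values from this PR discussion: 15s delay, 3 failures, 15s period.
liveness = Probe(initial_delay_seconds=15, period_seconds=15, failure_threshold=3)
# Hypothetical startup probe values, chosen only for illustration.
startup = Probe(initial_delay_seconds=0, period_seconds=5, failure_threshold=120)

print(max_startup_seconds(liveness, None))     # 60
print(max_startup_seconds(liveness, startup))  # 600
```

With these illustrative values the window grows from about 60s to 600s; the real numbers depend on the probe settings the operator actually configures.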

Definition of Done Checklist

  • Not all of these items are applicable to all PRs; the author should update this template to keep only the relevant boxes
  • Please make sure all these things are done and tick the boxes

Author

  • Changes are OpenShift compatible
  • CRD changes approved
  • CRD documentation for all fields, following the style guide.
  • Helm chart can be installed and deployed operator works
  • Integration tests passed (for non trivial changes)
  • Changes need to be "offline" compatible
  • Links to generated (nightly) docs added
  • Release note snippet added

Reviewer

  • Code contains useful comments
  • Code contains useful logging statements
  • (Integration-)Test cases added
  • Documentation added or updated. Follows the style guide.
  • Changelog updated
  • Cargo.toml only contains references to git tags (not specific commits or branches)

Acceptance

  • Feature Tracker has been updated
  • Proper release label has been added
  • Links to generated (nightly) docs added
  • Release note snippet added
  • Add type/deprecation label & add to the deprecation schedule
  • Add type/experimental label & add to the experimental features tracker

@sbernauer sbernauer self-assigned this Jul 31, 2025
@sbernauer sbernauer moved this to Development: Waiting for Review in Stackable Engineering Jul 31, 2025
@NickLarsenNZ NickLarsenNZ moved this from Development: Waiting for Review to Development: In Review in Stackable Engineering Aug 4, 2025
@NickLarsenNZ NickLarsenNZ self-requested a review August 4, 2025 07:50
@NickLarsenNZ (Member) left a comment


Just a changelog suggestion and a question for my own understanding.

CHANGELOG.md (outdated)
Comment on lines 7 to 8
- Fix container not starting because Superset was starting too slow and was killed because a failing liveness probe.
We now add a proper startup probe, which allows Superset to take longer to start up ([#654]).
NickLarsenNZ (Member)

Indentation

Suggested change
- Fix container not starting because Superset was starting too slow and was killed because a failing liveness probe.
We now add a proper startup probe, which allows Superset to take longer to start up ([#654]).
- Fix container not starting because Superset was starting too slow and was killed because a failing liveness probe.
  We now add a proper startup probe, which allows Superset to take longer to start up ([#654]).

But I think it could be briefer:

Suggested change
- Fix container not starting because Superset was starting too slow and was killed because a failing liveness probe.
We now add a proper startup probe, which allows Superset to take longer to start up ([#654]).
- Add startup probe to give Superset longer to start up than the default ([#654]).

sbernauer (Member, Author)

I prefer the current two sentences, as they make it clearer what exactly the problem was and what impact it had.

> than the default

There is no default, only before (like 60s) and after (~10 minutes) ^^

@NickLarsenNZ (Member), Aug 4, 2025

> There is no default, only before (like 60s) and after (~10 minutes) ^^

Where does the 60s figure come from?

sbernauer (Member, Author)

15s initial_delay_seconds + failure_threshold (3) * period_seconds (15s)
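Written out as a worked equation, the author's approximation of the old (liveness-only) startup window is:

$$
t_{\text{restart}} \approx \texttt{initial\_delay\_seconds} + \texttt{failure\_threshold} \times \texttt{period\_seconds} = 15 + 3 \times 15 = 60\ \text{s}
$$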

@sbernauer sbernauer requested a review from NickLarsenNZ August 4, 2025 11:30
Projects: Stackable Engineering (Status: Development: In Review)

2 participants