Deploy ToolHive Operator into OpenShift (#1063) #1253

Merged
merged 5 commits into from
Aug 15, 2025

Conversation

RoddieKieley
Contributor

* Update helm chart resources and seccompProfile type values for OKD environment.
* Update MCPServer Deployment with check for XDG_CONFIG_HOME and HOME env vars being set.
* Add OpenShift environment detection by way of route v1 API availability or OPENSHIFT_OPERATOR env var override.

@RoddieKieley
Contributor Author

When running go-task operator-test I observed the following problem first:

"shell": executable file not found in $PATH

I also saw a number of test failures, not all of which seemed related to the changes in this PR. I passed the terminal output through Cursor (o3) with a request to write it up as Markdown, which I'm attaching:
1063-cursor-o3-feedback.md

I used the 'Option B' fix for the "shell" issue above, and that worked locally for me on Fedora 42. I'm not sure how well it works in other environments, though, so I'm not including it in the PR itself.

Either way feedback welcome and would be curious if you've had any luck with OKD @jhrozek ?

@ChrisJBurns
Collaborator

For awareness, I will be deploying OpenShift into our environment next week for @jhrozek to play around with ToolHive installations

@dmartinol

Hey @ChrisJBurns could you also suggest a default strategy for the XDG_CONFIG_HOME env vars? ATM we're hardcoding /tmp which is probably not ideal, WDYT?
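One possible default strategy, sketched here purely as an illustration, is to apply `/tmp` only when the variable is not already defined on the container, so that users can still override it. The `EnvVar` type and `EnsureEnvDefaults` helper below are hypothetical stand-ins and are not taken from the ToolHive code base.

```go
package main

import "fmt"

// EnvVar mirrors the name/value shape of a container env entry.
// Hypothetical stand-in so the sketch has no Kubernetes dependency.
type EnvVar struct {
	Name  string
	Value string
}

// EnsureEnvDefaults appends a default for each variable that the container
// does not already define; existing values are left untouched.
func EnsureEnvDefaults(env []EnvVar, defaults []EnvVar) []EnvVar {
	present := make(map[string]bool, len(env))
	for _, e := range env {
		present[e.Name] = true
	}
	// Copy the input so the caller's slice is never modified.
	out := append([]EnvVar(nil), env...)
	for _, d := range defaults {
		if !present[d.Name] {
			out = append(out, d)
			present[d.Name] = true
		}
	}
	return out
}

func main() {
	// The /tmp values match what the thread says is currently hardcoded.
	defaults := []EnvVar{
		{Name: "XDG_CONFIG_HOME", Value: "/tmp"},
		{Name: "HOME", Value: "/tmp"},
	}
	// HOME is already set by the user, so only XDG_CONFIG_HOME is added.
	env := []EnvVar{{Name: "HOME", Value: "/home/app"}}
	for _, e := range EnsureEnvDefaults(env, defaults) {
		fmt.Printf("%s=%s\n", e.Name, e.Value)
	}
}
```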

Contributor

@jhrozek jhrozek left a comment

Thank you for the patch @RoddieKieley ! We'll work with @ChrisJBurns on setting up the OKD cluster so we can test properly.

@jhrozek
Contributor

jhrozek commented Aug 4, 2025

@RoddieKieley good news! Thanks to @ChrisJBurns we have an OKD cluster now 🚀

Collaborator

@ChrisJBurns ChrisJBurns left a comment

A few more comments from me. I'm wondering whether we should instead do the OpenShift checks in the Operator and then pass the desired securityContexts in to the ProxyRunner. This avoids the need for the isOpenShift flags at the runtime layer when it comes to the podTemplateSpec, and it also allows us to remove some of the securityContext code from the runtime layer.

    * Update helm chart resources and seccompProfile type values for OKD environment.
    * Update MCPServer Deployment with check for XDG_CONFIG_HOME and HOME env vars being set.
    * Add PlatformDetector interface with detection implementation for Kubernetes and OpenShift.
    * Update behaviour of tests now requiring in cluster to skip locally.

Signed-off-by: Roddie Kieley <[email protected]>
Co-authored-by: Cursor o3 <[email protected]>
Co-authored-by: Cursor claude-4 <[email protected]>
@RoddieKieley
Contributor Author

RoddieKieley commented Aug 12, 2025

@jhrozek @ChrisJBurns I've squashed and force pushed my latest changes to my issue1063 branch which is the basis for this PR. The latest changes address a number of outstanding issues, however I do see some questions still open atm:

  • the helm chart updates around memory bumps; do we want these extracted in the separated values file?
  • the XDG_CONFIG_HOME and HOME env vars that get hard set to /tmp. Do we have any ideas here on what might be better values to utilize?
  • do we need to take action on the way the RunAsUser is set for the pod and container at the moment or is that something we can improve in the future?
  • other outstanding questions or items you may be aware of preventing us from moving forward with this PR?

For completeness I would also note that this PR now addresses #1341

@jhrozek
Contributor

jhrozek commented Aug 12, 2025

@RoddieKieley I'll review the PR today (thanks a lot for the updates!)

Answering specifically this for now:

the XDG_CONFIG_HOME and HOME env vars that get hard set to /tmp. Do we have any ideas here on what might be better values to utilize?

Do you remember in what cases those are needed, e.g. which MCP server with which settings? I was hoping we had fixed the toolhive proxy runner so that it no longer writes any configuration. I ran into oddly similar issues some time ago with the default chainguard images being read-only, and I set the env variables to /run, so /tmp might be as good as any. What I wonder is whether we need to set them at all.

If we do, then fixing the root issue (i.e. the proxy runner not writing anything) should IMO be a separate follow-up task.

@RoddieKieley
Contributor Author

Do you remember in what cases are those needed? e.g. what MCP server with what settings?

I think originally it was the operator itself, then the fetch proxy, and then the MCP server proper, as it was the first issue I ran into at each step of the way. That said, those values have been in there since the beginning, so I'll test taking them out now with the latest main and see what happens.

@ChrisJBurns
Collaborator

ChrisJBurns commented Aug 12, 2025

@RoddieKieley Thanks for all the effort on this!

the helm chart updates around memory bumps; do we want these extracted in the separated values file?

I think in general we should be able to just override them as normal, which would allow us to keep the defaults.

do we need to take action on the way the RunAsUser is set for the pod and container at the moment or is that something we can improve in the future?

The main thing for me is being able to leave it in as a default; on OpenShift, we can then have code that removes the relevant values if needed.
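The "keep the default, strip it on OpenShift" idea could look roughly like the sketch below. It is an illustration under stated assumptions rather than the PR's implementation: `SecurityContext` is a trimmed-down hypothetical stand-in for the Kubernetes type, UID 1000 is an arbitrary example value, and `ForPlatform` is a made-up helper name. Clearing `RunAsUser` lets OpenShift's SCC admission assign a UID from the namespace's allowed range.

```go
package main

import "fmt"

// SecurityContext is a hypothetical, trimmed-down stand-in for the
// Kubernetes container security context.
type SecurityContext struct {
	RunAsNonRoot *bool
	RunAsUser    *int64
}

// ptr returns a pointer to v, for filling optional fields.
func ptr[T any](v T) *T { return &v }

// DefaultSecurityContext returns the defaults for plain Kubernetes.
// The UID is an illustrative value, not one taken from the chart.
func DefaultSecurityContext() SecurityContext {
	return SecurityContext{
		RunAsNonRoot: ptr(true),
		RunAsUser:    ptr(int64(1000)),
	}
}

// ForPlatform keeps the defaults on Kubernetes and clears RunAsUser on
// OpenShift so that the platform can assign the UID itself.
func ForPlatform(sc SecurityContext, isOpenShift bool) SecurityContext {
	if isOpenShift {
		sc.RunAsUser = nil
	}
	return sc
}

// describeUID renders the optional UID for printing.
func describeUID(uid *int64) string {
	if uid == nil {
		return "unset (assigned by the platform)"
	}
	return fmt.Sprintf("%d", *uid)
}

func main() {
	for _, isOpenShift := range []bool{false, true} {
		sc := ForPlatform(DefaultSecurityContext(), isOpenShift)
		fmt.Printf("openshift=%v runAsUser=%s\n", isOpenShift, describeUID(sc.RunAsUser))
	}
}
```

Doing this in the operator, as suggested earlier in the review, would mean the runtime layer only receives a finished security context and needs no platform flag of its own.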

@jhrozek
Contributor

jhrozek commented Aug 12, 2025

@jhrozek @ChrisJBurns I've squashed and force pushed my latest changes to my issue1063 branch which is the basis for this PR. The latest changes address a number of outstanding issues, however I do see some questions still open atm:

  • the helm chart updates around memory bumps; do we want these extracted in the separated values file?

I'm not against the memory bumps, although if @ChrisJBurns knows a way to set them on deployment or with maybe a values override that might be good too.

I wonder if the memory bump is due to openshift using its own Go version (although that used to only differ in crypto libraries and FIPS support IIRC?).

  • the XDG_CONFIG_HOME and HOME env vars that get hard set to /tmp. Do we have any ideas here on what might be better values to utilize?

replied elsewhere, I'm fine merging the code as-is and dropping the env variables setting when we fix the root cause

  • do we need to take action on the way the RunAsUser is set for the pod and container at the moment or is that something we can improve in the future?

I think it's fine for now. Just to be clear, are you referring to the operator being consistent with how openshift manages UIDs and SCCs which might impede the user from running privileged MCPs?

  • other outstanding questions or items you may be aware of preventing us from moving forward with this PR?

none from me, this is great 🚀

@jhrozek
Contributor

jhrozek commented Aug 12, 2025

Oh, actually we need to make CI happy, depending on what we end up doing with the values. CI is currently complaining about the chart version needing a bump (in deploy/charts/operator/Chart.yaml and deploy/charts/operator/README.md)

@jhrozek
Contributor

jhrozek commented Aug 12, 2025

@ChrisJBurns can we add @RoddieKieley to whatever whitelist we have so that his PRs run CI without a post-fact approval?

Contributor

@jhrozek jhrozek left a comment

This is great! I don't really have comments except the default values where @ChrisJBurns probably has more of an opinion. Ready to ack :-)

@ChrisJBurns
Collaborator

I think when it's merged, there shouldn't be a need for approvals to run CI anymore. I think it's only for first-time contributors; at least that's what I've experienced on other repos. The Helm Chart bump should be simple enough: it just requires the Operator chart to be bumped up by a minor version (as well as the version in the Operator Chart README.md badge).

@jhrozek
Contributor

jhrozek commented Aug 12, 2025

I think when it's merged, there shouldn't be a need for approvals to run CI anymore. I think it's only for first-time contributors; at least that's what I've experienced on other repos. The Helm Chart bump should be simple enough: it just requires the Operator chart to be bumped up by a minor version (as well as the version in the Operator Chart README.md badge).

And here I was, running pre-commit run helm-docs --all-files manually like some kind of uncivilised cave man :-)

@ChrisJBurns
Collaborator

Apologies, I realise my comment didn't separate the quote and my reply, have fixed it: #1253 (comment)

@jhrozek
Contributor

jhrozek commented Aug 13, 2025

unfortunately there seems to be a conflict now, too

@RoddieKieley
Contributor Author

I think it's fine for now. Just to be clear, are you referring to the operator being consistent with how openshift manages UIDs and SCCs which might impede the user from running privileged MCPs?

Yes exactly. As per discussion helm chart values will have migrated to a new values-openshift.yaml file once the latest updates are pushed.

Signed-off-by: Roddie Kieley <[email protected]>
…stacklok#1063)

    * Removed extraneous Transport set.
    * Bumped configureContainer debug logging to actual Debugf logging.
    * Reverted helm chart values and added separate adjusted values-openshift.yaml.

Signed-off-by: Roddie Kieley <[email protected]>
@RoddieKieley
Contributor Author

unfortunately there seems to be a conflict now, too

I merged main, as @ChrisJBurns did recently, to resolve the conflict for the moment. I also made the Infof -> Debugf logging change, removed the extraneous Transport set for the tests, and split the OpenShift helm chart values into values-openshift.yaml. I left the XDG_CONFIG_HOME and HOME env var settings alone, as I was still getting a WRN when launching the fetch MCP Server with those gone:

Screenshot From 2025-08-13 12-32-12

Take a look and see if there's anything outstanding or anything that fails in CI or somewhere else with the latest changes that shouldn't.

@jhrozek
Contributor

jhrozek commented Aug 14, 2025

I'm quite happy about the patch, thank you for the patience during the review. To make CI happy, we need to bump the chart version it seems:

diff --git i/deploy/charts/operator/Chart.yaml w/deploy/charts/operator/Chart.yaml
index c22271f..40ed608 100644
--- i/deploy/charts/operator/Chart.yaml
+++ w/deploy/charts/operator/Chart.yaml
@@ -2,5 +2,5 @@ apiVersion: v2
 name: toolhive-operator
 description: A Helm chart for deploying the ToolHive Operator into Kubernetes.
 type: application
-version: 0.2.1
-appVersion: "0.2.1"
+version: 0.2.2
+appVersion: "0.2.2"
diff --git i/deploy/charts/operator/README.md w/deploy/charts/operator/README.md
index 9ba350d..c11b85c 100644
--- i/deploy/charts/operator/README.md
+++ w/deploy/charts/operator/README.md
@@ -1,7 +1,7 @@

 # ToolHive Operator Helm Chart

-![Version: 0.2.1](https://img.shields.io/badge/Version-0.2.1-informational?style=flat-square)
+![Version: 0.2.2](https://img.shields.io/badge/Version-0.2.2-informational?style=flat-square)
 ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square)

 A Helm chart for deploying the ToolHive Operator into Kubernetes.

…t values. (stacklok#1063)

    * Fix TestDefaultPlatformDetector_DetectPlatform test for case when OPERATOR_OPENSHIFT env var is true.
    * Add empty runAsUser to values-openshift.yaml to allow OpenShift to set it.
    * Bump the helm chart and app versions from 0.2.1 to 0.2.2.

Signed-off-by: Roddie Kieley <[email protected]>
Co-authored-by: Cursor claude-4 <[email protected]>
@RoddieKieley
Contributor Author

@jhrozek I bumped the helm chart versions and, in testing, found a test failure when an added env var was set. I also noticed that I had missed setting runAsUser to be empty in the custom values-openshift.yaml helm chart values file.

When you get a chance, take a look and see if there are any other outstanding items that need to be addressed prior to merge.

@jhrozek jhrozek merged commit 0a52a43 into stacklok:main Aug 15, 2025
18 checks passed
@RoddieKieley RoddieKieley deleted the issue1063 branch August 15, 2025 15:36