Deploy ToolHive Operator into OpenShift (#1063) #1253
RoddieKieley
commented
Aug 1, 2025
When running go-task operator-test I observed the following problem first:
As well as a number of test failures, not all of which seemed to be associated with the changes in this PR. I passed the terminal output through cursor -> o3 with a request to write the content out to markdown, which I'm attaching: I utilized the 'Option B' fix for the "shell" issue above and that worked locally for me on Fedora 42. I'm not sure how friendly that is to other environments, however, so I'm not including it in the PR itself. Either way, feedback is welcome, and I would be curious whether you've had any luck with OKD, @jhrozek?
For awareness, I will be deploying OpenShift into our environment next week for @jhrozek to play around with ToolHive installations.
Hey @ChrisJBurns could you also suggest a default strategy for the XDG_CONFIG_HOME env vars? ATM we're hardcoding
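One possible default strategy, sketched here under assumptions rather than taken from the PR: only add XDG_CONFIG_HOME and HOME when the user has not already supplied them, so overrides always win. The `EnvVar` type, the `ensureEnv` helper, and the `/tmp` fallback path are all hypothetical names and values chosen for illustration; the real operator would work with the Kubernetes `corev1.EnvVar` type.

```go
package main

import "fmt"

// EnvVar is a hypothetical stand-in for a container environment variable.
// The real operator would use the Kubernetes corev1.EnvVar type instead.
type EnvVar struct {
	Name  string
	Value string
}

// ensureEnv appends name=fallback only when name is not already present,
// so a value supplied by the user is never overwritten.
func ensureEnv(env []EnvVar, name, fallback string) []EnvVar {
	for _, e := range env {
		if e.Name == name {
			return env // user-supplied value wins
		}
	}
	return append(env, EnvVar{Name: name, Value: fallback})
}

func main() {
	// "/tmp" is an illustrative writable path, not the value used in the PR.
	env := []EnvVar{{Name: "HOME", Value: "/home/custom"}}
	env = ensureEnv(env, "HOME", "/tmp")
	env = ensureEnv(env, "XDG_CONFIG_HOME", "/tmp")
	for _, e := range env {
		fmt.Printf("%s=%s\n", e.Name, e.Value)
	}
}
```

With this shape the hardcoded value becomes a fallback only, which keeps the defaults while still letting a deployment override them.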
Thank you for the patch @RoddieKieley ! We'll work with @ChrisJBurns on setting up the OKD cluster so we can test properly.
@RoddieKieley good news! Thanks to @ChrisJBurns we have an OKD cluster now 🚀
A few more comments from me. I'm wondering if we should instead do the OpenShift checks in the Operator and then pass the desired securityContexts in to the ProxyRunner? This avoids the need for the isOpenShift flags at the runtime layer when it comes to the podTemplateSpec, and also allows us to remove some of the securityContext code from the runtime layer.
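A minimal sketch of that suggestion, under stated assumptions: the operator decides the security context once, based on the detected platform, and the runtime layer only receives the result. `Platform`, `SecurityContext`, `securityContextFor`, and the UID 1000 default are hypothetical simplifications for illustration; the real code would use the Kubernetes `corev1` types.

```go
package main

import "fmt"

// Platform and SecurityContext are hypothetical, simplified stand-ins;
// the real code would use the Kubernetes corev1 types.
type Platform int

const (
	Kubernetes Platform = iota
	OpenShift
)

type SecurityContext struct {
	RunAsUser    *int64 // nil lets the platform assign a UID
	RunAsNonRoot bool
}

// securityContextFor is decided once in the operator; the runtime layer
// only receives the result and never needs an isOpenShift flag.
func securityContextFor(p Platform) SecurityContext {
	if p == OpenShift {
		// Leave RunAsUser unset so OpenShift can assign the UID itself.
		return SecurityContext{RunAsNonRoot: true}
	}
	uid := int64(1000) // illustrative default UID, not taken from the PR
	return SecurityContext{RunAsUser: &uid, RunAsNonRoot: true}
}

func main() {
	for _, p := range []Platform{Kubernetes, OpenShift} {
		sc := securityContextFor(p)
		fmt.Printf("platform=%d runAsUserSet=%t runAsNonRoot=%t\n",
			p, sc.RunAsUser != nil, sc.RunAsNonRoot)
	}
}
```

Keeping the platform branch in one function means the podTemplateSpec code downstream can stay platform-agnostic.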
256c9d5
to
3aac671
Compare
* Update helm chart resources and seccompProfile type values for OKD environment.
* Update MCPServer Deployment with check for XDG_CONFIG_HOME and HOME env vars being set.
* Add PlatformDetector interface with detection implementation for Kubernetes and OpenShift.
* Update behaviour of tests now requiring in cluster to skip locally.

Signed-off-by: Roddie Kieley <[email protected]>
Co-authored-by: Cursor o3 <[email protected]>
Co-authored-by: Cursor claude-4 <[email protected]>
3aac671
to
95da812
Compare
@jhrozek @ChrisJBurns I've squashed and force pushed my latest changes to my issue1063 branch which is the basis for this PR. The latest changes address a number of outstanding issues, however I do see some questions still open atm:
For completeness I would also note that this PR now addresses #1341
@RoddieKieley I'll review the PR during today (thanks a lot for the updates!) Answering specifically this for now:
Do you remember in what cases those are needed? E.g. which MCP server with what settings? I was hoping we had fixed the toolhive proxy runner to not write any configuration anymore. I ran into weirdly similar issues some time ago with the default chainguard images being read-only and I set up the env variables to If we do, then fixing the issue (i.e. not writing anything by the proxy runner) should IMO be a separate follow-up task.
I think originally it was the operator itself, then the fetch proxy and the MCP server proper in turn, as it was the first issue I ran into at each step of the way. That being said, those values have been in there since the beginning, so I'll test taking them out now with the latest main and see what happens.
@RoddieKieley Thanks for all the effort on this!
I think in general we should be able to just override them as normal, this would allow us to keep the defaults
The main thing for me is being able to leave it in there as a default; in the case of OpenShift, we can then have code that removes the necessary values if it needs to.
I'm not against the memory bumps, although if @ChrisJBurns knows a way to set them on deployment or with maybe a values override that might be good too. I wonder if the memory bump is due to openshift using its own Go version (although that used to only differ in crypto libraries and FIPS support IIRC?).
replied elsewhere, I'm fine merging the code as-is and dropping the env variables setting when we fix the root cause
I think it's fine for now. Just to be clear, are you referring to the operator being consistent with how openshift manages UIDs and SCCs which might impede the user from running privileged MCPs?
none from me, this is great 🚀
oh actually we need to make CI happy, depending on what we end up doing with the values. CI is currently complaining about the chart version needing a bump (in
@ChrisJBurns can we add @RoddieKieley to whatever whitelist we have so that his PRs run CI without a post-fact approval?
This is great! I don't really have comments except the default values where @ChrisJBurns probably has more of an opinion. Ready to ack :-)
cmd/thv-operator/controllers/mcpserver_resource_overrides_test.go
Outdated
I think when it's merged, there shouldn't be a need for approvals to run CI anymore. I think it's only for first-time contributors - at least that's what I've experienced on other repos. The Helm Chart bump should be simple enough: it just requires the Operator chart to be bumped by a minor version (as well as the version in the Operator Chart README.md badge).
And here I was, running
Apologies, I realise my comment didn't separate the quote and my reply; I have fixed it: #1253 (comment)
Unfortunately there seems to be a conflict now, too.
Yes exactly. As per discussion, the helm chart values will have migrated to a new values-openshift.yaml file once the latest updates are pushed.
Signed-off-by: Roddie Kieley <[email protected]>
…stacklok#1063)
* Removed extraneous Transport set.
* Bumped configureContainer debug logging to actual Debugf logging.
* Reverted helm chart values and added separate adjusted values-openshift.yaml.

Signed-off-by: Roddie Kieley <[email protected]>
31cab0d
to
7177140
Compare
I merged main as @ChrisJBurns did recently to resolve the issue for the moment. I also made the Infof -> Debugf logging change, removed the extraneous Transport set for the tests, and split the OpenShift helm chart values into values-openshift.yaml, while leaving the XDG_CONFIG_HOME and HOME env var settings alone, as I was still getting a WRN when launching the fetch MCP Server with those gone. Take a look and see if there's anything outstanding, or anything that fails in CI or somewhere else with the latest changes that shouldn't.
I'm quite happy about the patch, thank you for the patience during the review. To make CI happy, we need to bump the chart version it seems:
2e2c295
to
b894265
Compare
…t values. (stacklok#1063)
* Fix TestDefaultPlatformDetector_DetectPlatform test for case when OPERATOR_OPENSHIFT env var is true.
* Add empty runAsUser to values-openshift.yaml to allow OpenShift to set it.
* Bump the helm chart and app versions from 0.2.1 to 0.2.2.

Signed-off-by: Roddie Kieley <[email protected]>
Co-authored-by: Cursor claude-4 <[email protected]>
b894265
to
2f79a3a
Compare
@jhrozek I bumped the helm chart version and, in testing, found a test failure when an added env var was set. I also noticed that I had missed setting runAsUser to be empty in the custom values-openshift.yaml helm chart values file. When you get a chance, take a look and see if there are any other outstanding items that need to be addressed prior to merge.