-
Notifications
You must be signed in to change notification settings - Fork 522
MCO-1928: Enhacement of the MCO OS streams feature #1874
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@pablintino: This pull request references MCO-1928 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
This commit adds the formal enhancement proposal for MCO-1914. This feature adds the ability to run multiple OS base images simultaneously in a cluster by allowing users to select an image 'stream' on a per-MCP basis.
|
|
||
| Not applicable. | ||
|
|
||
| ## Alternatives (Not Implemented) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To answer a question from the API review office hours, the alternatives we considered are:
- expanding on the existing configmap to refer to more images
- having this as a status field on an existing object, i.e. the MachineConfiguration cluster object
1 was deemed less optimal since it's an arbitrary map without guardrails, and also with a new API we can use VAP to wire up cross-object validation
2 was deemed less optimal since it's managed by the CVO intead of the MCO, and thus the bootstrap-time MCO doesn't understand or process it (I suppose we could set that up, though). We would like to have a bootstrap (install) time workflow that matches the in-cluster workflow that we can process in the same way, possibly eventually as a install-config field, so the new API object that the MCO manages would be easier to work in both cases.
Also, the MachineConfiguration has introduced quite a lot of new fields in the past few versions, mostly for cluster-knobs (so configurables, instead of status-only discovery objects), so we thought a new CR would be clearer.
yuqi-zhang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few general thoughts (also inline):
- a bit too much detail on the implementation sections, becomes a bit hard to follow, if we can make some sections more concise, and outline a more general phase approach instead of per-section phase approach, I think that might be more clear
- let's be careful about specific version naming and commitments that's more long term
- let's be clear on the scope. On point 1, we can also scope this enhancement more on the immediate implementation (rhel9->10) and leave details for future phases when we get to them
|
|
||
| ### User Stories | ||
|
|
||
| - **Specify OS stream per pool**: Set `spec.osImageStream` on a MachineConfigPool to provision nodes with a different OS version than the rest of the cluster |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a note, these sound more like workflow and constraints than user stories. The user stories should be presented more like the motivation section above (admin use cases). The motivation can also talk about some of the struggles we faced when going from rhel8->9 instead.
| - **API-driven stream management**: Declarative, Kubernetes-native API for stream selection | ||
| - **Automatic stream discovery**: Populate available OS streams from release payload ImageStream metadata | ||
| - **Backward compatibility**: Existing clusters continue working; streams are opt-in via feature gate | ||
| - **Multi-source stream configuration**: Support CLI arguments, release ImageStream, and ConfigMap sources with defined precedence |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand what this point is talking about, could you clarify?
|
|
||
| ### Non-Goals | ||
|
|
||
| - **Supporting unlimited concurrent OS streams**: While the architecture supports |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lean towards not considering this a non-goal for now, since we're not sure about the exact future plans.
| automated migration logic or upgrade orchestration. Administrators must manually | ||
| select streams for their pools. | ||
|
|
||
| - **Bidirectional stream switching**: Only RHEL 9 → RHEL 10 migration is supported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As somewhat of a general comment, I feel like part of the enhancement describes the overall goal of multi-streams, but some of the enhancement seems to want to frame this as a RHEL9->RHEL10 transition. I think we should be a bit more clear around the framing. So we would either:
- scope the enhancement to RHEL9->RHEL10 transition as the plan for now, and explicitly call out not including other streams in other sections
- take a phased approach and clearly call out the boundaries of each phase (starting with rhel9->10, adding more in the future)
Right now reading some of the background it sounds like we're introducing a whole variety of streams which is not the immediate plan, so I think it would be good to have some clarity for the readers.
| Standard MachineConfigPool rollback mechanisms apply, but stream-specific rollback | ||
| is out of scope for the initial release. | ||
|
|
||
| - **Version skew enforcement**: Enforcing compatibility rules between different |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and the following three points are out of scope enough that I don't think we should mention them here.
Also we should be careful around target versions for features not in this enhancement, so I'd lean towards removing these 4 points.
| - New clusters install with RHCOS 9 | ||
| - Existing clusters remain on their current stream | ||
|
|
||
| **OpenShift 5.0+ Releases:** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, let's not talk about specific openshift versions, since that has not been decided anywhere
|
|
||
| 3. **New MachineConfigPools created in OpenShift 5.0**: Use the new default `"rhel10-coreos"` | ||
|
|
||
| **Migration Process:** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some considerations around having the update be in one reboot (4.y + rhel9->4.(y+1) + rhel10, as opposed to 4.y + rhel9 -> 4.y + rhel10 -> 4.(y+1) + rhel10.
Regardless, this is for a future phase which I think we should clarify about
| automatic side effects of platform upgrades. This aligns with the core goal of | ||
| separating platform upgrades from OS version transitions. | ||
|
|
||
| ##### Backward Compatibility with Pre-Streams Clusters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These following sections are quite verbose and have too many details. Would you be able to restructure and simplify?
| Starting in OpenShift 4.20/4.21, separate RHEL 10 release streams will be produced to | ||
| enable early testing: | ||
| - Custom RHEL 10-based payloads from nightly builds (similar to F-COS/S-COS for OKD) | ||
| - Enables testing RHEL 10 before it becomes the default in OpenShift 5.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, let's remove references to openshift 5
| - Note: RHEL 8 → 9 transition revealed distinct bugs with FIPS and real-time kernel | ||
| variants not seen in standard testing | ||
|
|
||
| **Image Mode OpenShift:** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We haven't talked about image mode at all in the previous sections. Should we have a snippet above if we're going to talk about testing?
This commit adds the formal enhancement proposal for MCO-1914.
This feature adds the ability to run multiple OS base images simultaneously in a cluster by allowing users to select an image 'stream' on a per-MCP basis.