Skip to content

Conversation

@pablintino
Copy link
Contributor

@pablintino pablintino commented Oct 24, 2025

This commit adds the formal enhancement proposal for MCO-1914.

This feature adds the ability to run multiple OS base images simultaneously in a cluster by allowing users to select an image 'stream' on a per-MCP basis.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 24, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 24, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 24, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jupierce for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pablintino pablintino changed the title [MCO-1914] Enhacement of the MCO OS streams feature [MCO-1928] Enhacement of the MCO OS streams feature Oct 24, 2025
@pablintino pablintino changed the title [MCO-1928] Enhacement of the MCO OS streams feature MCO-1928: Enhacement of the MCO OS streams feature Oct 24, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 24, 2025
@openshift-ci-robot
Copy link

openshift-ci-robot commented Oct 24, 2025

@pablintino: This pull request references MCO-1928 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

This commit adds the formal enhancement proposal for MCO-1914.

This feature adds the ability to run multiple OS base images simultaneously in a cluster by allowing users to select an image 'stream' on a per-MCP basis.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

This commit adds the formal enhancement proposal for MCO-1914.

This feature adds the ability to run multiple OS base images
simultaneously in a cluster by allowing users to select an image
'stream' on a per-MCP basis.

Not applicable.

## Alternatives (Not Implemented)
Copy link
Contributor

@yuqi-zhang yuqi-zhang Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To answer a question from the API review office hours, the alternatives we considered are:

  1. expanding on the existing configmap to refer to more images
  2. having this as a status field on an existing object, i.e. the MachineConfiguration cluster object

1 was deemed less optimal since it's an arbitrary map without guardrails, and also with a new API we can use VAP to wire up cross-object validation
2 was deemed less optimal since it's managed by the CVO intead of the MCO, and thus the bootstrap-time MCO doesn't understand or process it (I suppose we could set that up, though). We would like to have a bootstrap (install) time workflow that matches the in-cluster workflow that we can process in the same way, possibly eventually as a install-config field, so the new API object that the MCO manages would be easier to work in both cases.

Also, the MachineConfiguration has introduced quite a lot of new fields in the past few versions, mostly for cluster-knobs (so configurables, instead of status-only discovery objects), so we thought a new CR would be clearer.

Copy link
Contributor

@yuqi-zhang yuqi-zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few general thoughts (also inline):

  1. a bit too much detail on the implementation sections, becomes a bit hard to follow, if we can make some sections more concise, and outline a more general phase approach instead of per-section phase approach, I think that might be more clear
  2. let's be careful about specific version naming and commitments that's more long term
  3. let's be clear on the scope. On point 1, we can also scope this enhancement more on the immediate implementation (rhel9->10) and leave details for future phases when we get to them


### User Stories

- **Specify OS stream per pool**: Set `spec.osImageStream` on a MachineConfigPool to provision nodes with a different OS version than the rest of the cluster
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a note, these sound more like workflow and constraints than user stories. The user stories should be presented more like the motivation section above (admin use cases). The motivation can also talk about some of the struggles we faced when going from rhel8->9 instead.

- **API-driven stream management**: Declarative, Kubernetes-native API for stream selection
- **Automatic stream discovery**: Populate available OS streams from release payload ImageStream metadata
- **Backward compatibility**: Existing clusters continue working; streams are opt-in via feature gate
- **Multi-source stream configuration**: Support CLI arguments, release ImageStream, and ConfigMap sources with defined precedence
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand what this point is talking about, could you clarify?


### Non-Goals

- **Supporting unlimited concurrent OS streams**: While the architecture supports
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lean towards not considering this a non-goal for now, since we're not sure about the exact future plans.

automated migration logic or upgrade orchestration. Administrators must manually
select streams for their pools.

- **Bidirectional stream switching**: Only RHEL 9 → RHEL 10 migration is supported
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As somewhat of a general comment, I feel like part of the enhancement describes the overall goal of multi-streams, but some of the enhancement seems to want to frame this as a RHEL9->RHEL10 transition. I think we should be a bit more clear around the framing. So we would either:

  1. scope the enhancement to RHEL9->RHEL10 transition as the plan for now, and explicitly call out not including other streams in other sections
  2. take a phased approach and clearly call out the boundaries of each phase (starting with rhel9->10, adding more in the future)

Right now reading some of the background it sounds like we're introducing a whole variety of streams which is not the immediate plan, so I think it would be good to have some clarity for the readers.

Standard MachineConfigPool rollback mechanisms apply, but stream-specific rollback
is out of scope for the initial release.

- **Version skew enforcement**: Enforcing compatibility rules between different
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the following three points are out of scope enough that I don't think we should mention them here.

Also we should be careful around target versions for features not in this enhancement, so I'd lean towards removing these 4 points.

- New clusters install with RHCOS 9
- Existing clusters remain on their current stream

**OpenShift 5.0+ Releases:**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, let's not talk about specific openshift versions, since that has not been decided anywhere


3. **New MachineConfigPools created in OpenShift 5.0**: Use the new default `"rhel10-coreos"`

**Migration Process:**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some considerations around having the update be in one reboot (4.y + rhel9->4.(y+1) + rhel10, as opposed to 4.y + rhel9 -> 4.y + rhel10 -> 4.(y+1) + rhel10.

Regardless, this is for a future phase which I think we should clarify about

automatic side effects of platform upgrades. This aligns with the core goal of
separating platform upgrades from OS version transitions.

##### Backward Compatibility with Pre-Streams Clusters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These following sections are quite verbose and have too many details. Would you be able to restructure and simplify?

Starting in OpenShift 4.20/4.21, separate RHEL 10 release streams will be produced to
enable early testing:
- Custom RHEL 10-based payloads from nightly builds (similar to F-COS/S-COS for OKD)
- Enables testing RHEL 10 before it becomes the default in OpenShift 5.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, let's remove references to openshift 5

- Note: RHEL 8 → 9 transition revealed distinct bugs with FIPS and real-time kernel
variants not seen in standard testing

**Image Mode OpenShift:**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't talked about image mode at all in the previous sections. Should we have a snippet above if we're going to talk about testing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants