-
Notifications
You must be signed in to change notification settings - Fork 522
NE-2183: Openshift conditions on Gateway API status #1871
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
@rikatz: This pull request references NE-2183 which is a valid jira issue. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@rikatz: This pull request references NE-2183 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/assign |
|
/assign |
|
/retest |
|
@rikatz: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
the error is valid, it is due to the metadata not containing real approvers and reviewers for now. Once I get some review and approval I can fix it |
| * Set to `False` with reason `UnsupportedEndpointPublishingStrategy` when the | ||
| publishing strategy is not LoadBalancerService |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this reason make sense for the current implementation? We don't have a knob for changing the publishing strategy for Gateway controller. Also, does Istio have any other mode of publishing gateway except for a service of type LoadBalancer? I only know of the manual deployment which we disabled.
| *LoadBalancerManaged Condition:* | ||
| * Set to `False` with reason `UnsupportedEndpointPublishingStrategy` when the | ||
| publishing strategy doesn't require a managed LoadBalancer | ||
| * Set to `True` with reason `Normal` when a LoadBalancer service should be managed by OpenShift | ||
|
|
||
| *LoadBalancerReady Condition:* | ||
| * Set to `False` with reason `ServiceNotFound` when the associated Service | ||
| resource cannot be found | ||
| * Set to `False` with reason `LoadBalancerPending` when the service exists but | ||
| has no LoadBalancer ingress (external IP/hostname) | ||
| * Set to `False` with reason `SyncLoadBalancerFailed` when cloud provider events | ||
| indicate provisioning failures | ||
| * Set to `True` with reason `LoadBalancerProvisioned` when the service has an | ||
| external IP or hostname assigned | ||
| * Error messages include cloud provider error details extracted from service events |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conditions for the load balancer service seem kinda questionable to me for the following reasons:
- Publishing service is managed by Istiod, not by CIO (unlike DNS). So the word "LoadBalancerManaged" doesn't precise who's managing the service.
- The supported
Programmedstatus should already cover most important cases listed here (maybe except for the failure event). IfProgrammedisTruewe should have the ingress address for the service, to be able to add it to Gateway's address status. - Even though the
Readystatus is not "standard" and doesn't seem to be supported by Istio for the moment. It seems to be the one which may have an "overlap" meaning withLoadBalancerReadyin the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, so consider that all of these conditions are a mirror of what we have on CIO. In fact, the same logic from CIO is used to generate Gateway API conditions.
I think that the LoadBalancerManaged is not just about Istio adding it, but being able to make explicit why the LoadBalancer (and the Gateway) are not ready if tere is a problem with the Load balancer controller.
That said, if we agree that the whole "lifecycle" for LoadBalancer and its conditions should be accounted directly by Istio, I think that NE-2183 makes partial sense for DNS management, but the whole Load balancer story can be dropped, wdyt?
| * This logic is used by both the existing IngressController status controller | ||
| and the new Gateway status controller |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to make sure to not impact the existing ingresscontroller status logic. It's quite complex and quite touchy for our customers. We have the test coverage to catch the regression, of course. But I would prefer to limit the changes into the existing logic to very save ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's what was made. The test packages weren't covered (eg.: they may have been moved, but the ingress controller tests that verify conditions being properly set weren't changed). All of the logics are still there. All of the e2e.
as this EP was written after the code, I would say that "take a look into the code" just to be sure that i am not saying anything wrong, but I am confident that we should be reusing the code, and that our tests are already covering that no change was made on CIO logics (aka no test from CIO was touched)
|
|
||
| **Resource Discovery:** | ||
| * The controller watches the following resources to trigger reconciliation: | ||
| - Gateway resources in `openshift-ingress` namespace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With openshift.io/gateway-controller/v1 controller, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, let me update it here
| **Platform Detection:** | ||
| * Platform type is determined from the cluster Infrastructure resource | ||
| * DNS zone configuration comes from the cluster DNS resource | ||
| * The DNSManagementPolicy field on IngressController-like configuration determines whether DNS is managed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have this field in IngressController resource indeed. Do you mean to use it somehow for Gateway status? DNSRecord does have DNSManagementPolicy field too. I thought it will be used by the new controller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeha, this is halucination, let me fix it here
|
|
||
| **Permissions:** | ||
| * The cluster-ingress-operator service account is granted additional RBAC permissions to: | ||
| - Get, List, Watch Gateway resources |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CIO already has all the verbs access for Gateway resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know, but as part of an enhancement proposal I think it is good to make it clear what permissions are required for this controller. I can drop it, but I don't see a harm on making it explicit
| * The cluster-ingress-operator service account is granted additional RBAC permissions to: | ||
| - Get, List, Watch Gateway resources | ||
| - Patch Gateway status subresource | ||
| - Get, List, Watch DNSRecord and Service resources by label |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CIO already has all the verbs access for DNSRecord resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wont be able to get into all the comments this week :/
But answering: while I do agree with you that CIO already has it, it is important IMO as part of the architecture to make it clear "what permissions ARE required for this to work" and I think this is the case here.
Same applies for #1871 (comment)
| * Rejected because: Events are transient and harder to query programmatically; | ||
| conditions provide persistent, queryable status | ||
|
|
||
| ## Open Questions [optional] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about conflicting hostnames? With the current DNS implementation from CIO, DNS resource record will have multiple entries for the same hostname round robin between gateway controllers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is part of this enhancement specifically as we are just cloning Openshift conditions to Gateway API.
But, maybe once openshift/cluster-ingress-operator#1229 is merged it will enrich the status conditions here.
Anyway, I am happy to add the conflicted DNS here once we have this conditions
| * The Gateway API specification defines standard conditions like `Accepted` and `Programmed` | ||
| * We could map DNS/LB status into these existing conditions rather than adding new ones |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree about the DNS part but not for the LB one which is managed by Gateway implementation (==Istio).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alternatives are about solutions we do not implement, but were somehow considered.
|
Forgot to add a description to the review. I was about to look at openshift/cluster-ingress-operator#1294 but then recalled that there is an EP. So I started from the EP. |
This enhancement proposal adds the Ingress Controller Conditions (LoadBalancerManaged, LoadBalancerReady, DNSManaged and DNSReady) to Gateway API resources that created with Openshift Gateway Class and on openshift-ingress namespace.
This proposal was partially generated with the help of Claude/AI