First pass at a schema for Backend plus musings on scope
#20
base: main
Conversation
Sketch out some Go types and add a section on whether Backend is namespace or cluster-scoped. Signed-off-by: Keith Mattix II <[email protected]>
> Examples in this document are illustrative only.
> #### Scope and Persona Ownership
@howardjohn @mikemorris Feel free to chime in with any thoughts on the scoping story
Hmm maybe this isn't an issue since, at this point, Backend is only referenced via an xRoute? But I still wonder how the admin sets policy for a particular FQDN if any app owner can create a Backend
Having both a global and local Backend makes sense to me. If we go this route, the main question would be around resolving conflicts.
What seems correct is for a globally scoped Backend to take precedence. This avoids the problem of needing to ensure that a global policy -- which may be required for compliance -- isn't silently overridden.
If we go this route, we'd need to set a status condition on the namespaced Backend to indicate that it's being overridden.
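To make that status signal concrete, here is a minimal sketch assuming a hypothetical `Overridden` condition type and reason (none of these names exist in the proposal yet):

```go
package backend

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setOverriddenCondition records on a namespaced Backend's status that a
// cluster-scoped Backend took precedence. The condition type and reason are
// illustrative placeholders, not settled API.
func setOverriddenCondition(conditions *[]metav1.Condition, observedGeneration int64, clusterBackend string) {
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:               "Overridden",               // hypothetical condition type
		Status:             metav1.ConditionTrue,
		Reason:             "ClusterBackendPrecedence", // hypothetical reason
		Message:            "Superseded by cluster-scoped Backend " + clusterBackend,
		ObservedGeneration: observedGeneration,
	})
}
```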
+1 - Global backends should take precedence over local ones and we should report in status. This is slowly becoming my preferred option the more that I think about it
I think my preferred approach for this would be having both namespace-scoped and cluster-scoped options for a frontend, but keeping Backend as a single namespaced resource, and treating cluster/global scoped definitions as a "last hop" rather than "override". I'll try to illustrate how I'm defining that difference below:
Namespace app {
Pod -> [HTTP request] -> namespaced FrontendThing (ServiceEntry, ExternalName, etc) -[backendRef]-> Backend{FQDN, Pods, IPAddress, etc}
}
ClusterServiceEntry -[backendRef] -> Backend
Override
In the override model, a ClusterServiceEntry for example.com -> Backend{FQDN: foo.com} ensures that any traffic leaving the pod is forcibly redirected to foo.com, regardless of whether a namespaced ServiceEntry is trying to redirect it locally.
"Last hop"
In the last hop model, a ClusterServiceEntry for example.com -> Backend{FQDN: foo.com} is only applied to traffic leaving the pod if example.com is still the destination after any local namespaced ServiceEntry has been applied. So if ServiceEntry{example.com} -> Backend{FQDN: bar.com} exists, then the ClusterServiceEntry for example.com has no effect and the traffic egresses to bar.com.
Thinking this through further, there might be use cases for each model, similar to the overrides vs defaults behavior described in https://gateway-api.sigs.k8s.io/geps/gep-2649/?h=override#hierarchy and maybe this behavior should be configurable?
Ref #20 (comment) for further exploration on explicitly routing through an egress Gateway.
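If the behavior does become configurable, one hypothetical way to express it on the cluster-scoped resource, loosely echoing GEP-2649's defaults-vs-overrides split (names invented purely for illustration):

```go
package backend

// ClusterConflictStrategy controls how a cluster-scoped definition interacts
// with namespaced ones, matching the two models described above.
// +kubebuilder:validation:Enum=Override;LastHop
type ClusterConflictStrategy string

const (
	// Override: the cluster-scoped definition always wins, even if a
	// namespaced resource has already redirected the traffic.
	ClusterConflictStrategyOverride ClusterConflictStrategy = "Override"
	// LastHop: the cluster-scoped definition applies only if the destination
	// is unchanged after namespaced resources have been applied.
	ClusterConflictStrategyLastHop ClusterConflictStrategy = "LastHop"
)
```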
Thanks for writing all of this down! I do agree that both default and override behaviors have valid use cases, but to support them we almost certainly need a Frontend or ServiceEntry resource, and I'm not sure we have the bandwidth to bite off that piece at this point. That's why I'm leaning toward having the gateway serve as the frontend aspect for now and leaving it up to implementations to figure out how to get the traffic to the gateway.
That being said, I think you're right that we don't need cluster-wide backend AND cluster-wide frontend. Of the two, cluster-wide frontend probably makes the most sense when I think of the use-cases I see in Istio. I'm just trying to think about the implications of having the gateway routing configuration be different depending on the source of the traffic... Wait that's not true; it's not based on the source namespace; it's based on the route referencing the backend. And that's not too terrible I think
> Protocol BackendProtocol `json:"protocol"`
> // TLS defines the TLS configuration for the backend.
> // +optional
> TLS *BackendTLS `json:"tls,omitempty"`
What are the semantics of a policy-attached BackendTLSPolicy and this inline config co-existing?
For now at least (pre-GEP), I'd say BackendTLSPolicy is not allowed to have Backend as a targetRef, so we can defer the decision until after we get a better sense of Backend semantics (e.g. scoping). I have a bias towards inlining, so my ideal would probably be to have the inline policy take precedence if defined.
I think a decision to choose anything other than BackendTLSPolicy here requires significantly more discussion + detail in this proposal. If the goal is just to copy + inline BackendTLSPolicy types, that might make sense, but there are other benefits of a policy here, such as the ability to reuse config across different backends.
@robscott we should definitely discuss in more detail, and I'll try and expand upon the reasoning in the doc. I think the ideal situation is that this field is just an exact pointer to BackendTLSPolicy. However, I see some difficulties with that. For one, I think the default behavior when talking to an external fqdn should be to require TLS (I discuss this elsewhere in the proposal); BackendTLSPolicy will never really allow us to do that because the API was designed for Kubernetes Services that have zero expectation of TLS. On top of that, relying on BTLSPolicy means that we can't have per-backend mTLS; IMO, that's a pretty rough limitation (especially in multitenant environments). So I would like to try and pursue inlining for now.
That being said, I don't like the idea of having TLS for Backend type KubernetesService differ from BackendTLSPolicy. Maybe what that means is that Backend starts off (or stays) as ExternalService only, but I think that would be quite a shame.
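For concreteness, here is a rough sketch of what an inline BackendTLS with per-backend mTLS could look like, reusing gateway-api reference types; the field names are illustrative guesses rather than settled API:

```go
package backend

import (
	gatewayv1 "sigs.k8s.io/gateway-api/apis/v1"
)

// BackendTLS sketches inline, client-side TLS configuration for a Backend.
// It borrows shapes from BackendTLSPolicy's validation block but adds a
// client certificate reference so mTLS can differ per backend.
type BackendTLS struct {
	// Mode selects how TLS is originated to this backend.
	// +required
	Mode BackendTLSMode `json:"mode"`

	// SNI to present to the backend.
	// +optional
	SNI *gatewayv1.PreciseHostname `json:"sni,omitempty"`

	// CACertificateRefs are used to validate the backend's serving
	// certificate (e.g. a ConfigMap or ClusterTrustBundle).
	// +optional
	CACertificateRefs []gatewayv1.LocalObjectReference `json:"caCertificateRefs,omitempty"`

	// ClientCertificateRef supplies the client certificate and key used when
	// Mode is Mutual, enabling per-backend mTLS.
	// +optional
	ClientCertificateRef *gatewayv1.SecretObjectReference `json:"clientCertificateRef,omitempty"`
}

// BackendTLSMode is the mode enum sketched elsewhere in this proposal.
type BackendTLSMode string
```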
proposals/10-egress-gateways.md (outdated)
> // Use implementation's built-in TLS (e.g. service mesh powered mTLS).
> BackendTLSModePLATFORM_PROVIDED BackendTLSMode = "PLATFORM_PROVIDED"
> // Disable TLS.
> BackendTLSModeINSECURE_DISABLE BackendTLSMode = "INSECURE_DISABLE"
Why do we want a TLS policy to not do TLS?
My thought was to try and make it explicit: this is not an optional field so you must set something and if you want to disable TLS, you must do so explicitly, acknowledging that it's insecure
isn't non-TLS the default though? Or we are saying, TLS is always the default and you need to opt out? That would be pretty confusing behavior
For external FQDN, I would think some level of TLS should be the default for security reasons. But I'm willing to be convinced the other way
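One way to encode that trade-off in the schema (a sketch, not something the PR currently pins down) is to keep the field required and the enum closed, so skipping TLS can only be expressed explicitly:

```go
package backend

// BackendTLSMode enumerates how, or whether, TLS is originated to the
// backend. Keeping the Backend's tls.mode field required and the enum closed
// means "no TLS" can only be written as an explicit InsecureDisable, never
// implied by omission. Value casing follows the newer sketch in this PR; the
// last two names are assumptions.
// +kubebuilder:validation:Enum=Simple;Mutual;Passthrough;PlatformProvided;InsecureDisable
type BackendTLSMode string
```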
Signed-off-by: Keith Mattix II <[email protected]>
> Name string `json:"name"`
> // +required
> Type string `json:"type"`
> // TODO: How does this work practically? Can we leverage Kubernetes unstructured types here?
An unstructured type makes sense, but my assumption is that we should require a schema for every extension type even when there's no CRD.
The schemas can be stored in, or linked from, a config map, then the configs can be verified by a webhook or the controller.
It lets us have our cake and eat it too in terms of having config validation without requiring a CRD for each extension.
It also has the knock-on advantage of advertising all of the available extension types.
Yeah I definitely think there must be some sort of schema, but I'm wondering if, from an implementation perspective, it makes sense to force folks to use Kubernetes schemes specifically (which I think is required if we rely on unstructured)
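A rough illustration of the ConfigMap idea, assuming a hypothetical convention where each extension type's schema lives under a well-known ConfigMap key; the validation step itself is stubbed out, since the schema format and validator are exactly what's still being debated:

```go
package backend

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// lookupExtensionSchema fetches the schema registered for an extension type
// from a ConfigMap named "backend-extension-schemas" (hypothetical
// convention) in the given namespace.
func lookupExtensionSchema(ctx context.Context, cs kubernetes.Interface, ns, extType string) (string, error) {
	cm, err := cs.CoreV1().ConfigMaps(ns).Get(ctx, "backend-extension-schemas", metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	schema, ok := cm.Data[extType]
	if !ok {
		return "", fmt.Errorf("no schema registered for extension type %q", extType)
	}
	return schema, nil
}

// validateExtensionConfig would check the raw extension config against the
// registered schema, e.g. in a validating webhook or the controller's
// reconcile loop. The choice of schema language/validator is left open here.
func validateExtensionConfig(schema, rawConfig string) error {
	// Placeholder: plug in a schema validator of choice.
	_ = schema
	_ = rawConfig
	return nil
}
```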
> BackendProtocolHTTP BackendProtocol = "HTTP"
> BackendProtocolHTTP2 BackendProtocol = "HTTP2"
> BackendProtocolTCP BackendProtocol = "TCP"
> BackendProtocolMCP BackendProtocol = "MCP"
Will we be expanding the core enum with additional AI protocols e.g. A2A ?
I wonder if it makes sense to reserve BackendProtocol for common transport protocols (HTTP/2, TCP, gRPC) and layer in MCP (and other AI protocols later) via BackendProtocolOptions, to make it easier to extend as things evolve in the AI space?
My plan was to expand it with core protocols we'll have in the API (like A2A and possibly LLM). Prior art in Gateway API is to allow implementations to have their own protocols with a vendor prefix (e.g. istio.io/a2a); maybe BackendProtocolOptions should have a slot for vendor-specific protocols too? The big drawback with that option is that we make so much vendor-specific that it's too hard to standardize down the line. /cc @kflynn @robscott @shaneutt
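To make the two directions concrete, here is one hypothetical shape that keeps the core enum transport-only and layers application/AI protocols (including vendor-prefixed ones) into an options struct; none of these names are settled:

```go
package backend

// BackendProtocol stays limited to transport-level protocols in this sketch.
// +kubebuilder:validation:Enum=HTTP;HTTP2;TCP;GRPC
type BackendProtocol string

// BackendProtocolOptions is a hypothetical home for higher-level protocol
// hints, so new AI protocols (or vendor extensions) don't require expanding
// the core enum.
type BackendProtocolOptions struct {
	// AppProtocol names the application protocol spoken over the transport,
	// e.g. "MCP", "A2A", or a vendor-prefixed value such as "istio.io/a2a".
	// +optional
	// +kubebuilder:validation:MaxLength=255
	AppProtocol *string `json:"appProtocol,omitempty"`
}
```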
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: keithmattix, usize.
> This proposal uses the following resource relationship:
> Gateway <-[parentRef]- HTTPRoute -[backendRef]-> Backend
I'm thinking about how this would work for meshes (especially GAMMA-style implementations). In most implementations I'm aware of, services aren't aware of Gateways declaratively; the meshed client sees that the destination for the request should go through the gateway and it sends it there. This proposal defines the Backend role of the external service, but there's nothing for the frontend role. Should meshes just have this be implementation specific?
In Istio's case, I'd imagine a user creates an HTTPRoute with a ServiceEntry (acting as a frontend only) parentRef with a backendRef pointing to the egress gateway. Then, to send traffic from the gateway to the external service, the user creates another HTTPRoute with the hostname of the external service that has a backendRef of kind Backend.
Any thoughts on how to proceed here from mesh implementations? @howardjohn @kflynn @youngnick
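A sketch of that two-HTTPRoute pattern using gateway-api Go types; the ServiceEntry parentRef, the Backend kind, and its group are assumptions about how an Istio-style implementation might wire this up, not anything specified in the proposal:

```go
package example

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/utils/ptr"
	gatewayv1 "sigs.k8s.io/gateway-api/apis/v1"
)

// Route 1: traffic addressed to the external host is steered from the
// ServiceEntry "frontend" to the egress Gateway's Service.
var toEgressGateway = gatewayv1.HTTPRoute{
	ObjectMeta: metav1.ObjectMeta{Name: "to-egress-gw", Namespace: "app"},
	Spec: gatewayv1.HTTPRouteSpec{
		CommonRouteSpec: gatewayv1.CommonRouteSpec{
			ParentRefs: []gatewayv1.ParentReference{{
				Group: ptr.To(gatewayv1.Group("networking.istio.io")), // assumption: ServiceEntry as parent
				Kind:  ptr.To(gatewayv1.Kind("ServiceEntry")),
				Name:  "api-openai-com",
			}},
		},
		Rules: []gatewayv1.HTTPRouteRule{{
			BackendRefs: []gatewayv1.HTTPBackendRef{{
				BackendRef: gatewayv1.BackendRef{BackendObjectReference: gatewayv1.BackendObjectReference{
					Kind: ptr.To(gatewayv1.Kind("Service")),
					Name: "egress-gateway", // the egress Gateway's Service
					Port: ptr.To(gatewayv1.PortNumber(443)),
				}},
			}},
		}},
	},
}

// Route 2: on the egress Gateway, match the external hostname and forward to
// the (hypothetical) Backend resource for the external service.
var gatewayToBackend = gatewayv1.HTTPRoute{
	ObjectMeta: metav1.ObjectMeta{Name: "egress-openai", Namespace: "egress"},
	Spec: gatewayv1.HTTPRouteSpec{
		CommonRouteSpec: gatewayv1.CommonRouteSpec{
			ParentRefs: []gatewayv1.ParentReference{{Name: "egress-gateway"}},
		},
		Hostnames: []gatewayv1.Hostname{"api.openai.com"},
		Rules: []gatewayv1.HTTPRouteRule{{
			BackendRefs: []gatewayv1.HTTPBackendRef{{
				BackendRef: gatewayv1.BackendRef{BackendObjectReference: gatewayv1.BackendObjectReference{
					Group: ptr.To(gatewayv1.Group("gateway.networking.x-k8s.io")), // assumed group for Backend
					Kind:  ptr.To(gatewayv1.Kind("Backend")),
					Name:  "openai",
				}},
			}},
		}},
	},
}
```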
> Should meshes just have this be implementation specific?
Longer term I hope not, but maybe initially.
With the backend role of Service decomposed, we do still need some way to express a frontend. For in-cluster services, this probably stays as Service for now, but it would be nice to introduce a lower-level resource like a general FQDN or a more specific ClusterDNSName that Service might eventually be refactored to use internally, while making that functionality available independently for other use cases.
> In Istio's case, I'd imagine a user creates an HTTPRoute with a ServiceEntry (acting as a frontend only) parentRef with a backendRef pointing to the egress gateway. Then, to send traffic from the gateway to the external service, the user creates another HTTPRoute with the hostname of the external service that has a backendRef of kind Backend.
ServiceEntry (or something simpler in Gateway API or k8s like an FQDN or ExternalName resource) as a frontend makes sense to me. My initial reaction to this was a preference to avoid making the backendRef to an egress Gateway resource explicit, but I think it might make sense if this can be configured with a cluster-scoped resource to be the "last hop" before egress after a namespaced resource has an opportunity to first redirect a request, rather than being an "override" - something like the diagram below:
Namespace app [
Pod -> [HTTP request] -> namespaced ServiceEntry | ExternalName
] -> ClusterServiceEntry | ClusterExternalName - [backendRef] ->
Namespace egress [
Gateway <-[parentRef]- HTTPRoute -[backendRef] -> Backend[FQDNs, Pods, IPAddresses] <-[BackendTrafficPolicy]
]
For cases where the full centralization and functionality of an egress gateway isn't needed, this might instead look like:
Namespace app [
Pod -> [HTTP request] -> namespaced ServiceEntry | ExternalName
] -> ClusterServiceEntry | ClusterExternalName - [backendRef] ->
Namespace egress [
Backend[FQDNs, Pods, IPAddresses] <-[BackendTrafficPolicy]
]
I think I addressed this elsewhere; let me know if there's nuance you want to bring here
> // MCP protocol version. MUST be one of V2|V3.
> // +optional
> // +kubebuilder:validation:MaxLength=256
> Version string `json:"version,omitempty"`
Right now, the format is YYYY-MM-DD.
Better to point to https://modelcontextprotocol.io/specification/versioning and say it must be a valid version as per the project strategy?
Sorry, you caught some of the AI autocomplete I missed. Yeah, we should definitely do that.
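If the field stays, one possible way (not in the PR) to tie it to the date-based format from the MCP versioning strategy would be a pattern validation; the struct name here is a placeholder:

```go
package backend

// MCPBackendOptions is a placeholder name for the MCP-specific options
// struct sketched in this proposal.
type MCPBackendOptions struct {
	// Version is the MCP protocol version as defined by the MCP versioning
	// strategy (https://modelcontextprotocol.io/specification/versioning),
	// currently a date of the form YYYY-MM-DD.
	// +optional
	// +kubebuilder:validation:Pattern=`^[0-9]{4}-[0-9]{2}-[0-9]{2}$`
	Version string `json:"version,omitempty"`
}
```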
> # clientCertificateRef: # if MUTUAL
> #   name: egress-client-cert
> # possible extension semantics, for illustration purposes only.
> ports:
I am not sure how the gateway selects which port as the target if the backend is referred to by an HTTPRoute.
HTTPRoute backendRefs have ports; you'd pick a port that matches the backend
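For reference, the port on the route side would come from the backendRef itself (the standard gateway-api shape); matching it against one of the Backend's declared ports would be the implementation's job. The `Backend` kind below is the hypothetical resource from this proposal:

```go
package example

import (
	"k8s.io/utils/ptr"
	gatewayv1 "sigs.k8s.io/gateway-api/apis/v1"
)

// A backendRef carrying an explicit port; the implementation would resolve
// this against the ports listed on the referenced Backend.
var ref = gatewayv1.HTTPBackendRef{
	BackendRef: gatewayv1.BackendRef{
		BackendObjectReference: gatewayv1.BackendObjectReference{
			Kind: ptr.To(gatewayv1.Kind("Backend")), // hypothetical kind from this proposal
			Name: "openai",
			Port: ptr.To(gatewayv1.PortNumber(443)),
		},
	},
}
```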
> mode: SIMPLE | MUTUAL | PASSTHROUGH | PLATFORM_PROVIDED | INSECURE_DISABLE
> sni: api.openai.com
> caBundleRef:
>   name: vendor-ca
This is not related, but I wonder where the caBundle is located; just a name is not enough.
Would we be using ClusterTrustBundle (which is cluster-scoped, so name alone could be sufficient) as the default kind here? kubernetes/enhancements#3257
Yeah, CTB seems reasonable. We'd use the same underlying type as BackendTLSPolicy, I think.
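A possible shape for that reference, defaulting the kind to ClusterTrustBundle (cluster-scoped, so a bare name suffices) while still allowing something namespaced like a ConfigMap; all names and defaults here are assumptions, not settled API:

```go
package backend

import (
	gatewayv1 "sigs.k8s.io/gateway-api/apis/v1"
)

// CABundleRef points at the trust anchors used to verify the backend's
// serving certificate.
type CABundleRef struct {
	// Group of the referent; assumed to default to the group serving
	// ClusterTrustBundle.
	// +optional
	Group *gatewayv1.Group `json:"group,omitempty"`

	// Kind of the referent. ClusterTrustBundle is cluster-scoped, so a bare
	// name is sufficient; a namespaced ConfigMap could also be allowed.
	// +kubebuilder:default=ClusterTrustBundle
	// +optional
	Kind *gatewayv1.Kind `json:"kind,omitempty"`

	// Name of the referent.
	// +required
	Name gatewayv1.ObjectName `json:"name"`
}
```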
> // Enable TLS with simple server certificate verification.
> BackendTLSModeSimple BackendTLSMode = "Simple"
> // Enable mutual TLS.
> BackendTLSModeMutual BackendTLSMode = "Mutual"
I'm not seeing BackendTLS support setting client keys/certs.
I'm not entirely clear on where client config for mTLS should live, but expect that should be related to discussions in kubernetes-sigs/gateway-api#4192 and/or kubernetes-sigs/gateway-api#3876
/cc @howardjohn
@hzxuzhonghu There's a ClientCertificateRef on BackendTLS.
@mikemorris I talked about this in my comment to Rob above, but I feel like it really needs to be on the backend and not the Gateway. Multi-tenancy is the main use case I can think of, but also, for mesh/manual mTLS, you can't rely on there being a Gateway. The semantic of this field is "configure TLS in this way when talking to this backend"; it's a client-side policy.
> // Enable mutual TLS.
> BackendTLSModeMutual BackendTLSMode = "Mutual"
> // Don't terminate TLS, use SNI to route.
> BackendTLSModePassthrough BackendTLSMode = "Passthrough"
Isn't this set at the Gateway listener?
I don't think a Backend alone should be able to terminate TLS (not sure if that's implied by this API or not), doing that should require routing through a Gateway such as discussed in #20 (comment).
Yeah agreed; the gateway is doing the work. The purpose of this enum field was more to indicate that there is some TLS being done in the system (so the user doesn't have to set InsecureDisable). Maybe it should be something like "GatewayPassthrough" instead?
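If that rename happens, the constant might read something like this (purely illustrative):

```go
package backend

// BackendTLSMode is the mode enum from this proposal.
type BackendTLSMode string

const (
	// GatewayPassthrough: the Backend itself does not terminate TLS; the TLS
	// session is passed through (routed via SNI) and handled by a Gateway,
	// so traffic stays encrypted without the user setting InsecureDisable.
	BackendTLSModeGatewayPassthrough BackendTLSMode = "GatewayPassthrough"
)
```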
robscott left a comment:
Thanks @keithmattix!
> // Would implementations have to define a schema for their extensions (even if they aren't CRDs)?
> // Maybe that's a good thing?
> Config any `json:"config,omitempty"`
I don't think this works in a k8s API?
Yeah I didn't think about this too hard; I figure this is some interface like runtime.Object or unstructured.Unstructured or maybe even just a YAML struct of some sort? I'm just wanting to signal that the user could put anything here and the controller will marshal it on the back end
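One common way to express "arbitrary structured config" in a k8s API without `any` is `runtime.RawExtension` (or `apiextensionsv1.JSON`); a sketch of the field using that, offered purely as an option:

```go
package backend

import (
	"k8s.io/apimachinery/pkg/runtime"
)

// BackendExtension is a hypothetical extension hook on a Backend.
type BackendExtension struct {
	// +required
	Name string `json:"name"`
	// +required
	Type string `json:"type"`

	// Config holds arbitrary, extension-specific configuration. RawExtension
	// round-trips unknown JSON/YAML without requiring a registered Go type,
	// leaving validation to the controller (e.g. against a registered schema).
	// +optional
	// +kubebuilder:pruning:PreserveUnknownFields
	Config *runtime.RawExtension `json:"config,omitempty"`
}
```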
> Destination BackendDestination `json:"destination"`
> // extensions defines optional extension processors that can be applied to this backend.
> // +optional
> Extensions []BackendExtension `json:"extensions,omitempty"`
Do we have a clear set of examples for how these extensions would be used? If not, can we omit them until we do?
One example from above is CredentialInjector - these could look quite similar to the HTTPRouteFilter filters field on HTTPRoute backendRefs, with the added ability to specify ordering (which we've discussed in relation to filters previously, including potential difficulty for some implementations with predefined ordering).
Yeah credentialInjector is the main example from above. Prompt guarding and other things like that would be in scope as well. Are you looking for more details?
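To illustrate the shape being discussed, here is a hypothetical extensions list where order is significant, with a CredentialInjector-style entry followed by a prompt-guard entry; everything here is a guess at semantics, not settled API:

```go
package example

import (
	"k8s.io/apimachinery/pkg/runtime"
)

// BackendExtension mirrors the shape sketched in this PR (Name, Type, Config);
// the concrete extension types below are illustrative only.
type BackendExtension struct {
	Name   string                `json:"name"`
	Type   string                `json:"type"`
	Config *runtime.RawExtension `json:"config,omitempty"`
}

// Extensions apply in list order: credentials are injected before the request
// is evaluated by the (hypothetical) prompt-guard extension.
var exampleExtensions = []BackendExtension{
	{
		Name:   "inject-api-key",
		Type:   "CredentialInjector", // the example discussed earlier in the proposal
		Config: &runtime.RawExtension{Raw: []byte(`{"secretRef":{"name":"openai-key"}}`)},
	},
	{
		Name: "prompt-guard",
		Type: "example.com/PromptGuard", // hypothetical vendor-prefixed extension
	},
}
```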
I'm not sure if this meeting was recorded, but notes are available. In addition to the discussions we've had about this in WG AI Gateway, there was a discussion relevant to this scope during the October 22nd kube-agentic-networking meeting: https://docs.google.com/document/d/1EQET_VWe_IAINyQhVj-wduZg99gBaObpz9612eZ1iYg/edit?tab=t.0#heading=h.e9nns9g5k28v
The macro critique I have of this proposal is that I feel like it conflates frontend and backend facets in the same way that Service and Istio's ServiceEntry do.
I'd prefer to split this into 3-4 resources: a namespace-scoped Frontend (name TBD; it could be a separate type for, or inclusive of, external FQDN and cluster DNS name, and maybe Service actually fills this role for now if we can make a case that this is progress toward decomposing it as described in https://www.youtube.com/watch?v=Oslwx3hj2Eg), a cluster-scoped Frontend, the namespace-scoped Backend, and possibly a separate FQDN (probably cluster-scoped?) resource (TBD on how to avoid loops here with Frontend and Backend both wanting an FQDN reference).
Within the Backend resource, I'd like to see the following as destinations (maybe even as the backendRefs specifically, if these are all external resources?) -- a rough sketch follows below:
- FQDN
- IPAddress
- Pod Selector (not Service)
While the inline BackendTLS config makes sense when the FQDN destination is inlined, if FQDN becomes an independent resource it could be more straightforward to target it with BackendTLSPolicy.
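A rough sketch of a destination union along those lines (FQDN, IP address, or pod selector); all names are illustrative, and the pod-selector option carries the endpoint-discovery caveats raised in the reply below:

```go
package backend

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// BackendDestination is a hypothetical union: exactly one member should be
// set (enforceable with a CEL validation rule on the parent type).
type BackendDestination struct {
	// FQDN of an external service, e.g. "api.openai.com".
	// +optional
	FQDN *string `json:"fqdn,omitempty"`

	// IPAddress of an external endpoint.
	// +optional
	IPAddress *string `json:"ipAddress,omitempty"`

	// PodSelector selects in-cluster pods directly, without a Service.
	// +optional
	PodSelector *metav1.LabelSelector `json:"podSelector,omitempty"`
}
```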
Signed-off-by: Keith Mattix II <[email protected]>
@mikemorris Thanks for the feedback! I think we're definitely aligned that this resource should only be the "Backend" facet of a service. I'm also hesitant about extracting FQDN and PodSelector into their own CRDs. I worry that users will start using them for way more use cases than we'd expect, and, on top of that, doing PodSelector instead of Service would require changes to the endpointslice controller (another KEP). So while I wouldn't mind the vision you're pitching, I think there are a lot of time-intensive barriers that make it infeasible for this WG's goals.
Sketch out some Go types and add a section on whether Backend is namespace or cluster-scoped. I usually design CRDs in protobuf since Istio does, so please chime in with missing Kubebuilder validations or otherwise incorrect conventions. I'm not married to much that's here, but wanted to get something out to get discussion started.