-
Notifications
You must be signed in to change notification settings - Fork 140
v1.0 InferencePool API Review #1173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: release-0.5
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,283 @@ | ||
/* | ||
Copyright 2025 The Kubernetes Authors. | ||
|
||
Licensed under the Apache License, Version 2.0 (the "License"); | ||
you may not use this file except in compliance with the License. | ||
You may obtain a copy of the License at | ||
|
||
http://www.apache.org/licenses/LICENSE-2.0 | ||
|
||
Unless required by applicable law or agreed to in writing, software | ||
distributed under the License is distributed on an "AS IS" BASIS, | ||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
See the License for the specific language governing permissions and | ||
limitations under the License. | ||
*/ | ||
|
||
package v1 | ||
|
||
import ( | ||
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
) | ||
|
||
// InferencePool is the Schema for the InferencePools API. | ||
// | ||
// +kubebuilder:object:root=true | ||
// TODO: change the annotation once it gets officially approved | ||
// +kubebuilder:metadata:annotations="api-approved.kubernetes.io=unapproved, experimental-only" | ||
// +kubebuilder:subresource:status | ||
// +kubebuilder:storageversion | ||
// +genclient | ||
type InferencePool struct { | ||
metav1.TypeMeta `json:",inline"` | ||
|
||
// +optional | ||
metav1.ObjectMeta `json:"metadata,omitempty"` | ||
|
||
// +required | ||
Spec InferencePoolSpec `json:"spec,omitzero"` | ||
|
||
// Status defines the observed state of InferencePool. | ||
// | ||
// +kubebuilder:default={parent: {{parentRef: {kind: "Status", name: "default"}, conditions: {{type: "Accepted", status: "Unknown", reason: "Pending", message: "Waiting for controller", lastTransitionTime: "1970-01-01T00:00:00Z"}}}}} | ||
// +optional | ||
Status InferencePoolStatus `json:"status,omitzero"` | ||
} | ||
|
||
// InferencePoolList contains a list of InferencePool. | ||
// | ||
// +kubebuilder:object:root=true | ||
type InferencePoolList struct { | ||
metav1.TypeMeta `json:",inline"` | ||
metav1.ListMeta `json:"metadata,omitempty"` | ||
Items []InferencePool `json:"items"` | ||
} | ||
|
||
// InferencePoolSpec defines the desired state of InferencePool | ||
type InferencePoolSpec struct { | ||
// Selector determines which Pods are members of this inference pool. | ||
// It matches Pods by their labels only within the same namespace; cross-namespace | ||
// selection is not supported. | ||
// | ||
// The structure of this LabelSelector is intentionally simple to be compatible | ||
// with Kubernetes Service selectors, as some implementations may translate | ||
// this configuration into a Service resource. | ||
// | ||
// +required | ||
Selector LabelSelector `json:"selector,omitempty,omitzero"` | ||
|
||
// +kubebuilder:validation:MinItems=1 | ||
// +kubebuilder:validation:MaxItems=1 | ||
// +listType=map | ||
// +listMapKey=number | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This doesn't make much sense to me. If the key is "number", then you can never have named ports - are you OK with that? If you have number and not named ports, why have a struct? Will you ever have more fields in there? Or could you reduce it to a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we're ok without named ports, but we may want to allow users to configure some per-port metadata in the future - see #1336 for one of the use cases we're working towards. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we will have more fields here, we have identified requirements that could lead to such mapping, but the requirements are insufficiently refined to be certain number alone is sufficient. Re: named ports, not supporting named ports is a bit wierd in a Kube sense. I am a bit concerned that we are inconsistent with services and pods here if we tie to listMapKey=number. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How comon is it to reference a service port by name vs. a service referencing a pod port by name (which I understand better). listmap keys are pretty hard to fix -- does this really need to be a map? Can it just be atomic? |
||
// +required | ||
TargetPorts []Port `json:"targetPorts,omitempty"` | ||
|
||
// Extension configures an endpoint picker as an extension service. | ||
// +required | ||
ExtensionRef Extension `json:"extensionRef,omitempty,omitzero"` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is this field not named There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's a great question. IMO it was useful to leave some level of flexibility in case the component currently known as EndpointPicker evolves to do more things along the way. It's honestly difficult to predict which of the following will be more useful long term:
Of those, 2 seems the least likely to be useful. 3 seems plausible, but since we don't have a clear use case for it yet, I lean towards avoiding this to start. Although it's painful, we do have plenty of prior art for transitioning a singular field to a list in k8s APIs if we need to. Open to other thoughts here: |
||
} | ||
|
||
type Port struct { | ||
// Number defines the port number to access the selected model server Pods. | ||
// The number must be in the range 1 to 65535. | ||
// | ||
// +required | ||
Number PortNumber `json:"number,omitempty"` | ||
} | ||
|
||
// Extension specifies how to configure an extension that runs the endpoint picker. | ||
type Extension struct { | ||
// Group is the group of the referent. | ||
// The default value is "", representing the Core API group. | ||
// | ||
// +optional | ||
// +kubebuilder:default="" | ||
Group *Group `json:"group,omitempty"` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what types in the core API group would be valid to point to for this? Is defaulting to "" going to bite people who forget about API group and think just their kind is sufficient to specify some non-core type? Required with no default to require specifying the group and kind together seems more likely to avoid accidents. Similar comments apply on the kind field. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We expect >95% of references to be to Service long term, so we wanted that experience to be as simple as possible. Currently Service is the only kind supported by Gateway implementations, but leaving this here allows implementations to support custom kinds as needed. This is copying what we did in Gateway API for backendRefs in Routes. All Gateways support references to Service, and that represents >90% of use. Some implementations have expanded that to support ServiceImport, and now InferencePool. So far I haven't seen or heard of any issues where a user set kind but failed to set group. |
||
|
||
// Kind is the Kubernetes resource kind of the referent. | ||
// | ||
// Defaults to "Service" when not specified. | ||
// | ||
// ExternalName services can refer to CNAME DNS records that may live | ||
// outside of the cluster and as such are difficult to reason about in | ||
// terms of conformance. They also may not be safe to forward to (see | ||
// CVE-2021-25740 for more information). Implementations MUST NOT | ||
// support ExternalName Services. | ||
// | ||
// +optional | ||
// +kubebuilder:default=Service | ||
Kind Kind `json:"kind,omitempty"` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Given an implementation which consumes this, is extension a priori knowable by that implementation, or no? IOW, If I say There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, a Gateway would have to predefine all the types of targets it supports. I would not expect a Gateway implementation to be able to handle arbitrary types here, it would need some form of advanced knowledge of how to support each type. I think this matches the Ingress use case defined in https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#kind-vs-resource. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK to resolve |
||
|
||
// Name is the name of the referent. | ||
// | ||
// +required | ||
Name ObjectName `json:"name,omitempty"` | ||
|
||
// The port number on the service running the extension. When unspecified, | ||
// implementations SHOULD infer a default value of 9002 when the Kind is | ||
// Service. | ||
// | ||
// +optional | ||
PortNumber PortNumber `json:"portNumber,omitempty"` | ||
|
||
// Configures how the gateway handles the case when the extension is not responsive. | ||
// Defaults to failClose. | ||
// | ||
// +optional | ||
// +kubebuilder:default="FailClose" | ||
FailureMode ExtensionFailureMode `json:"failureMode,omitempty"` | ||
} | ||
|
||
// ExtensionFailureMode defines the options for how the gateway handles the case when the extension is not | ||
// responsive. | ||
// +kubebuilder:validation:Enum=FailOpen;FailClose | ||
type ExtensionFailureMode string | ||
|
||
const ( | ||
// FailOpen specifies that the proxy should forward the request to an endpoint of its picking when the Endpoint Picker fails. | ||
FailOpen ExtensionFailureMode = "FailOpen" | ||
// FailClose specifies that the proxy should drop the request when the Endpoint Picker fails. | ||
FailClose ExtensionFailureMode = "FailClose" | ||
) | ||
|
||
// InferencePoolStatus defines the observed state of InferencePool. | ||
// +kubebuilder:validation:MinProperties=1 | ||
type InferencePoolStatus struct { | ||
// Parents is a list of parent resources (usually Gateways) that are | ||
// associated with the InferencePool, and the status of the InferencePool with respect to | ||
// each parent. | ||
// | ||
// A maximum of 32 Gateways will be represented in this list. When the list contains | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I do not understand the comment about "kind: Status, name: default" -- where are those fields? They are not in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I also don't understand why you need a canary entry to signal "nothing to see" -- why is an empty list not sufficient? I find the imposition on the controllers to be very awkward. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This was copied from Gateway API. Generally k8s users don't seem to treat the absence of status as a bad thing, so they may not realize their InferencePool is not being used/referenced. This is particularly useful in Gateways where you made a typo on A similar pattern is useful for Routes if you make a typo referring to a Gateway - there's no controller empowered to warn you of this since each Gateway controller only populates status on Routes that are attached to their Gateway. If there's a reference to a nonexistent Gateway, there's nothing to warn you about that other than some kind of default status. With that said, this doesn't seem quite as useful on InferencePool. If someone makes a typo when pointing a route to an InferencePool, they'll get a warning in the status of that Route. This status can be useful in indicating that the InferencePool has not actually been implemented yet, but an empty list may suffice here. cc @danehans |
||
// `kind: Status, name: default`, it indicates that the InferencePool is not | ||
// associated with any Gateway and a controller must perform the following: | ||
// | ||
// - Remove the parent when setting the "Accepted" condition. | ||
// - Add the parent when the controller will no longer manage the InferencePool | ||
// and no other parents exist. | ||
// | ||
// +kubebuilder:validation:MaxItems=32 | ||
// +optional | ||
// +listType=atomic | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there only ever one controller which owns this? Or could I have 2 controller implementations which reference the same There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point - this is meant to be populated by different controllers. Within this, each list item has a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @robscott not sure what I should answer here, but here's my attempt to: the listType=atomic was added as part of https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/1366/files#diff-9937ff930689bcfa552fc7430c5242642654dc06325481ad4c3ef6b5737e218bR154 to satisfy the linter in a way it wouldn't also cause a breaking change, but that said, if we feel that the listType should be a On Gateway API we decided not to change this as it would be a breaking change (and in fact this was fixed very recently: kubernetes-sigs/gateway-api@78496d8) but I guess for GIE this is arguable. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jpbetz @yongruilin wrt uniqueness and list semantics If everyone does RMW and uses resourceVersion precondition, shared ownership can work. But that's asking a lot. |
||
Parents []PoolStatus `json:"parent,omitempty"` | ||
} | ||
|
||
// PoolStatus defines the observed state of InferencePool from a Gateway. | ||
type PoolStatus struct { | ||
// Conditions track the state of the InferencePool. | ||
// | ||
// Known condition types are: | ||
// | ||
// * "Accepted" | ||
// * "ResolvedRefs" | ||
// | ||
// +optional | ||
// +listType=map | ||
// +listMapKey=type | ||
// +kubebuilder:validation:MaxItems=8 | ||
// +kubebuilder:default={{type: "Accepted", status: "Unknown", reason:"Pending", message:"Waiting for controller", lastTransitionTime: "1970-01-01T00:00:00Z"}} | ||
Conditions []metav1.Condition `json:"conditions,omitempty"` | ||
|
||
// GatewayRef indicates the gateway that observed state of InferencePool. | ||
// +required | ||
GatewayRef ParentGatewayReference `json:"parentRef,omitzero"` | ||
} | ||
|
||
// InferencePoolConditionType is a type of condition for the InferencePool | ||
type InferencePoolConditionType string | ||
|
||
// InferencePoolReason is the reason for a given InferencePoolConditionType | ||
type InferencePoolReason string | ||
|
||
const ( | ||
// This condition indicates whether the InferencePool has been accepted or rejected | ||
// by a Gateway, and why. | ||
// | ||
// Possible reasons for this condition to be True are: | ||
// | ||
// * "Accepted" | ||
// | ||
// Possible reasons for this condition to be False are: | ||
// | ||
// * "NotSupportedByGateway" | ||
// * "HTTPRouteNotAccepted" | ||
// | ||
// Possible reasons for this condition to be Unknown are: | ||
// | ||
// * "Pending" | ||
// | ||
// Controllers MAY raise this condition with other reasons, but should | ||
// prefer to use the reasons listed above to improve interoperability. | ||
InferencePoolConditionAccepted InferencePoolConditionType = "Accepted" | ||
|
||
// This reason is used with the "Accepted" condition when the InferencePool has been | ||
// accepted by the Gateway. | ||
InferencePoolReasonAccepted InferencePoolReason = "Accepted" | ||
|
||
// This reason is used with the "Accepted" condition when the InferencePool | ||
// has not been accepted by a Gateway because the Gateway does not support | ||
// InferencePool as a backend. | ||
InferencePoolReasonNotSupportedByGateway InferencePoolReason = "NotSupportedByGateway" | ||
|
||
// This reason is used with the "Accepted" condition when the InferencePool is | ||
// referenced by an HTTPRoute that has been rejected by the Gateway. The user | ||
// should inspect the status of the referring HTTPRoute for the specific reason. | ||
InferencePoolReasonHTTPRouteNotAccepted InferencePoolReason = "HTTPRouteNotAccepted" | ||
|
||
// This reason is used with the "Accepted" when a controller has not yet | ||
// reconciled the InferencePool. | ||
InferencePoolReasonPending InferencePoolReason = "Pending" | ||
) | ||
|
||
const ( | ||
// This condition indicates whether the controller was able to resolve all | ||
// the object references for the InferencePool. | ||
// | ||
// Possible reasons for this condition to be True are: | ||
// | ||
// * "ResolvedRefs" | ||
// | ||
// Possible reasons for this condition to be False are: | ||
// | ||
// * "InvalidExtensionRef" | ||
// | ||
// Controllers MAY raise this condition with other reasons, but should | ||
// prefer to use the reasons listed above to improve interoperability. | ||
InferencePoolConditionResolvedRefs InferencePoolConditionType = "ResolvedRefs" | ||
|
||
// This reason is used with the "ResolvedRefs" condition when the condition | ||
// is true. | ||
InferencePoolReasonResolvedRefs InferencePoolReason = "ResolvedRefs" | ||
|
||
// This reason is used with the "ResolvedRefs" condition when the | ||
// Extension is invalid in some way. This can include an unsupported kind | ||
// or API group, or a reference to a resource that can not be found. | ||
InferencePoolReasonInvalidExtensionRef InferencePoolReason = "InvalidExtensionRef" | ||
) | ||
|
||
// ParentGatewayReference identifies an API object including its namespace, | ||
// defaulting to Gateway. | ||
type ParentGatewayReference struct { | ||
// Group is the group of the referent. | ||
// | ||
// +optional | ||
// +kubebuilder:default="gateway.networking.k8s.io" | ||
Group *Group `json:"group,omitempty"` | ||
|
||
// Kind is kind of the referent. For example "Gateway". | ||
// | ||
// +optional | ||
// +kubebuilder:default=Gateway | ||
Kind Kind `json:"kind,omitempty"` | ||
|
||
// Name is the name of the referent. | ||
// +required | ||
Name ObjectName `json:"name,omitempty"` | ||
|
||
// Namespace is the namespace of the referent. If not present, | ||
// the namespace of the referent is assumed to be the same as | ||
// the namespace of the referring object. | ||
// | ||
// +optional | ||
Namespace Namespace `json:"namespace,omitempty"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optional non-pointer structs are kind of weird - it basically means the same thing as having all the member fields be optional. For builtin types, declarative validation wants to not even allow it because "optional" implies "not serialized", but structs are always serialized, even if empty (but not in CRD).
@jpbetz @JoelSpeed @liggitt do we want different rules for the use of
optional
andrequired
between CRDs and builtins? Or should we just say that non-pointer structs are neither optional nor required, they just ARE? Or should we allow it and maybe even enforce it (you can't have a struct which has a required field in a field marked optional)?Same for other non-pointer structs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's true with omitzero, especially within CRD world where the validation of optional or required is handled by openapi validation, so any Go client reading the object can assert that an empty struct means it was omitted on the wire during the write.
When the struct would pass
InferencePool.Status == InferencePoolStatus{}
this will be omitted from the serialized representation, which would mean that any validations within it (e.g. required fields) wouldn't trigger, which would be the same as a nil pointer to a struct in terms of serialization.For a structured client, this means they initialise the parent object but don't set any field within the struct to something that's non zero, they haven't made a choice about the struct, or any of its member fields, it still sounds optional to me?
I'm also not following why there's a distinction between CRD and built-ins for your comment, omitzero should work just find for built-in types too. When @liggitt and I discussed this previously, we concluded that optional fields should remain pointers and omitempty in built-ins but only because it's hard in the validation code to then know whether or not the client sent the empty value, or left the field unset (and possibly an issue with proto and adding new fields?)
Cluster API for example has switched over to
omitzero
non pointer for all structs where{}
isn't a valid user choice, and as far as we can tell, everything there is behaving as we would expect (cc @sbueringer in case you want to add anything about this experience)The only caveat I guess is that structured clients, with this approach cannot even attempt to send
{}
over the wire, but that's not a valid value anyway so do we care about not letting them send an invalid value, when the trade off is that it makes coding against the API easier and reduces the likelihood of nil pointer exceptions in both our, and their code?Counter example that would disprove this would be in a discriminated union. The union members are optional (though required depending on discriminator) but it's totally valid to have a required value within a union member struct. "If type is Foo, you must set
foo
, whenfoo
is set, we require an opinion onfoo.bar
"I can find an example of this working with
omitzero
in OpenShift if you wanted to see a real world CRD example.Within KAL, the serialization implementation that we use for checking optional and required fields has ended up being the same code, but with slightly different configs.
For required fields, it recommends for
required
fields, to only use a*T
when the zero value of the object is a valid user choice (e.g.0
for a required replicas field,*int32
). Otherwise, it recommends justT
. It always recommendsomitempty
(andomitzero
on structs unless the zero value is valid).For optional fields, it always recommends
*T
andomitempty
, as this is what Jordan and I aligned on as mentioned above.(Exception to the above are slices and maps where it does not recommend
*T
because they already have a built-in nil)However, while the above configurations are correct (per my understanding) for built-ins, the serialization behaviour of the required fields implementation also works for optional fields in custom resources because of the way validation occurs differently (structured vs unstructured), so the
optionalfields
implementation also has a configuration optionpreference: WhenRequired
that will allow you to switch its recommendations to match what I described forrequiredfields
.It's this config that has been adopted by CAPI and GW API.
I agree having differences could be confusing, and structs are the hardest to reason about, but I'm a firm believer in reducing the use of pointers in our APIs unless they are actually necessary, and that this will have a benefit to the community in the long term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a struct is marked optional, but has a required field, validation SHOULD fail, right?
AFAICT we don't use omitzero for builtins? And that only applies to JSON encoding, right? In proto, an empty struct would still have its tag number serialized, I think? A quick test shows it to be true:
yields
The 0x4a is field Foo.Bar's tag (9) shifted left by 3 + what I presume is an ID of the data type.
Union members should ALWAYS be pointers. My point was only about optional non-pointers. This doesn't makes sense to me:
I'm not actually arguing for a pointer here -- I know we have a lot of structs that are marked optional already. I'm arguing that they are all wrong and we should decide what we think is correct, and start trending in the right direction.
Declarative validation today (only applicable to builtins for now) simply rejects this construction. Is that correct? I am not sure.