Skip to content

v1.0 InferencePool API Review #1173

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: release-0.5
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
283 changes: 283 additions & 0 deletions apix/v1/inferencepool_types.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,283 @@
/*
Copyright 2025 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package v1

import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// InferencePool is the Schema for the InferencePools API.
//
// +kubebuilder:object:root=true
// TODO: change the annotation once it gets officially approved
// +kubebuilder:metadata:annotations="api-approved.kubernetes.io=unapproved, experimental-only"
// +kubebuilder:subresource:status
// +kubebuilder:storageversion
// +genclient
type InferencePool struct {
metav1.TypeMeta `json:",inline"`

// +optional
metav1.ObjectMeta `json:"metadata,omitempty"`

// +required
Spec InferencePoolSpec `json:"spec,omitzero"`

// Status defines the observed state of InferencePool.
//
// +kubebuilder:default={parent: {{parentRef: {kind: "Status", name: "default"}, conditions: {{type: "Accepted", status: "Unknown", reason: "Pending", message: "Waiting for controller", lastTransitionTime: "1970-01-01T00:00:00Z"}}}}}
// +optional
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional non-pointer structs are kind of weird - it basically means the same thing as having all the member fields be optional. For builtin types, declarative validation wants to not even allow it because "optional" implies "not serialized", but structs are always serialized, even if empty (but not in CRD).

@jpbetz @JoelSpeed @liggitt do we want different rules for the use of optional and required between CRDs and builtins? Or should we just say that non-pointer structs are neither optional nor required, they just ARE? Or should we allow it and maybe even enforce it (you can't have a struct which has a required field in a field marked optional)?

Same for other non-pointer structs.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional non-pointer structs are kind of weird - it basically means the same thing as having all the member fields be optional.

I don't think that's true with omitzero, especially within CRD world where the validation of optional or required is handled by openapi validation, so any Go client reading the object can assert that an empty struct means it was omitted on the wire during the write.

When the struct would pass InferencePool.Status == InferencePoolStatus{} this will be omitted from the serialized representation, which would mean that any validations within it (e.g. required fields) wouldn't trigger, which would be the same as a nil pointer to a struct in terms of serialization.

For a structured client, this means they initialise the parent object but don't set any field within the struct to something that's non zero, they haven't made a choice about the struct, or any of its member fields, it still sounds optional to me?

I'm also not following why there's a distinction between CRD and built-ins for your comment, omitzero should work just find for built-in types too. When @liggitt and I discussed this previously, we concluded that optional fields should remain pointers and omitempty in built-ins but only because it's hard in the validation code to then know whether or not the client sent the empty value, or left the field unset (and possibly an issue with proto and adding new fields?)

Cluster API for example has switched over to omitzero non pointer for all structs where {} isn't a valid user choice, and as far as we can tell, everything there is behaving as we would expect (cc @sbueringer in case you want to add anything about this experience)

The only caveat I guess is that structured clients, with this approach cannot even attempt to send {} over the wire, but that's not a valid value anyway so do we care about not letting them send an invalid value, when the trade off is that it makes coding against the API easier and reduces the likelihood of nil pointer exceptions in both our, and their code?

you can't have a struct which has a required field in a field marked optional

Counter example that would disprove this would be in a discriminated union. The union members are optional (though required depending on discriminator) but it's totally valid to have a required value within a union member struct. "If type is Foo, you must set foo, when foo is set, we require an opinion on foo.bar"

I can find an example of this working with omitzero in OpenShift if you wanted to see a real world CRD example.

do we want different rules for the use of optional and required between CRDs and builtins?

Within KAL, the serialization implementation that we use for checking optional and required fields has ended up being the same code, but with slightly different configs.

For required fields, it recommends for required fields, to only use a *T when the zero value of the object is a valid user choice (e.g. 0 for a required replicas field, *int32). Otherwise, it recommends just T. It always recommends omitempty (and omitzero on structs unless the zero value is valid).

For optional fields, it always recommends *T and omitempty, as this is what Jordan and I aligned on as mentioned above.

(Exception to the above are slices and maps where it does not recommend *T because they already have a built-in nil)

However, while the above configurations are correct (per my understanding) for built-ins, the serialization behaviour of the required fields implementation also works for optional fields in custom resources because of the way validation occurs differently (structured vs unstructured), so the optionalfields implementation also has a configuration option preference: WhenRequired that will allow you to switch its recommendations to match what I described for requiredfields.

It's this config that has been adopted by CAPI and GW API.

I agree having differences could be confusing, and structs are the hardest to reason about, but I'm a firm believer in reducing the use of pointers in our APIs unless they are actually necessary, and that this will have a benefit to the community in the long term.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a structured client, this means they initialise the parent object but don't set any field within the struct to something that's non zero, they haven't made a choice about the struct, or any of its member fields, it still sounds optional to me?

If a struct is marked optional, but has a required field, validation SHOULD fail, right?

I'm also not following why there's a distinction between CRD and built-ins for your comment, omitzero should work just find for built-in types too.

AFAICT we don't use omitzero for builtins? And that only applies to JSON encoding, right? In proto, an empty struct would still have its tag number serialized, I think? A quick test shows it to be true:

+type Foo struct {
+   metav1.TypeMeta `json:",inline"`
+
+   // +optional
+   Bar Bar `protobuf:"bytes,9,opt,name=bar"`
+}
+
+type Bar struct {
+   // +optional
+   Baz *string `json:"baz,omitempty" protobuf:"bytes,7,opt,name=baz"`
+}

yields

+func (m *Foo) MarshalToSizedBuffer(dAtA []byte) (int, error) {
+   i := len(dAtA)
+   _ = i
+   var l int
+   _ = l
+   {
+       size, err := m.Bar.MarshalToSizedBuffer(dAtA[:i])
+       if err != nil {
+           return 0, err
+       }
+       i -= size
+       i = encodeVarintGenerated(dAtA, i, uint64(size))
+   }
+   i--
+   dAtA[i] = 0x4a
+   return len(dAtA) - i, nil
+}

The 0x4a is field Foo.Bar's tag (9) shifted left by 3 + what I presume is an ID of the data type.

Counter example that would disprove this would be in a discriminated union. The union members are optional (though required depending on discriminator) but it's totally valid to have a required value within a union member struct. "If type is Foo, you must set foo, when foo is set, we require an opinion on foo.bar"

Union members should ALWAYS be pointers. My point was only about optional non-pointers. This doesn't makes sense to me:

type Foo struct {
   metav1.TypeMeta `json:",inline"`

   // +optional
   Bar Bar `protobuf:"bytes,9,opt,name=bar"`
}

type Bar struct {
   // +required
   Baz string `json:"baz,omitempty" protobuf:"bytes,7,opt,name=baz"`
}

I agree having differences could be confusing, and structs are the hardest to reason about, but I'm a firm believer in reducing the use of pointers in our APIs unless they are actually necessary, and that this will have a benefit to the community in the long term.

I'm not actually arguing for a pointer here -- I know we have a lot of structs that are marked optional already. I'm arguing that they are all wrong and we should decide what we think is correct, and start trending in the right direction.

Declarative validation today (only applicable to builtins for now) simply rejects this construction. Is that correct? I am not sure.

Status InferencePoolStatus `json:"status,omitzero"`
}

// InferencePoolList contains a list of InferencePool.
//
// +kubebuilder:object:root=true
type InferencePoolList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []InferencePool `json:"items"`
}

// InferencePoolSpec defines the desired state of InferencePool
type InferencePoolSpec struct {
// Selector determines which Pods are members of this inference pool.
// It matches Pods by their labels only within the same namespace; cross-namespace
// selection is not supported.
//
// The structure of this LabelSelector is intentionally simple to be compatible
// with Kubernetes Service selectors, as some implementations may translate
// this configuration into a Service resource.
//
// +required
Selector LabelSelector `json:"selector,omitempty,omitzero"`

// +kubebuilder:validation:MinItems=1
// +kubebuilder:validation:MaxItems=1
// +listType=map
// +listMapKey=number
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make much sense to me.

If the key is "number", then you can never have named ports - are you OK with that?

If you have number and not named ports, why have a struct? Will you ever have more fields in there? Or could you reduce it to a listType=Set and call it []PortNumber?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're ok without named ports, but we may want to allow users to configure some per-port metadata in the future - see #1336 for one of the use cases we're working towards.

cc @smarterclayton

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will have more fields here, we have identified requirements that could lead to such mapping, but the requirements are insufficiently refined to be certain number alone is sufficient.

Re: named ports, not supporting named ports is a bit wierd in a Kube sense. I am a bit concerned that we are inconsistent with services and pods here if we tie to listMapKey=number.

Copy link

@thockin thockin Aug 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How comon is it to reference a service port by name vs. a service referencing a pod port by name (which I understand better).

listmap keys are pretty hard to fix -- does this really need to be a map? Can it just be atomic?

// +required
TargetPorts []Port `json:"targetPorts,omitempty"`

// Extension configures an endpoint picker as an extension service.
// +required
ExtensionRef Extension `json:"extensionRef,omitempty,omitzero"`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this field not named EndpointPicker ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a great question. IMO it was useful to leave some level of flexibility in case the component currently known as EndpointPicker evolves to do more things along the way. It's honestly difficult to predict which of the following will be more useful long term:

  1. A single extensionRef where that extension does ~everything
  2. Separate extensionRefs with different field names (prefillExtRef, decodeExtRef, etc)
  3. An ordered list of extensionRefs each having a type

Of those, 2 seems the least likely to be useful. 3 seems plausible, but since we don't have a clear use case for it yet, I lean towards avoiding this to start. Although it's painful, we do have plenty of prior art for transitioning a singular field to a list in k8s APIs if we need to.

Open to other thoughts here:

cc @ahg-g @danehans @nirrozenbaum

}

type Port struct {
// Number defines the port number to access the selected model server Pods.
// The number must be in the range 1 to 65535.
//
// +required
Number PortNumber `json:"number,omitempty"`
}

// Extension specifies how to configure an extension that runs the endpoint picker.
type Extension struct {
// Group is the group of the referent.
// The default value is "", representing the Core API group.
//
// +optional
// +kubebuilder:default=""
Group *Group `json:"group,omitempty"`
Copy link

@liggitt liggitt Aug 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what types in the core API group would be valid to point to for this? Is defaulting to "" going to bite people who forget about API group and think just their kind is sufficient to specify some non-core type? Required with no default to require specifying the group and kind together seems more likely to avoid accidents. Similar comments apply on the kind field.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We expect >95% of references to be to Service long term, so we wanted that experience to be as simple as possible. Currently Service is the only kind supported by Gateway implementations, but leaving this here allows implementations to support custom kinds as needed.

This is copying what we did in Gateway API for backendRefs in Routes. All Gateways support references to Service, and that represents >90% of use. Some implementations have expanded that to support ServiceImport, and now InferencePool. So far I haven't seen or heard of any issues where a user set kind but failed to set group.


// Kind is the Kubernetes resource kind of the referent.
//
// Defaults to "Service" when not specified.
//
// ExternalName services can refer to CNAME DNS records that may live
// outside of the cluster and as such are difficult to reason about in
// terms of conformance. They also may not be safe to forward to (see
// CVE-2021-25740 for more information). Implementations MUST NOT
// support ExternalName Services.
//
// +optional
// +kubebuilder:default=Service
Kind Kind `json:"kind,omitempty"`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given an implementation which consumes this, is extension a priori knowable by that implementation, or no?

IOW, If I say kind: FooBar, does the implementation know exactly what that means and which API resource path to read, or does it have to look up "FooBar" in the API discovery mechanism to find the resource path?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, a Gateway would have to predefine all the types of targets it supports. I would not expect a Gateway implementation to be able to handle arbitrary types here, it would need some form of advanced knowledge of how to support each type. I think this matches the Ingress use case defined in https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#kind-vs-resource.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK to resolve


// Name is the name of the referent.
//
// +required
Name ObjectName `json:"name,omitempty"`

// The port number on the service running the extension. When unspecified,
// implementations SHOULD infer a default value of 9002 when the Kind is
// Service.
//
// +optional
PortNumber PortNumber `json:"portNumber,omitempty"`

// Configures how the gateway handles the case when the extension is not responsive.
// Defaults to failClose.
//
// +optional
// +kubebuilder:default="FailClose"
FailureMode ExtensionFailureMode `json:"failureMode,omitempty"`
}

// ExtensionFailureMode defines the options for how the gateway handles the case when the extension is not
// responsive.
// +kubebuilder:validation:Enum=FailOpen;FailClose
type ExtensionFailureMode string

const (
// FailOpen specifies that the proxy should forward the request to an endpoint of its picking when the Endpoint Picker fails.
FailOpen ExtensionFailureMode = "FailOpen"
// FailClose specifies that the proxy should drop the request when the Endpoint Picker fails.
FailClose ExtensionFailureMode = "FailClose"
)

// InferencePoolStatus defines the observed state of InferencePool.
// +kubebuilder:validation:MinProperties=1
type InferencePoolStatus struct {
// Parents is a list of parent resources (usually Gateways) that are
// associated with the InferencePool, and the status of the InferencePool with respect to
// each parent.
//
// A maximum of 32 Gateways will be represented in this list. When the list contains
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand the comment about "kind: Status, name: default" -- where are those fields? They are not in PoolStatus.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't understand why you need a canary entry to signal "nothing to see" -- why is an empty list not sufficient? I find the imposition on the controllers to be very awkward.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was copied from Gateway API. Generally k8s users don't seem to treat the absence of status as a bad thing, so they may not realize their InferencePool is not being used/referenced.

This is particularly useful in Gateways where you made a typo on gatewayClass and nothing picks up/implements your Gateway. Having a baseline signal that a Gateway is waiting for a controller to pick it up is useful.

A similar pattern is useful for Routes if you make a typo referring to a Gateway - there's no controller empowered to warn you of this since each Gateway controller only populates status on Routes that are attached to their Gateway. If there's a reference to a nonexistent Gateway, there's nothing to warn you about that other than some kind of default status.

With that said, this doesn't seem quite as useful on InferencePool. If someone makes a typo when pointing a route to an InferencePool, they'll get a warning in the status of that Route. This status can be useful in indicating that the InferencePool has not actually been implemented yet, but an empty list may suffice here.

cc @danehans

// `kind: Status, name: default`, it indicates that the InferencePool is not
// associated with any Gateway and a controller must perform the following:
//
// - Remove the parent when setting the "Accepted" condition.
// - Add the parent when the controller will no longer manage the InferencePool
// and no other parents exist.
//
// +kubebuilder:validation:MaxItems=32
// +optional
// +listType=atomic
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there only ever one controller which owns this? Or could I have 2 controller implementations which reference the same InferencePool ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - this is meant to be populated by different controllers. Within this, each list item has a parentRef that points to a Gateway and a list of conditions, which each entry meant to be populated by the respective controller. This mirrors Route status in Gateway API (https://github.com/kubernetes-sigs/gateway-api/blob/3f2b1eee89144073a245404b6055c2d6461715dd/apis/v1/shared_types.go#L506-L509). Somehow we've gotten that to work without implementations clobbering over each other, but would be interested in if we could find a way to use a different list type here to make this more efficient.

cc @JoelSpeed @rikatz

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robscott not sure what I should answer here, but here's my attempt to: the listType=atomic was added as part of https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/1366/files#diff-9937ff930689bcfa552fc7430c5242642654dc06325481ad4c3ef6b5737e218bR154 to satisfy the linter in a way it wouldn't also cause a breaking change, but that said, if we feel that the listType should be a set (I guess this is the right type here as we have structs inside structs, and none of the fields can be considered a key) this can be changed

On Gateway API we decided not to change this as it would be a breaking change (and in fact this was fixed very recently: kubernetes-sigs/gateway-api@78496d8) but I guess for GIE this is arguable.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpbetz @yongruilin wrt uniqueness and list semantics

If everyone does RMW and uses resourceVersion precondition, shared ownership can work. But that's asking a lot.

Parents []PoolStatus `json:"parent,omitempty"`
}

// PoolStatus defines the observed state of InferencePool from a Gateway.
type PoolStatus struct {
// Conditions track the state of the InferencePool.
//
// Known condition types are:
//
// * "Accepted"
// * "ResolvedRefs"
//
// +optional
// +listType=map
// +listMapKey=type
// +kubebuilder:validation:MaxItems=8
// +kubebuilder:default={{type: "Accepted", status: "Unknown", reason:"Pending", message:"Waiting for controller", lastTransitionTime: "1970-01-01T00:00:00Z"}}
Conditions []metav1.Condition `json:"conditions,omitempty"`

// GatewayRef indicates the gateway that observed state of InferencePool.
// +required
GatewayRef ParentGatewayReference `json:"parentRef,omitzero"`
}

// InferencePoolConditionType is a type of condition for the InferencePool
type InferencePoolConditionType string

// InferencePoolReason is the reason for a given InferencePoolConditionType
type InferencePoolReason string

const (
// This condition indicates whether the InferencePool has been accepted or rejected
// by a Gateway, and why.
//
// Possible reasons for this condition to be True are:
//
// * "Accepted"
//
// Possible reasons for this condition to be False are:
//
// * "NotSupportedByGateway"
// * "HTTPRouteNotAccepted"
//
// Possible reasons for this condition to be Unknown are:
//
// * "Pending"
//
// Controllers MAY raise this condition with other reasons, but should
// prefer to use the reasons listed above to improve interoperability.
InferencePoolConditionAccepted InferencePoolConditionType = "Accepted"

// This reason is used with the "Accepted" condition when the InferencePool has been
// accepted by the Gateway.
InferencePoolReasonAccepted InferencePoolReason = "Accepted"

// This reason is used with the "Accepted" condition when the InferencePool
// has not been accepted by a Gateway because the Gateway does not support
// InferencePool as a backend.
InferencePoolReasonNotSupportedByGateway InferencePoolReason = "NotSupportedByGateway"

// This reason is used with the "Accepted" condition when the InferencePool is
// referenced by an HTTPRoute that has been rejected by the Gateway. The user
// should inspect the status of the referring HTTPRoute for the specific reason.
InferencePoolReasonHTTPRouteNotAccepted InferencePoolReason = "HTTPRouteNotAccepted"

// This reason is used with the "Accepted" when a controller has not yet
// reconciled the InferencePool.
InferencePoolReasonPending InferencePoolReason = "Pending"
)

const (
// This condition indicates whether the controller was able to resolve all
// the object references for the InferencePool.
//
// Possible reasons for this condition to be True are:
//
// * "ResolvedRefs"
//
// Possible reasons for this condition to be False are:
//
// * "InvalidExtensionRef"
//
// Controllers MAY raise this condition with other reasons, but should
// prefer to use the reasons listed above to improve interoperability.
InferencePoolConditionResolvedRefs InferencePoolConditionType = "ResolvedRefs"

// This reason is used with the "ResolvedRefs" condition when the condition
// is true.
InferencePoolReasonResolvedRefs InferencePoolReason = "ResolvedRefs"

// This reason is used with the "ResolvedRefs" condition when the
// Extension is invalid in some way. This can include an unsupported kind
// or API group, or a reference to a resource that can not be found.
InferencePoolReasonInvalidExtensionRef InferencePoolReason = "InvalidExtensionRef"
)

// ParentGatewayReference identifies an API object including its namespace,
// defaulting to Gateway.
type ParentGatewayReference struct {
// Group is the group of the referent.
//
// +optional
// +kubebuilder:default="gateway.networking.k8s.io"
Group *Group `json:"group,omitempty"`

// Kind is kind of the referent. For example "Gateway".
//
// +optional
// +kubebuilder:default=Gateway
Kind Kind `json:"kind,omitempty"`

// Name is the name of the referent.
// +required
Name ObjectName `json:"name,omitempty"`

// Namespace is the namespace of the referent. If not present,
// the namespace of the referent is assumed to be the same as
// the namespace of the referring object.
//
// +optional
Namespace Namespace `json:"namespace,omitempty"`
}
Loading