Releases: kubernetes-sigs/gateway-api-inference-extension

v1.0.0-rc.2

29 Aug 00:11

v1.0.0-rc.2 Pre-release

This release primarily updates the InferencePool API and conformance tests following the API review completed in #1173.

NOTE: Barring any breaking change after this RC, the APIs are considered frozen for the remainder of the v1.0 release cycle.

v1.0.0-rc.1

26 Aug 12:59

v1.0.0-rc.1 Pre-release

v0.5.1

23 Jul 20:04

This patch release resolves a few bugs. Justification and breakdown here: #1215

v0.5.1-rc.1

22 Jul 23:20

v0.5.1-rc.1 Pre-release

This patch release resolves a few bugs. Justification and breakdown here: #1215

v0.5.0

21 Jul 18:20
38577e6

Overview

Major Highlights

  • Conformance Tests: Validate your controller’s behavior with end-to-end tests covering InferencePool,
    InferenceModel, HTTPRoute, and more.

  • New Config API: Configure plugins through a config file without touching core code.

  • Helm Charts: The Helm chart has been updated to make it easy to reuse the Config API.
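
To illustrate the idea behind a config-driven plugin setup, here is a minimal sketch in Python. It is not the project's Config API: the registry, the plugin type names ("queue-scorer", "constant-scorer"), and the JSON keys ("plugins", "type", "parameters", "maxQueue", "value") are all hypothetical and chosen only for this example. The sketch shows how a config file can select and parameterize plugins without changing the code that runs them.

```python
import json
from typing import Callable, Dict, List

# Hypothetical registry: maps a plugin "type" string to a factory function.
# None of these names come from the project; they exist only for this sketch.
REGISTRY: Dict[str, Callable[[dict], Callable[[dict], float]]] = {}


def register(plugin_type: str):
    """Decorator that records a plugin factory under a type name."""
    def wrap(factory):
        REGISTRY[plugin_type] = factory
        return factory
    return wrap


@register("queue-scorer")
def make_queue_scorer(params: dict):
    max_queue = params.get("maxQueue", 10)
    # Score is 1.0 for an empty queue and falls to 0.0 at max_queue.
    return lambda pod: max(0.0, 1.0 - pod["queue"] / max_queue)


@register("constant-scorer")
def make_constant_scorer(params: dict):
    value = float(params.get("value", 0.5))
    return lambda pod: value


def load_plugins(config_text: str) -> List[Callable[[dict], float]]:
    """Instantiate every plugin listed in the config; unknown types raise."""
    config = json.loads(config_text)
    plugins = []
    for entry in config["plugins"]:
        factory = REGISTRY.get(entry["type"])
        if factory is None:
            raise ValueError(f"unknown plugin type: {entry['type']}")
        plugins.append(factory(entry.get("parameters", {})))
    return plugins


# Hypothetical config text; in practice this would be read from a file.
CONFIG = """
{"plugins": [
  {"type": "queue-scorer", "parameters": {"maxQueue": 4}},
  {"type": "constant-scorer"}
]}
"""

plugins = load_plugins(CONFIG)
scores = [plugin({"queue": 1}) for plugin in plugins]
print(scores)
```

Changing which plugins run, or their parameters, then only requires editing the config text, which is the property the highlight above describes.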

v0.5.0-rc.3

20 Jul 07:08
bbe9dda

v0.5.0-rc.3 Pre-release

Overview

Major Highlights

  • Conformance Tests: Validate your controller’s behavior with end-to-end tests covering InferencePool,
    InferenceModel, HTTPRoute, and more.

  • New Config API: Configure plugins through a config file without touching core code.

  • Helm Charts: The Helm chart has been updated to make it easy to reuse the Config API.

v0.5.0-rc.2

16 Jul 16:56
7fa8fc0

v0.5.0-rc.2 Pre-release

Overview

Major Highlights

  • Conformance Tests: Validate your controller’s behavior with end-to-end tests covering InferencePool,
    InferenceModel, HTTPRoute, and more.

  • New Config API: Configure plugins through a config file without touching core code.

  • Helm Charts: The Helm chart has been updated to make it easy to reuse the Config API.

v0.5.0-rc.1

15 Jul 21:05
73fd266

v0.5.0-rc.1 Pre-release

Overview

Major Highlights

  • Conformance Tests: Validate your controller’s behavior with end-to-end tests covering InferencePool,
    InferenceModel, HTTPRoute, and more.

  • New Config API: Configure plugins through a config file without touching core code.

What's Changed

  • Add scripts for running e2es by @keithmattix in #978
  • fix: istio example destination rule by @EyalPazz in #970
  • Bump Istio tag reference by @keithmattix in #974
  • adds New functions to the scorers for consistency by @nirrozenbaum in #975
  • feat(conformance): enable multiple endpoints in header based filter for EPP's conformance testing. by @zetxqx in #964
  • e2e makefile comment fix by @nirrozenbaum in #976
  • API: Adds 5xx Status Code for Invalid ExtRef by @danehans in #991
  • feat(conformance): Add test for invalid EPP service reference by @SinaChavoshi in #959
  • moved the creation of the context to main.go. by @nirrozenbaum in #995
  • doc: fix dead links by @caozhuozi in #989
  • feat: add health check for epp cluster by @zhengkezhou1 in #966
  • test: gRPC server unit tests and utilities for further end-to-end tests by @irar2 in #820
  • Update dynamic-lora-sidecar to expose metrics to track loaded adapters by @shotarok in #980
  • refactor: Replace prefix cache structure with golang-lru by @kfirtoledo in #928
  • feat(conformance): Add HTTPRouteMultipleRulesDifferentPools test by @SinaChavoshi in #834
  • feat: Load the SchedulerConfig from a configuration file/text and make it easier to add plugins by @shmuelk in #881
  • adding pre-request plugin to requestcontrol layer by @nirrozenbaum in #1004
  • feat(conformance): Add test execution instruction to the guide. by @SinaChavoshi in #878
  • fix: Update bbr fqdn to use helm release namespace by @chewong in #1009
  • feat(conformance): Add HTTPRoute port validation tests for InferencePool backends by @zetxqx in #911
  • refactor(conformance): move some common resources to shared place and add EPP service to tests needed. by @zetxqx in #982
  • fix(Conformance): Add namespace-(labels|annotations) flag parsing by @aslakknutsen in #984
  • bump cpu deployment version by @nirrozenbaum in #1016
  • fix: api doc typo InvalidExtnesionRef by @aslakknutsen in #1018
  • Adds vLLM CPU and Sim Support to Release Script by @danehans in #1020
  • Add Makefile to run unit tests of tools/dynamic-lora-sidecar locally by @shotarok in #1021
  • profile handler ProcessResult returns additional return value by @nirrozenbaum in #1013
  • cleanup after config api PR was merged by @nirrozenbaum in #1012
  • Making inferenceModel optional by @kfswain in #1024
  • Adding Design Principles by @robscott in #596
  • Adding Nir as a maintainer! by @kfswain in #1026
  • [Fix] Missing property "apiGroup" error by @yafengio in #1015
  • API: Adds default status condition to InferencePool by @danehans in #830
  • feat(conformance): Add EPP conformance test for Gateway routing by @zetxqx in #961
  • update sim deployment tag to latest by @nirrozenbaum in #1041
  • refactor: rename plugin.Name() => plugin.Type() by @elevran in #1038
  • docs: update the Getting Started guide to use the latest CRDs by @kfirtoledo in #1045
  • added cycle state to pick & process results in profile handler by @nirrozenbaum in #1040
  • feat(conformance): Add HTTPRouteMultipleGatewaysDifferentPools test by @SinaChavoshi in #838
  • feat(conformance) add EPP unavailable fail-open test by @zetxqx in #999
  • Add APIs for the instantiated plugins to the EPP Handle by @shmuelk in #1039
  • chore(deps): bump the kubernetes group with 6 updates by @dependabot[bot] in #1050
  • chore(deps): bump github.com/prometheus/common from 0.64.0 to 0.65.0 by @dependabot[bot] in #1051
  • Only create LOCALBIN directory when it does not exist by @elevran in #1054
  • remove datastore dependency from the scheduler by @nirrozenbaum in #1049
  • add e2e test for epp metrics by @delavet in #938
  • refactor(confromance) use common resources for InferencePoolHTTPRoutePortValidation test by @zetxqx in #1034
  • Reintroduce Plugin.Name() by @elevran in #1057
  • Extensible/Pluggable data layer proposal by @elevran in #1023
  • Add subsetting logic for epp by @rlakhtakia in #981
  • docs: added gke clean up instructions by @capri-xiyue in #1064
  • feat(flowcontrol): Add Foundational Types and Architecture by @LukeAVanDrie in #997
  • refactor: Allow export prefix SchedulingContextState for use across plugins by @kfirtoledo in #1063
  • feat: Added a factory function for the DecisionTree filter by @shmuelk in #1053
  • Adding pprof endpoints to metrics port by @kfswain in #1069
  • version in README by @nirrozenbaum in #1072
  • feat: Add a context.Context to the plugins.HAndle interface by @shmuelk in #1076
  • Update model server protocol with prefix cache reuse by @liu-cong in #1077
  • Update prefix plugin guide to use vllm as default to be consistent by @liu-cong in #1078
  • refactor(conformance) merge similar utility functions. by @zetxqx in #1055
  • fix(conformance): fix conformance setup issue by not relying on suite.Setup from gateway-api by @zetxqx in #1060
  • e2e cleanup by @nirrozenbaum in #988
  • fix: add wait after both httproute deletes for status to update by @aslakknutsen in #1056
  • API: Refine ResolvedRefs condition for invalid ExtensionReference and expand InferencePoolReason values by @zetxqx in #1070
  • Tidy up Data Layer documentation by @elevran in https:...

v0.4.0

23 Jun 04:32
2b5b337

Overview

We are thrilled to announce the v0.4.0 release, our biggest update yet! This version brings powerful new Endpoint Picker (EPP) scheduler capabilities, performance improvements, and initial Gateway conformance tests.

Major Highlights

  • Modular Endpoint Picker (EPP) Scheduler: A kube-scheduler–style plugin API lets you build custom routing logic,
    filter and score backends, or swap in new picker strategies without touching core code.

  • Prefix-Cache-Aware Routing: Lower tail latency by routing requests to endpoints that already hold a matching prompt prefix in cache,
    improving response times under load.

  • Richer Metrics: Gain deeper insights with new metrics including:

    • NTPOT (Normalized Time Per Output Token)
    • Scheduler latency
    • Per-pod queue depth
    • Build and version info

  • Optional vLLM Simulator Backend: Spin up a lightweight simulator for local development and testing, with no real model
    servers required.

  • Initial Conformance Tests: Validate your controller’s behavior with end-to-end tests covering InferencePool,
    InferenceModel, HTTPRoute, and more.
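
As a rough illustration of the filter, score, and pick flow that a kube-scheduler-style plugin design implies, here is a minimal Python sketch. It is not the EPP's actual plugin API: the `Pod` type, the `schedule` function, and the example filter and scorer are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Pod:
    """Hypothetical view of a backend endpoint."""
    name: str
    queue_depth: int
    healthy: bool = True


Filter = Callable[[Pod], bool]   # keeps or drops a candidate
Scorer = Callable[[Pod], float]  # higher is better


def schedule(
    pods: Sequence[Pod],
    filters: Sequence[Filter],
    weighted_scorers: Sequence[Tuple[float, Scorer]],
) -> Pod:
    """Filter candidates, sum weighted scores, and pick the highest total."""
    candidates: List[Pod] = [p for p in pods if all(f(p) for f in filters)]
    if not candidates:
        raise RuntimeError("no endpoint passed the filters")

    def total(pod: Pod) -> float:
        return sum(weight * scorer(pod) for weight, scorer in weighted_scorers)

    # The "picker" here is simply max-score; other strategies could be swapped in.
    return max(candidates, key=total)


def is_healthy(pod: Pod) -> bool:
    return pod.healthy


def short_queue(pod: Pod) -> float:
    # Shorter queues score closer to 1.0.
    return 1.0 / (1 + pod.queue_depth)


pods = [
    Pod("a", queue_depth=5),
    Pod("b", queue_depth=1),
    Pod("c", queue_depth=0, healthy=False),
]
chosen = schedule(pods, [is_healthy], [(1.0, short_queue)])
print(chosen.name)
```

In this sketch, pod "c" has the shortest queue but is removed by the filter before scoring, so the picker chooses among "a" and "b" only. Custom behavior comes from passing different filters, scorers, or weights rather than editing `schedule` itself.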

v0.4.0-rc.1

13 Jun 15:28

v0.4.0-rc.1 Pre-release

TL;DR

  • We have made a major refactor of the EPP, allowing for a more modular and maintainable system.
    • As part of this overhaul, we have implemented a pluggable, extensible scheduler system, allowing users to create their own custom, sophisticated routing logic.
  • We have also included native support for Prefix Cache Aware Routing.
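
One way to picture prefix-cache-aware scoring is sketched below in Python. This is an assumption-laden illustration, not the project's implementation: the block size, the chained SHA-256 hashing, and the names `block_hashes` and `prefix_score` are all hypothetical. The idea shown is that an endpoint scores higher when more of the leading blocks of a new prompt match blocks it has already seen.

```python
import hashlib
from typing import List, Set

BLOCK_SIZE = 4  # characters per block; an arbitrary value for illustration


def block_hashes(prompt: str) -> List[str]:
    """Chain-hash full fixed-size blocks so each hash covers the prefix up to it."""
    hashes: List[str] = []
    previous = ""
    full_length = len(prompt) - len(prompt) % BLOCK_SIZE  # ignore a partial tail
    for start in range(0, full_length, BLOCK_SIZE):
        block = prompt[start:start + BLOCK_SIZE]
        previous = hashlib.sha256((previous + block).encode()).hexdigest()
        hashes.append(previous)
    return hashes


def prefix_score(prompt: str, cached: Set[str]) -> float:
    """Fraction of the prompt's blocks, counted from the start, found in the cache."""
    hashes = block_hashes(prompt)
    if not hashes:
        return 0.0
    matched = 0
    for block_hash in hashes:
        if block_hash not in cached:
            break  # a prefix match must be contiguous from the beginning
        matched += 1
    return matched / len(hashes)


# Hypothetical state: endpoint A has served the system prompt, endpoint B has not.
cache_a = set(block_hashes("You are a helpful assistant."))
cache_b: Set[str] = set()

prompt = "You are a helpful assistant. Hi!"
print(prefix_score(prompt, cache_a), prefix_score(prompt, cache_b))
```

A scheduler could combine a score like this with load-based scores, so that cache affinity is weighed against queue depth rather than always winning.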
