Add flow controller. #2

Draft
wants to merge 53 commits into
base: main
LukeAVanDrie
Owner

No description provided.

@LukeAVanDrie LukeAVanDrie reopened this May 8, 2025
@LukeAVanDrie LukeAVanDrie changed the title Add scheduler for queuing and fairness. Add flow controller. May 8, 2025
nirrozenbaum and others added 25 commits May 8, 2025 08:55
the file contains only two consts that are not used anywhere (the same consts are defined in runserver.go)

Signed-off-by: Nir Rozenbaum <[email protected]>
Refactored the environment variable utility (pkg/epp/util/env)
to enhance code quality, readability, and maintainability.

Key changes:
- Introduced generic helper functions `parseEnvWithValue` and
  `getEnvWithParser` to centralize common logic for fetching
  and parsing environment variables, significantly reducing
  code duplication.
- Standardized logging messages for consistency across all
  `GetEnv<Type>` functions.
- Added `GetEnvDuration`.
* refactor scheduler filters package to simplify and improve readability and maintainability

Signed-off-by: Nir Rozenbaum <[email protected]>

* filter refactor finalizing

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>
…ernetes-sigs#810)

the current implementation leaves dangling goroutines and structs, which consume resources and keep unused objects from being garbage collected

Signed-off-by: Nir Rozenbaum <[email protected]>
* merge has capacity filter with sheddable filter.

the has capacity filter's only use was for sheddable requests (it was a passthrough for critical ones).

Signed-off-by: Nir Rozenbaum <[email protected]>

* Update pkg/epp/scheduling/plugins/filter/filter_test.go

Co-authored-by: Cong Liu <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>
Co-authored-by: Cong Liu <[email protected]>
… setup (kubernetes-sigs#772)

* Add inferencepool_lifecycle test.

* Resolve setup issues and enable InferencePool test

* correct Lint error Multiplication of durations

* Fix missing containerPort

* change gateway name from "gateway-conformance-app" to "conformance-gateway"

* clarify why K8s types are needed.

* Update conformance/conformance.go

Co-authored-by: Lior Lieberman <[email protected]>

* Update conformance/conformance.go

Co-authored-by: Lior Lieberman <[email protected]>

* remove for loop when adding SupportedFeatures

* remove excessive logging

* Update conformance/conformance.go

Co-authored-by: Lior Lieberman <[email protected]>

* move excess debug logs behind debug flag.

* remove CONFORMANCE.GO prefix from logs.

* change the pull logic and use default value from GatewayMustHaveAddress

* fix fmt.Sprintf can be replaced with string concatenation

* add a function for logDebug

* factor out ensureGatewayAvailableAndReady

* removed todo comment in helper.go

* remove CONFORMANCE.GO from log

* error messages should not be capitalized or end with punctuation

---------

Co-authored-by: Lior Lieberman <[email protected]>
* Add prefix cache aware scheduling

* Replace scheduler v2 with config v2

* Add score weight to XXScorerConfig

* Address comments

* Clean up

* Change to use container/list lib

* cleanup

* Add TODO

* make linter happy
* generalize scheduling cycle state concept

Signed-off-by: Nir Rozenbaum <[email protected]>

* typo

Signed-off-by: Nir Rozenbaum <[email protected]>

* make linter happy

Signed-off-by: Nir Rozenbaum <[email protected]>

* make prefix state struct internal to package instead of public

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>
* remove Model field from LLMRequest

Signed-off-by: Nir Rozenbaum <[email protected]>

* rebase handling

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>
* Added the LLMResponse struct and RequestId to LLMRequest

Signed-off-by: Shmuel Kallner <[email protected]>

* Updates due to NewSchedulerContext API change

Signed-off-by: Shmuel Kallner <[email protected]>

* Populate the RequestId field of LLMRequest

Signed-off-by: Shmuel Kallner <[email protected]>

* Updates to tests

Signed-off-by: Shmuel Kallner <[email protected]>

* Added PostResponse plugins to scheduler config

Signed-off-by: Shmuel Kallner <[email protected]>

* Added scheduler.OnResponse to handle responses

Signed-off-by: Shmuel Kallner <[email protected]>

* Added dispatcher.HandleResponse to handle responses

Signed-off-by: Shmuel Kallner <[email protected]>

* Refactored server response header handling to invoke PostResponse plugins

Signed-off-by: Shmuel Kallner <[email protected]>

* Added simple test for PostResponse plugins

Signed-off-by: Shmuel Kallner <[email protected]>

* Set up the logger in the SchedulerContext appropriately for responses

Signed-off-by: Shmuel Kallner <[email protected]>

* Updates due to rebase issues

* merge functions in env utils (kubernetes-sigs#819)

Signed-off-by: Nir Rozenbaum <[email protected]>

* generalize scheduling cycle state concept (kubernetes-sigs#818)

* generalize scheduling cycle state concept

Signed-off-by: Nir Rozenbaum <[email protected]>

* typo

Signed-off-by: Nir Rozenbaum <[email protected]>

* make linter happy

Signed-off-by: Nir Rozenbaum <[email protected]>

* make prefix state struct internal to package instead of public

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* remove Model field from LLMRequest (kubernetes-sigs#782)

* remove Model field from LLMRequest

Signed-off-by: Nir Rozenbaum <[email protected]>

* rebase handling

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* Added the LLMResponse struct and RequestId to LLMRequest

Signed-off-by: Shmuel Kallner <[email protected]>

* Ensure that wanted response header messages have all of the response headers in them

Signed-off-by: Shmuel Kallner <[email protected]>

---------

Signed-off-by: Shmuel Kallner <[email protected]>
Signed-off-by: Nir Rozenbaum <[email protected]>
Co-authored-by: Nir Rozenbaum <[email protected]>
* Add prefix aware routing proposal

* Update, add a diagram

* Add future work

* Update to PR number, clarify terminology
The reference will be from `_PULL_BASE_REF` variable from the cloud
build: https://docs.prow.k8s.io/docs/jobs/

The change also fixes the commit label by using the right variable added
in https://github.com/kubernetes/test-infra/pull/34755/files.
This commit adds a new `SaturationDetector` component responsible for
determining if backend model servers are saturated. It bases its
decision on observed metrics like queue depth and KV cache utilization,
using configurable thresholds.

The detector is designed to be a self-contained unit that can be
leveraged by other components for admission control and capacity
assessment. This is the first step in a larger refactoring to
externalize and centralize saturation detection logic.
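
A minimal sketch of a threshold-based detector in the spirit of the description above. All type names, field names, and the decision rule (the pool counts as saturated only when no pod is under both thresholds) are assumptions for illustration; the commit message does not spell out the actual policy.

```go
package main

import "fmt"

// Config holds the configurable thresholds (hypothetical field names).
type Config struct {
	QueueDepthThreshold  int     // max waiting requests for a pod with capacity
	KVCacheUtilThreshold float64 // max KV cache utilization, 0.0 to 1.0
}

// PodMetrics is a hypothetical snapshot of one model server's metrics.
type PodMetrics struct {
	WaitingQueueSize   int
	KVCacheUtilization float64
}

// Detector is a self-contained unit other components can query.
type Detector struct {
	cfg Config
}

// IsSaturated returns false as soon as one pod is at or under both
// thresholds. Assumption: an empty pod list counts as saturated because
// there is no capacity to serve requests.
func (d *Detector) IsSaturated(pods []PodMetrics) bool {
	for _, p := range pods {
		if p.WaitingQueueSize <= d.cfg.QueueDepthThreshold &&
			p.KVCacheUtilization <= d.cfg.KVCacheUtilThreshold {
			return false
		}
	}
	return true
}

func main() {
	d := &Detector{cfg: Config{QueueDepthThreshold: 5, KVCacheUtilThreshold: 0.8}}
	pods := []PodMetrics{
		{WaitingQueueSize: 9, KVCacheUtilization: 0.95},
		{WaitingQueueSize: 2, KVCacheUtilization: 0.40},
	}
	fmt.Println(d.IsSaturated(pods))
}
```

Keeping the thresholds in a plain config struct and the decision in one method is what lets admission control and capacity assessment share the same logic.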
)

* support extracting prompt from chat completions API

Signed-off-by: Hang Yin <[email protected]>

* typo fixes

Signed-off-by: Hang Yin <[email protected]>

* fix tests

* supply more tests and heading boilerplate

Signed-off-by: Hang Yin <[email protected]>

---------

Signed-off-by: Hang Yin <[email protected]>
The TestMetricsRefresh test in pod_metrics_test.go was flaky due to a
race condition. The `StopRefreshLoop` method would signal the metrics
refresh goroutine to stop but did not wait for its actual termination.
If the test updated the mock metrics client immediately after calling
`StopRefreshLoop`, the refresh goroutine could, in rare cases, perform
a final metrics fetch with the new data before fully exiting. This
resulted in the test asserting against unexpected metric values.

This commit resolves the issue by adding a sleep for the metrics
refresh interval in TestMetricsRefresh. Additionally, it adds the
following for robustness in `StopRefreshLoop`:
- `stopOnce` is used to ensure the `done` channel is only closed once
  (for idempotency and protection against concurrent calls).

This change ensures that the refresh goroutine is guaranteed to have
stopped before any test assertions are made, eliminating the race
condition.
* Add inferencepool_lifecycle test.

* Resolve setup issues and enable InferencePool test

* removed todo comment in helper.go

* Add InferencePoolLifecycle test

* update comments in helper.go

* remove conformance.go from log message

* Remove lifecycle test.

* Removed unused helper methods (inference pool must have selector & must be deleted)

* Set timeout values as constant

* change timeout.go to timing.go
* Scheduler subsystem high level design proposal

This sets down basic design principles of the current gateway
scheduler. We also highlight who we are targeting as users, and
why we prioritize the current approach. It also selects standard
terminology for scheduling that the implementation should adopt.

This is a high level design and thus sets general scope, without
expecting to fully address all problems.

* Review feedback

---------

Co-authored-by: Kellen Swain <[email protected]>
kfswain and others added 3 commits May 20, 2025 19:42
…netes-sigs#835)

* small refactor of scheduler config
handles how to register a plugin that implements multiple scheduler plugin interfaces with a single registration command

Signed-off-by: Nir Rozenbaum <[email protected]>

* code review

Signed-off-by: Nir Rozenbaum <[email protected]>

* minor change

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>
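
A minimal sketch of how a single registration call could route one plugin to several extension points, as the scheduler config refactor above describes. The interfaces, the `SchedulerConfig` fields, and the `AddPlugins` method are hypothetical names chosen for illustration, not the project's actual API.

```go
package main

import "fmt"

// Plugin is the hypothetical base interface all plugins share.
type Plugin interface {
	Name() string
}

// Filter and Scorer are two hypothetical extension points.
type Filter interface {
	Plugin
	Keep(pod string) bool
}

type Scorer interface {
	Plugin
	Score(pod string) float64
}

// SchedulerConfig holds one slice per extension point.
type SchedulerConfig struct {
	filters []Filter
	scorers []Scorer
}

// AddPlugins registers each plugin under every interface it implements,
// so a plugin that is both a Filter and a Scorer is added once by the caller.
func (c *SchedulerConfig) AddPlugins(plugins ...Plugin) *SchedulerConfig {
	for _, p := range plugins {
		if f, ok := p.(Filter); ok {
			c.filters = append(c.filters, f)
		}
		if s, ok := p.(Scorer); ok {
			c.scorers = append(c.scorers, s)
		}
	}
	return c
}

// prefixPlugin implements both extension points.
type prefixPlugin struct{}

func (prefixPlugin) Name() string             { return "prefix" }
func (prefixPlugin) Keep(pod string) bool     { return pod != "" }
func (prefixPlugin) Score(pod string) float64 { return float64(len(pod)) }

func main() {
	cfg := (&SchedulerConfig{}).AddPlugins(prefixPlugin{})
	fmt.Println(len(cfg.filters), len(cfg.scorers))
}
```

Independent `if` checks are used instead of a type switch because a type switch would stop at the first matching case and miss the second interface.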
nayihz and others added 2 commits May 22, 2025 09:12
* feat: migrate epp metric server

Signed-off-by: nayihz <[email protected]>

* feat: migrate bbr metric server

Signed-off-by: nayihz <[email protected]>

* fix: metric reset has no effect

Signed-off-by: nayihz <[email protected]>

* fix: add the stability level to the help message of the metric

* fix: refactor custom inferencepool metric

Signed-off-by: nayihz <[email protected]>

---------

Signed-off-by: nayihz <[email protected]>
…narios by using gateway api inference extension (kubernetes-sigs#812)

* added common cases

* added more details

Signed-off-by: Xiyue Yu <[email protected]>

* fixed comments

* changed file location

* fixed typo

* Update site-src/guides/serve-multiple-lora-adapters.md

Co-authored-by: Cong Liu <[email protected]>

* Update site-src/guides/serve-multiple-lora-adapters.md

Co-authored-by: Cong Liu <[email protected]>

* Update mkdocs.yml

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/guides/serve-multiple-lora-adapters.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/guides/serve-multiple-genai-models.md

Co-authored-by: Rob Scott <[email protected]>

* added subsession

* fixed wording

---------

Signed-off-by: Xiyue Yu <[email protected]>
Co-authored-by: Cong Liu <[email protected]>
Co-authored-by: Rob Scott <[email protected]>
terrytangyuan and others added 2 commits May 22, 2025 20:22
* code review

Signed-off-by: Nir Rozenbaum <[email protected]>

* minor change

Signed-off-by: Nir Rozenbaum <[email protected]>

* add support for multi cycle scheduling

Signed-off-by: Nir Rozenbaum <[email protected]>

* minor change

Signed-off-by: Nir Rozenbaum <[email protected]>

* moved plugins under plugins dir

Signed-off-by: Nir Rozenbaum <[email protected]>

* few more changes

Signed-off-by: Nir Rozenbaum <[email protected]>

* moved RunCycle logic into SchedulerProfile

Signed-off-by: Nir Rozenbaum <[email protected]>

* minor changes

Signed-off-by: Nir Rozenbaum <[email protected]>

* linter

Signed-off-by: Nir Rozenbaum <[email protected]>

* minor change in unit-test

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>
@LukeAVanDrie LukeAVanDrie force-pushed the scheduler branch 3 times, most recently from 0b69491 to 16ed2e2 Compare May 27, 2025 06:00
SinaChavoshi and others added 5 commits May 27, 2025 09:30
…ernetes-sigs#807)

* Add inferencepool_lifecycle test.

* Resolve setup issues and enable InferencePool test

* correct Lint error Multiplication of durations

* Fix missing containerPort

* change gateway name from "gateway-conformance-app" to "conformance-gateway"

* clarify why K8s types are needed.

* Update conformance/conformance.go

Co-authored-by: Lior Lieberman <[email protected]>

* Update conformance/conformance.go

Co-authored-by: Lior Lieberman <[email protected]>

* remove for loop when adding SupportedFeatures

* remove excessive logging

* Update conformance/conformance.go

Co-authored-by: Lior Lieberman <[email protected]>

* move excess debug logs behind debug flag.

* remove CONFORMANCE.GO prefix from logs.

* change the pull logic and use default value from GatewayMustHaveAddress

* fix fmt.Sprintf can be replaced with string concatenation

* add a function for logDebug

* factor out ensureGatewayAvailableAndReady

* removed todo comment in helper.go

* remove CONFORMANCE.GO from log

* Add InferencePoolLifecycle test

* update comments in helper.go

* Initial commit for InferencePoolNoMatchingPodsRouteStatus test

* resolve lint issue.

* error messages should not be capitalized or end with punctuation

* Add inferencepool_lifecycle test.

* Resolve setup issues and enable InferencePool test

* removed todo comment in helper.go

* Add InferencePoolLifecycle test

* update comments in helper.go

* remove conformance.go from log message

* Remove lifecycle test.

* Removed unused helper methods (inference pool must have selector & must be deleted)

* add back HTTPRouteMustHaveParentStatusConditions

* Set timeout values as constant

* change timeout.go to timing.go

* remove duplicate log

* remove excess comments and logs

* add comment / todo for Reconciled

* Update conformance/utils/kubernetes/helpers.go

Co-authored-by: Rob Scott <[email protected]>

* change test to HTTPRouteInvalidInferencePoolRef

* use TODO: instead of TODO()

* yaml and todos based on code review

---------

Co-authored-by: Lior Lieberman <[email protected]>
Co-authored-by: Rob Scott <[email protected]>
…bernetes-sigs#832)

* WIP tests for inferencepool_resolvedrefs_condition

* update condition check

* Add helper method for inf pool parent status check

* update manifests

* update the test to match manifest

* fix yaml files.

* add SupportInferencePool

* Add a helper function for HTTPRouteMustBeAcceptedAndResolved

* Add a helper method InferencePoolMustBeAcceptedByParent

* add todo for ensure http requests are routed correctly kubernetes-sigs#865

* remove extra space
…d InferenceModel (kubernetes-sigs#870)

* Update docs about InferencePool

* Update docs about InferenceModel
* remove the PreCycle plugin from scheduler

Signed-off-by: Nir Rozenbaum <[email protected]>

* Apply suggestions from code review

Co-authored-by: Cong Liu <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>
Co-authored-by: Cong Liu <[email protected]>
@LukeAVanDrie LukeAVanDrie force-pushed the scheduler branch 3 times, most recently from 6bb4506 to c998236 Compare May 28, 2025 20:23
… E2E request validation (kubernetes-sigs#866)

* WIP tests for inferencepool_resolvedrefs_condition

* update condition check

* Add helper method for inf pool parent status check

* update manifests

* update the test to match manifest

* fix yaml files.

* add SupportInferencePool

* Add a helper function for HTTPRouteMustBeAcceptedAndResolved

* Add a helper method InferencePoolMustBeAcceptedByParent

* add todo for ensure http requests are routed correctly kubernetes-sigs#865

* Add http tests

* update to use echo server instead

* fix echo server port.

* Add env var to include namespace and pod name for echo server response.

* factor out the common HTTPResponse builder

* shorten wait time

* remove extra space

* fix yaml formatting

* clean up yaml file remove white space and optional fields.

* change naming convention to primary secondary consistently.

* add helper method for "MakeRequestAndExpectNotFound/Success"

* use config instead of inferenceconfig
@LukeAVanDrie LukeAVanDrie force-pushed the scheduler branch 2 times, most recently from b7f210d to 55031a4 Compare May 29, 2025 00:57
nirrozenbaum and others added 2 commits May 28, 2025 17:58
* small changes to saturation detector

Signed-off-by: Nir Rozenbaum <[email protected]>

* var rename

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>