forked from kubernetes-sigs/gateway-api-inference-extension
-
Notifications
You must be signed in to change notification settings - Fork 0
Add flow controller. #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
LukeAVanDrie
wants to merge
53
commits into
main
Choose a base branch
from
scheduler
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
ahg-g
reviewed
Apr 9, 2025
the file contains only two consts that are not used anywhere (same consts are defined in runserver.go Signed-off-by: Nir Rozenbaum <[email protected]>
Refactored the environment variable utility (pkg/epp/util/env) to enhance code quality, readability, and maintainability. Key changes: - Introduced generic helper functions `parseEnvWithValue` and `getEnvWithParser` to centralize common logic for fetching and parsing environment variables, significantly reducing code duplication. - Standardized logging messages for consistency across all `GetEnv<Type>` functions. - Added `GetEnvDuration`.
* refactor schdeuler filters package to simplify and improve readability and maintainability Signed-off-by: Nir Rozenbaum <[email protected]> * filter refactor finalizing Signed-off-by: Nir Rozenbaum <[email protected]> --------- Signed-off-by: Nir Rozenbaum <[email protected]>
Signed-off-by: Nir Rozenbaum <[email protected]>
…ernetes-sigs#810) current implementation leaves dangling go routines and structs which will consume resources and hold unused objects from being GCd Signed-off-by: Nir Rozenbaum <[email protected]>
* merge has capacity filter with sheddable filter. has capacity only use was for sheddable requests (passthrough for critical ones). Signed-off-by: Nir Rozenbaum <[email protected]> * Update pkg/epp/scheduling/plugins/filter/filter_test.go Co-authored-by: Cong Liu <[email protected]> --------- Signed-off-by: Nir Rozenbaum <[email protected]> Co-authored-by: Cong Liu <[email protected]>
… setup (kubernetes-sigs#772) * Add inferencepool_lifecycle test. * Resolve setup issues and enable InferencePool test * correct Lint error Multiplication of durations * Fix missing containerPort, is missing * change gateway name from "gateway-conformance-app" to "conformance-gateway" * clarify why K8s types are needed. * Update conformance/conformance.go Co-authored-by: Lior Lieberman <[email protected]> * Update conformance/conformance.go Co-authored-by: Lior Lieberman <[email protected]> * remove for loop when adding SupportedFeatures * remove exessive logging * Update conformance/conformance.go Co-authored-by: Lior Lieberman <[email protected]> * move excess debug logs behind debug flag. * remove CONFORMANCE.GO prefix from logs. * change the pull logic and use default value from GatewayMustHaveAddress * fix mt.Sprintf can be replaced with string concatenation * add a function for logDebug * factor out ensureGatewayAvailableAndReady * removed todo comment in helper.go * remove CONFORMANCE.GO from log * error messages, should not be capitalized or end with punctuation --------- Co-authored-by: Lior Lieberman <[email protected]>
* Add prefix cache aware scheduling * Replace scheduler v2 with config v2 * Add score weight to XXScorerConfig * Address comments * Clean up * Change to use container/list lib * cleanup * Add TODO * make linter happy
Signed-off-by: Nir Rozenbaum <[email protected]>
* generalize scheduling cycle state concept Signed-off-by: Nir Rozenbaum <[email protected]> * typo Signed-off-by: Nir Rozenbaum <[email protected]> * make linter happy Signed-off-by: Nir Rozenbaum <[email protected]> * make prefix state struct internal to package instead of public Signed-off-by: Nir Rozenbaum <[email protected]> --------- Signed-off-by: Nir Rozenbaum <[email protected]>
* remove Model field from LLMRequest Signed-off-by: Nir Rozenbaum <[email protected]> * rebase handling Signed-off-by: Nir Rozenbaum <[email protected]> --------- Signed-off-by: Nir Rozenbaum <[email protected]>
* Added the LLMResponse struct and RequestId to LLMRequest Signed-off-by: Shmuel Kallner <[email protected]> * Updates due to NewSchedulerContext API change Signed-off-by: Shmuel Kallner <[email protected]> * Populate the RequestId field of LLMRequest Signed-off-by: Shmuel Kallner <[email protected]> * Updates to tests Signed-off-by: Shmuel Kallner <[email protected]> * Added PostResponse plugins to scheduler config Signed-off-by: Shmuel Kallner <[email protected]> * Added scheduler.OnResponse to handle responses Signed-off-by: Shmuel Kallner <[email protected]> * Added dispatcher.HandleResponse to handle responses Signed-off-by: Shmuel Kallner <[email protected]> * Refactored server response header handling to invoke PostResponse plugins Signed-off-by: Shmuel Kallner <[email protected]> * Added simple test for PostResponse plugins Signed-off-by: Shmuel Kallner <[email protected]> * Setup the logger in the SchedulerContext appropriately for reponses Signed-off-by: Shmuel Kallner <[email protected]> * Updates due to rebase issues * merge functions in env utils (kubernetes-sigs#819) Signed-off-by: Nir Rozenbaum <[email protected]> * generalize scheduling cycle state concept (kubernetes-sigs#818) * generalize scheduling cycle state concept Signed-off-by: Nir Rozenbaum <[email protected]> * typo Signed-off-by: Nir Rozenbaum <[email protected]> * make linter happy Signed-off-by: Nir Rozenbaum <[email protected]> * make prefix state struct internal to package instead of public Signed-off-by: Nir Rozenbaum <[email protected]> --------- Signed-off-by: Nir Rozenbaum <[email protected]> * remove Model field from LLMRequest (kubernetes-sigs#782) * remove Model field from LLMRequest Signed-off-by: Nir Rozenbaum <[email protected]> * rebase handling Signed-off-by: Nir Rozenbaum <[email protected]> --------- Signed-off-by: Nir Rozenbaum <[email protected]> * Added the LLMResponse struct and RequestId to LLMRequest Signed-off-by: Shmuel Kallner <[email protected]> * Insure that wanted response header messages have all of the response headers in them Signed-off-by: Shmuel Kallner <[email protected]> --------- Signed-off-by: Shmuel Kallner <[email protected]> Signed-off-by: Nir Rozenbaum <[email protected]> Co-authored-by: Nir Rozenbaum <[email protected]>
* Add prefex aware routing proposal * Update, add a diagram * Add future work * Update to PR number, clarify terminologies
Signed-off-by: Daneyon Hansen <[email protected]>
…es-sigs#822) Signed-off-by: Nir Rozenbaum <[email protected]>
The reference will be from `_PULL_BASE_REF` variable from the cloud build: https://docs.prow.k8s.io/docs/jobs/ The change also fixes the commit label by using the right variable added in https://github.com/kubernetes/test-infra/pull/34755/files.
This commit adds a new `SaturationDetector` component responsible for determining if backend model servers are saturated. It bases its decision on observed metrics like queue depth and KV cache utilization, using configurable thresholds. The detector is designed to be a self-contained unit that can be leveraged by other components for admission control and capacity assessment. This is the first step in a larger refactoring to externalize and centralize saturation detection logic.
) * support extracting prompt from chat completions API Signed-off-by: Hang Yin <[email protected]> * typo fixes Signed-off-by: Hang Yin <[email protected]> * fix tests * supply more tests and heading boilerplate Signed-off-by: Hang Yin <[email protected]> --------- Signed-off-by: Hang Yin <[email protected]>
The TestMetricsRefresh test in pod_metrics_test.go was flaky due to a race condition. The `StopRefreshLoop` method would signal the metrics refresh goroutine to stop but did not wait for its actual termination. If the test updated the mock metrics client immediately after calling `StopRefreshLoop`, the refresh goroutine could, in rare cases, perform a final metrics fetch with the new data before fully exiting. This resulted in the test asserting against unexpected metric values. This commit resolves the issue by making adding a sleep for the metrics refresh interval in TestMetricsRefresh. Additionally, it adds the following for robustness in `StopRefreshLoop`. - `stopOnce` is used to ensure the `done` channel is only closed once (for idempotency and protection against concurrent calls). This change ensures that the refresh goroutine is guaranteed to have stopped before any test assertions are made, eliminating the race condition.
* Add inferencepool_lifecycle test. * Resolve setup issues and enable InferencePool test * removed todo comment in helper.go * Add InferencePoolLifecycle test * update comments in helper.go * remove Conformanc.go from log message * Remove lifecycle test. * Removed unused helper methods ( inference pool must have selector & must be deleted) * Set timeout values as constant * change timeout.go to timing.go
* Scheduler subsystem high level design proposal This sets down basic design principles of the current gateway scheduler. We also highlight who we are targeting as users, and why we prioritize the current approach. It also selects standard terminology for scheduling that the implementation should adopt. This is a high level design and thus sets general scope, without expecting to fully address all problems. * Review feedback --------- Co-authored-by: Kellen Swain <[email protected]>
Fix TZ link
…netes-sigs#835) * small refactor of scheduler config handles how to register a plugin that implements multiple scheduler plugins interfaces with a single registration command Signed-off-by: Nir Rozenbaum <[email protected]> * code review Signed-off-by: Nir Rozenbaum <[email protected]> * minor change Signed-off-by: Nir Rozenbaum <[email protected]> --------- Signed-off-by: Nir Rozenbaum <[email protected]>
* feat: migrate epp metric server Signed-off-by: nayihz <[email protected]> * feat: migrate bbr metric server Signed-off-by: nayihz <[email protected]> * fix: metric reset not effect Signed-off-by: nayihz <[email protected]> * fix: add the stability level to the help message of the metric * fix: refactor custom inferencepool metric Signed-off-by: nayihz <[email protected]> --------- Signed-off-by: nayihz <[email protected]>
…narios by using gateway api inference extension (kubernetes-sigs#812) * added common cases * added more details Signed-off-by: Xiyue Yu <[email protected]> * fixed comments * changed file location * fixed typo * Update site-src/guides/serve-multiple-lora-adapters.md Co-authored-by: Cong Liu <[email protected]> * Update site-src/guides/serve-multiple-lora-adapters.md Co-authored-by: Cong Liu <[email protected]> * Update mkdocs.yml Co-authored-by: Rob Scott <[email protected]> * Update site-src/guides/serve-multiple-lora-adapters.md Co-authored-by: Rob Scott <[email protected]> * Update site-src/guides/serve-multiple-genai-models.md Co-authored-by: Rob Scott <[email protected]> * added subsession * fixed wording --------- Signed-off-by: Xiyue Yu <[email protected]> Co-authored-by: Cong Liu <[email protected]> Co-authored-by: Rob Scott <[email protected]>
* code review Signed-off-by: Nir Rozenbaum <[email protected]> * minor change Signed-off-by: Nir Rozenbaum <[email protected]> * add support for multi cycle scheduling Signed-off-by: Nir Rozenbaum <[email protected]> * minor change Signed-off-by: Nir Rozenbaum <[email protected]> * moved plugins under plugins dir Signed-off-by: Nir Rozenbaum <[email protected]> * few more changes Signed-off-by: Nir Rozenbaum <[email protected]> * moved RunCycle logic into SchedulerProfile Signed-off-by: Nir Rozenbaum <[email protected]> * minor changes Signed-off-by: Nir Rozenbaum <[email protected]> * linter Signed-off-by: Nir Rozenbaum <[email protected]> * minor change in unit-test Signed-off-by: Nir Rozenbaum <[email protected]> --------- Signed-off-by: Nir Rozenbaum <[email protected]>
0b69491
to
16ed2e2
Compare
…ernetes-sigs#807) * Add inferencepool_lifecycle test. * Resolve setup issues and enable InferencePool test * correct Lint error Multiplication of durations * Fix missing containerPort, is missing * change gateway name from "gateway-conformance-app" to "conformance-gateway" * clarify why K8s types are needed. * Update conformance/conformance.go Co-authored-by: Lior Lieberman <[email protected]> * Update conformance/conformance.go Co-authored-by: Lior Lieberman <[email protected]> * remove for loop when adding SupportedFeatures * remove exessive logging * Update conformance/conformance.go Co-authored-by: Lior Lieberman <[email protected]> * move excess debug logs behind debug flag. * remove CONFORMANCE.GO prefix from logs. * change the pull logic and use default value from GatewayMustHaveAddress * fix mt.Sprintf can be replaced with string concatenation * add a function for logDebug * factor out ensureGatewayAvailableAndReady * removed todo comment in helper.go * remove CONFORMANCE.GO from log * Add InferencePoolLifecycle test * update comments in helper.go * Initial commit for InferencePoolNoMatchingPodsRouteStatus test * resolve lint issue. * error messages, should not be capitalized or end with punctuation * Add inferencepool_lifecycle test. * Resolve setup issues and enable InferencePool test * removed todo comment in helper.go * Add InferencePoolLifecycle test * update comments in helper.go * remove Conformanc.go from log message * Remove lifecycle test. * Removed unused helper methods ( inference pool must have selector & must be deleted) * add back HTTPRouteMustHaveParentStatusConditions * Set timeout values as constant * change timeout.go to timing.go * remove duplicate log * remove excess comments and logs * add comment / todo for Reconciled * Update conformance/utils/kubernetes/helpers.go Co-authored-by: Rob Scott <[email protected]> * change test to HTTPRouteInvalidInferencePoolRef * use TODO: instead of TODO() * yaml and todos based on code review --------- Co-authored-by: Lior Lieberman <[email protected]> Co-authored-by: Rob Scott <[email protected]>
…bernetes-sigs#832) * WIP tests for inferencepool_resolvedrefs_condition * update condition check * Add helper method for inf pool parrent status check * update manifests * update the test to match manifest * fix yaml files. * add SupportInferencePool * Add a helper function for HTTPRouteMustBeAcceptedAndResolved * Add a helper method InferencePoolMustBeAcceptedByParent * add todo for ensure http requests are routed correctly kubernetes-sigs#865 * remove extra space
…d InferenceModel (kubernetes-sigs#870) * Update docs about InferencePool * Update docs about InferenceModel
…etes-sigs#873) Signed-off-by: Nir Rozenbaum <[email protected]>
* remove the PreCycle plugin from scheduler Signed-off-by: Nir Rozenbaum <[email protected]> * Apply suggestions from code review Co-authored-by: Cong Liu <[email protected]> --------- Signed-off-by: Nir Rozenbaum <[email protected]> Co-authored-by: Cong Liu <[email protected]>
6bb4506
to
c998236
Compare
… E2E request validation (kubernetes-sigs#866) * WIP tests for inferencepool_resolvedrefs_condition * update condition check * Add helper method for inf pool parrent status check * update manifests * update the test to match manifest * fix yaml files. * add SupportInferencePool * Add a helper function for HTTPRouteMustBeAcceptedAndResolved * Add a helper method InferencePoolMustBeAcceptedByParent * add todo for ensure http requests are routed correctly kubernetes-sigs#865 * Add http tests * update to use echo server instead * fix echo server port. * Add env var to include namespace and pod name for echo server resposne. * factor out the common HTTPResponse builder * shorten wait time * remove extra space * fix yaml formatting * clean up yaml file remove white space and optional fields. * change naming convention to primary secondary consistently. * add helper method for "MakeRequestAndExpectNotFound/Success * use config instead of inferenceconfig
b7f210d
to
55031a4
Compare
* small changes to saturation detector Signed-off-by: Nir Rozenbaum <[email protected]> * var rename Signed-off-by: Nir Rozenbaum <[email protected]> --------- Signed-off-by: Nir Rozenbaum <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.