
Conversation

@kaushikmitr
Contributor

This pull request introduces several improvements to both the training and prediction servers for latency prediction, focusing on more granular feature engineering and data bucketing, especially around the prefix cache score. The changes improve model training and prediction accuracy by adding interaction features and by refining how data is bucketed and processed. The most important updates are grouped below by theme.

Feature Engineering and Data Preparation:

  • Added a _prepare_features_with_interaction method to both prediction_server.py and training_server.py that generates new interaction features for the TTFT model (such as effective_input_tokens and a categorical prefill_score_bucket), improving model learning and prediction accuracy.
  • Updated the prediction methods (predict and predict_batch) to use these engineered features for both single and batch predictions, keeping inference consistent with the training pipeline.

Data Bucketing Enhancements:

  • Expanded data bucketing in the training server to include a third dimension based on the prefix cache score, using a new _get_prefix_bucket method and updated bucket keys for both TTFT and TPOT data. This enables more granular sampling and storage.

Model Training Improvements:

  • Modified the _train_model_with_scaling method to accept sample weights and to drop the categorical prefill_score_bucket column for Bayesian Ridge models, which require purely numeric features.

Configuration and Miscellaneous:

  • Added a new configuration flag SAMPLE_WEIGHTING_FOR_PREFIX_CACHE to the settings, allowing optional sample weighting based on prefix cache score.
  • Minor code cleanup and import fixes in training_server.py.
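The flag name SAMPLE_WEIGHTING_FOR_PREFIX_CACHE comes from the PR; how it is wired into settings is not shown, so the environment-variable mechanism and the weighting function below are purely hypothetical.

```python
import os

import numpy as np

# Hypothetical settings wiring: enable optional sample weighting
# based on the prefix cache score during training.
SAMPLE_WEIGHTING_FOR_PREFIX_CACHE = (
    os.environ.get("SAMPLE_WEIGHTING_FOR_PREFIX_CACHE", "false").lower()
    in ("1", "true", "yes")
)

def prefix_cache_sample_weights(scores) -> "np.ndarray | None":
    """Hypothetical weighting scheme: upweight rows with higher
    prefix cache scores when the flag is on, else use uniform
    weights (None)."""
    if not SAMPLE_WEIGHTING_FOR_PREFIX_CACHE:
        return None
    return 1.0 + np.asarray(scores, dtype=float)
```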

These changes collectively improve the accuracy and flexibility of latency prediction by allowing the models to better capture the effects of prefix cache and input size, and by aligning feature engineering in both training and prediction workflows.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 25, 2025
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Oct 25, 2025
@BenjaminBraunDev
Contributor

LGTM, I have finished rebasing and moving logic to the plugins, so once this is in I can rebase over it and make the PR for that.

@ahg-g
Contributor

ahg-g commented Oct 27, 2025

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 27, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, kaushikmitr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 27, 2025
@k8s-ci-robot k8s-ci-robot merged commit 60726b0 into kubernetes-sigs:slo-prediction-experimental Oct 27, 2025
5 checks passed