improve model for prefix cache score #1770
                
Merged (+647 −182)
This pull request introduces several improvements to both the training and prediction servers for latency prediction, focusing on more granular feature engineering and data bucketing, especially around prefix cache score. The changes enhance model training and prediction accuracy by adding interaction features and refining how data is bucketed and processed. The most important updates are grouped below by theme.
Feature Engineering and Data Preparation:
- Added a `_prepare_features_with_interaction` method to both `prediction_server.py` and `training_server.py`. It generates new interaction features (such as `effective_input_tokens` and a categorical `prefill_score_bucket`) for the TTFT model, improving model learning and prediction accuracy. [1] [2]
- Updated the prediction paths (`predict` and `predict_batch`) to use these engineered features for both single and batch predictions, ensuring consistency with the training pipeline. [1] [2] [3]

Data Bucketing Enhancements:
- Refined data bucketing by adding a `_get_prefix_bucket` method and updating the bucket keys for both TTFT and TPOT data. This enables more granular sampling and storage. [1] [2] [3]

Model Training Improvements:
- Updated the `_train_model_with_scaling` method to accept sample weighting and to drop the categorical `prefill_score_bucket` for Bayesian Ridge models, ensuring compatibility and improved training. [1] [2]

Configuration and Miscellaneous:
- Added `SAMPLE_WEIGHTING_FOR_PREFIX_CACHE` to the settings, allowing optional sample weighting based on prefix cache score.
- … `training_server.py`.

These changes collectively improve the accuracy and flexibility of latency prediction by allowing the models to better capture the effects of prefix cache and input size, and by aligning feature engineering in both training and prediction workflows.
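For readers who have not opened the diff, here is a minimal sketch of what interaction features for the TTFT model could look like. It is not the PR's code: the input field names (`input_token_length`, `prefix_cache_score`), the bucket edges, and the formula for `effective_input_tokens` (tokens not covered by the prefix cache) are assumptions made for illustration.

```python
from bisect import bisect_right

# Hypothetical bucket edges; the PR's actual boundaries are not stated here.
BUCKET_EDGES = [0.25, 0.5, 0.75]


def prefill_score_bucket(prefix_cache_score: float) -> int:
    """Map a score in [0, 1] to a categorical bucket index in 0..3."""
    score = min(max(prefix_cache_score, 0.0), 1.0)  # clamp out-of-range scores
    return bisect_right(BUCKET_EDGES, score)


def prepare_features_with_interaction(row: dict) -> dict:
    """Return a copy of `row` with two engineered features added."""
    score = row["prefix_cache_score"]
    features = dict(row)
    # Assumed interaction: only the uncached share of the prompt needs prefill.
    features["effective_input_tokens"] = (1.0 - score) * row["input_token_length"]
    features["prefill_score_bucket"] = prefill_score_bucket(score)
    return features


example = prepare_features_with_interaction(
    {"input_token_length": 1000, "prefix_cache_score": 0.75}
)
```

Calling the same helper from both the training and prediction paths is what keeps the two feature sets aligned, which is the consistency goal the description mentions.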
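The bucketing change can be pictured as adding a prefix-score dimension to the key under which training samples are stored. The sketch below is an assumption-laden illustration, not the PR's implementation: the number of prefix buckets, the queue-size dimension, the per-bucket cap, and the `BucketedStore` class are all hypothetical.

```python
from collections import defaultdict, deque

NUM_PREFIX_BUCKETS = 4       # hypothetical granularity
MAX_SAMPLES_PER_BUCKET = 3   # hypothetical cap, kept tiny for illustration


def get_prefix_bucket(prefix_cache_score: float) -> int:
    """Map a score in [0, 1] to an integer bucket in 0..NUM_PREFIX_BUCKETS-1."""
    score = min(max(prefix_cache_score, 0.0), 1.0)
    return min(int(score * NUM_PREFIX_BUCKETS), NUM_PREFIX_BUCKETS - 1)


def bucket_key(queue_size: int, prefix_cache_score: float) -> tuple:
    """Compose a storage key; the queue dimension is an assumed example."""
    queue_bucket = min(queue_size // 10, 4)
    return (queue_bucket, get_prefix_bucket(prefix_cache_score))


class BucketedStore:
    """Keep only the most recent samples in each bucket."""

    def __init__(self) -> None:
        self.buckets = defaultdict(lambda: deque(maxlen=MAX_SAMPLES_PER_BUCKET))

    def add(self, sample: dict) -> None:
        key = bucket_key(sample["queue_size"], sample["prefix_cache_score"])
        self.buckets[key].append(sample)


store = BucketedStore()
for i in range(5):
    store.add({"queue_size": 12, "prefix_cache_score": 0.9, "id": i})
store.add({"queue_size": 12, "prefix_cache_score": 0.1, "id": 99})
```

With a key like this, samples with high and low cache hit rates no longer compete for the same slots, so neither regime is crowded out of the training set.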
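The training-side change combines two steps: computing optional per-sample weights and removing the categorical column before fitting a Bayesian Ridge model. The sketch below shows one way this could be arranged. The weighting formula, the `model_type` string, and the helper names are hypothetical; only the setting name `SAMPLE_WEIGHTING_FOR_PREFIX_CACHE` and the column name `prefill_score_bucket` come from the description.

```python
SAMPLE_WEIGHTING_FOR_PREFIX_CACHE = True  # setting name from the PR; value assumed


def sample_weight(prefix_cache_score: float, enabled: bool) -> float:
    """Return a per-sample weight; the formula is an illustrative assumption."""
    if not enabled:
        return 1.0
    # Hypothetical scheme: give samples with higher cache scores more influence.
    return 1.0 + prefix_cache_score


def prepare_training_inputs(rows: list, model_type: str, weighting_enabled: bool):
    """Build feature rows and weights for a given model type."""
    # A linear model such as Bayesian Ridge cannot consume a categorical
    # column directly, so it is dropped for that model type only.
    drop = {"prefill_score_bucket"} if model_type == "bayesian_ridge" else set()
    features = [{k: v for k, v in row.items() if k not in drop} for row in rows]
    weights = [sample_weight(row["prefix_cache_score"], weighting_enabled) for row in rows]
    return features, weights


rows = [
    {"effective_input_tokens": 250.0, "prefix_cache_score": 0.75, "prefill_score_bucket": 3},
    {"effective_input_tokens": 900.0, "prefix_cache_score": 0.5, "prefill_score_bucket": 2},
]
X, w = prepare_training_inputs(rows, "bayesian_ridge", SAMPLE_WEIGHTING_FOR_PREFIX_CACHE)
```

The resulting weights would then be passed to the estimator's fit call; scikit-learn's `BayesianRidge.fit`, for example, accepts a `sample_weight` argument.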