Advanced Prefix Cache Controls #287
Open
TODO
Summary
This PR adds control over token prefix cache rates in the synthetic data generator. First, it adds an auto-incrementing single-token prefix so that no two prompts share the same prefix. Second, it adds controls for sharing one or more fixed prefixes between samples.
Details
1. Ensure every prompt is unique
When generating a prompt, the first token is now taken from an iterator over the tokenizer vocab.
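A minimal sketch of the idea, not the code from this PR: the helper names `unique_first_tokens` and `build_prompt` are hypothetical, the vocab is assumed to be a plain token-to-id mapping, and wrapping around once the vocab is exhausted is an assumption rather than something stated above.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def unique_first_tokens(vocab: Dict[str, int]) -> Iterator[int]:
    """Yield token ids from the vocab in a stable order.

    Assumption: wrap around after the last id instead of raising.
    """
    return cycle(sorted(vocab.values()))


def build_prompt(first_tokens: Iterator[int], body: List[int]) -> List[int]:
    """Prepend the next token from the iterator to the prompt body."""
    return [next(first_tokens)] + body


# Toy vocab standing in for a real tokenizer vocab.
toy_vocab = {"a": 0, "b": 1, "c": 2}
first_tokens = unique_first_tokens(toy_vocab)
prompts = [build_prompt(first_tokens, [7, 7]) for _ in range(3)]
```

Because each prompt starts with a different token, no two prompts share a leading prefix, so a prefix cache cannot match across them unless a shared prefix is added deliberately.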
2. Add configurable prefixes to simulate system prompts or other common token prefixes
Adds a `prefix_buckets` argument to the `SyntheticDatasetConfig`. Each bucket consists of a prefix count, a token count, and a bucket weight. Prefix count sets the number of unique prefixes to generate for a given bucket, token count is the length of each prefix in the bucket, and bucket weight is used to calculate the proportion of requests the bucket applies to relative to the sum of all bucket weights. Here are a few examples:
Here we have one bucket of 32 prefixes of length 2048. Since there are 1024 total samples, each prefix will apply to 32 samples. If there is only one bucket, then the weight can be omitted, as the bucket applies to 100% of samples.
In this modified version of the first example, 16 of the prefixes have 2048 tokens while the other 16 have 1024 tokens.
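The arithmetic behind the two examples above can be checked with a short sketch. The `PrefixBucket` class and its field names (`prefix_count`, `prefix_tokens`, `bucket_weight`) are hypothetical stand-ins for illustration and may not match the actual fields in `SyntheticDatasetConfig`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PrefixBucket:
    # Hypothetical field names, chosen to mirror the description above.
    prefix_count: int           # number of unique prefixes in the bucket
    prefix_tokens: int          # length of each prefix, in tokens
    bucket_weight: float = 1.0  # share relative to the sum of all weights


def samples_per_prefix(buckets: List[PrefixBucket], total_samples: int) -> List[float]:
    """Return, for each bucket, how many samples each of its prefixes covers."""
    total_weight = sum(b.bucket_weight for b in buckets)
    result = []
    for b in buckets:
        bucket_samples = total_samples * b.bucket_weight / total_weight
        result.append(bucket_samples / b.prefix_count)
    return result


# One bucket of 32 prefixes, 2048 tokens each, over 1024 samples.
single = samples_per_prefix([PrefixBucket(32, 2048)], 1024)

# Two equally weighted buckets of 16 prefixes each (2048 and 1024 tokens).
split = samples_per_prefix(
    [PrefixBucket(16, 2048), PrefixBucket(16, 1024)], 1024
)
```

In both cases each prefix ends up shared by 32 samples: the single bucket spreads 1024 samples over 32 prefixes, and each of the two equal buckets spreads 512 samples over 16 prefixes.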
The prefix tokens of a bucket can also be 0 to disable prefixes for those samples. Here is an example where 40% of the samples have a prefix of 2048 tokens while the other 60% have no prefix.
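As a rough illustration of how bucket weights could translate into sample counts, the sketch below splits samples deterministically in proportion to the weights. The function `split_samples` is hypothetical, and the actual generator may instead pick a bucket per sample at random.

```python
from typing import List


def split_samples(weights: List[float], total_samples: int) -> List[int]:
    """Divide total_samples across buckets in proportion to their weights."""
    total_weight = sum(weights)
    counts = [int(total_samples * w / total_weight) for w in weights]
    # Hand any rounding remainder to the earliest buckets.
    for i in range(total_samples - sum(counts)):
        counts[i % len(counts)] += 1
    return counts


# 40% of samples get a 2048-token prefix, 60% get no prefix at all.
bucket_prefix_tokens = [2048, 0]
counts = split_samples([40, 60], 1000)

# Expand into one prefix length per sample.
prefix_lengths = [
    tokens
    for tokens, count in zip(bucket_prefix_tokens, counts)
    for _ in range(count)
]
```

With 1000 samples this yields 400 samples with a 2048-token prefix and 600 with a zero-length prefix, matching the 40/60 split described above.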
Test Plan
pytest tests/unit/dataset
Related Issues
Use of AI
## WRITTEN BY AI ##