Open
Labels
triage/accepted: Indicates an issue or PR is ready to be actively worked on.
Description
What would you like to be added:
Add the PostResponse extension point to the prefix plugin, and use it to update the prefix cache with the response text.
Why is this needed:
The generated tokens are also cached by the model servers (vLLM at least). Upon receiving the response, the prefix plugin should also add the response to the prefix indexer. This makes the prefix indexer more accurate.
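To make the proposal concrete, here is a minimal Python sketch of the idea, not the project's actual code. Every name in it (`PrefixIndexer`, `PrefixPlugin`, `on_scheduled`, `post_response`, `BLOCK_SIZE`) is hypothetical, and it hashes fixed-size character blocks as a stand-in for the token blocks a model server would really cache.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 16  # hypothetical block size, in characters rather than tokens


def block_hashes(text, block_size=BLOCK_SIZE):
    """Chain-hash each full block, so a hash identifies the whole prefix up to it."""
    hashes = []
    prev = b""
    for start in range(0, len(text) - block_size + 1, block_size):
        block = text[start:start + block_size].encode("utf-8")
        prev = hashlib.sha256(prev + block).digest()
        hashes.append(prev)
    return hashes


class PrefixIndexer:
    """Tracks which model servers are believed to hold which prefix blocks."""

    def __init__(self):
        self._servers_by_hash = defaultdict(set)

    def add(self, server, text):
        for h in block_hashes(text):
            self._servers_by_hash[h].add(server)

    def match_length(self, server, text):
        """Count the leading blocks of `text` believed to be cached on `server`."""
        count = 0
        for h in block_hashes(text):
            if server not in self._servers_by_hash.get(h, ()):
                break
            count += 1
        return count


class PrefixPlugin:
    def __init__(self, indexer):
        self.indexer = indexer

    def on_scheduled(self, server, prompt):
        # Assumed existing behavior: index only the prompt sent to the server.
        self.indexer.add(server, prompt)

    def post_response(self, server, prompt, response_text):
        # Proposed behavior: also index the generated text, since the
        # model server caches the generated tokens as well.
        self.indexer.add(server, prompt + response_text)
```

In a multi-turn conversation the next request usually starts with the previous prompt plus the previous response. With only `on_scheduled`, the indexer would match just the prompt blocks of that follow-up request; after `post_response`, it would match the response blocks too. The sketch assumes the response text lines up with what the server cached, which may not hold exactly once tokenization and chat templating are involved.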