Prefix cache plugin should also add response to the cache #971

@liu-cong

What would you like to be added:

The prefix cache plugin should add the response, not just the prompt, to its cache. To do this, we add the PostResponse extension point to the prefix plugin and update the cache with the response text.

Why is this needed:

The generated tokens are also cached by the model servers (vLLM at least). Upon receiving the response, the prefix plugin should also add the response to the prefix indexer. This makes the prefix indexer more accurate.

Labels

triage/accepted: Indicates an issue or PR is ready to be actively worked on.
