Support Shared Cache #35876

@LoserCheems

Feature request

A new cache class that lets different layers share all or part of the same KV cache, so that fewer key/value tensors need to be stored and updated during generation.
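To make the request concrete, here is a minimal sketch of one possible design, assuming consecutive layers are grouped and only the first layer of each group writes its keys and values. The names `SharedLayerCache`, `group_size`, and `concat` are hypothetical and not part of any existing library; the `update(key_states, value_states, layer_idx)` shape is only meant to resemble a typical per-layer cache update call. Plain Python lists stand in for tensors so the example runs without dependencies.

```python
from typing import Any, Callable, Dict, Tuple


class SharedLayerCache:
    """Hypothetical sketch: layers in the same group share one KV slot."""

    def __init__(
        self,
        num_layers: int,
        group_size: int,
        # Assumption: states are sequences that can be joined along the token axis.
        # With tensors, a caller could pass e.g. a function wrapping torch.cat.
        concat: Callable[[Any, Any], Any] = lambda old, new: old + new,
    ) -> None:
        if num_layers <= 0 or group_size <= 0:
            raise ValueError("num_layers and group_size must be positive")
        self.num_layers = num_layers
        self.group_size = group_size
        self.concat = concat
        self._slots: Dict[int, Tuple[Any, Any]] = {}

    def slot_for(self, layer_idx: int) -> int:
        """Map a layer index to the shared slot it reads from."""
        if not 0 <= layer_idx < self.num_layers:
            raise IndexError(f"layer_idx {layer_idx} out of range")
        return layer_idx // self.group_size

    def is_owner(self, layer_idx: int) -> bool:
        """Only the first layer of each group writes to the slot."""
        return layer_idx % self.group_size == 0

    def update(self, key_states: Any, value_states: Any, layer_idx: int) -> Tuple[Any, Any]:
        slot = self.slot_for(layer_idx)
        if self.is_owner(layer_idx):
            if slot in self._slots:
                keys, values = self._slots[slot]
                self._slots[slot] = (
                    self.concat(keys, key_states),
                    self.concat(values, value_states),
                )
            else:
                self._slots[slot] = (key_states, value_states)
        elif slot not in self._slots:
            # Non-owner layers discard their own states and need the owner's.
            raise RuntimeError("owner layer must update the slot first")
        return self._slots[slot]

    def num_slots(self) -> int:
        return len(self._slots)


# Usage: 4 layers, pairs of layers share a slot, 3 decoding steps.
cache = SharedLayerCache(num_layers=4, group_size=2)
for step in range(3):
    for layer in range(4):
        keys, values = cache.update([f"k{layer}_t{step}"], [f"v{layer}_t{step}"], layer)

print(cache.num_slots())  # 2 slots are stored instead of 4
```

In this sketch the non-owner layers never store anything, which is where the memory saving comes from; whether they should instead share only part of the cache (for example keys but not values) is a design choice left open here.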

Motivation

Many studies have shown that attention weights in different attention layers are often very similar. As a result, sharing the KV cache across layers causes only a small quality degradation while improving throughput by roughly 2-3x in tokens/sec.

Your contribution

I will try to submit a PR.
