Support Shared Cache

### Feature request

Add a new cache class that lets different layers share all or part of the same KV cache, so that fewer key/value tensors need to be stored and cache efficiency improves.
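To make the request concrete, here is a minimal sketch of one possible sharing scheme, in which consecutive layers are grouped and only the first layer of each group writes to the cache. The names `SharedLayerCache`, `share_every`, and the `update` signature are hypothetical and are not an existing API of the library. Plain Python lists stand in for tensors.

```python
from typing import List, Tuple


class SharedLayerCache:
    """Hypothetical sketch: layers in the same group reuse one KV slot."""

    def __init__(self, num_layers: int, share_every: int = 2) -> None:
        if num_layers < 1 or share_every < 1:
            raise ValueError("num_layers and share_every must be >= 1")
        self.share_every = share_every
        # Layer i maps to slot i // share_every.
        self.layer_to_slot = [i // share_every for i in range(num_layers)]
        num_slots = self.layer_to_slot[-1] + 1
        # One list of cached entries per slot, not per layer.
        self.keys: List[list] = [[] for _ in range(num_slots)]
        self.values: List[list] = [[] for _ in range(num_slots)]

    def is_owner(self, layer_idx: int) -> bool:
        # Assumption: the first layer of each group owns (writes) the slot.
        return layer_idx % self.share_every == 0

    def update(self, layer_idx: int, new_keys: list, new_values: list) -> Tuple[list, list]:
        """Append new entries if this layer owns its slot, then return the slot."""
        slot = self.layer_to_slot[layer_idx]
        if self.is_owner(layer_idx):
            self.keys[slot].extend(new_keys)
            self.values[slot].extend(new_values)
        # Non-owner layers discard their own K/V and read the shared entries.
        return self.keys[slot], self.values[slot]
```

With `SharedLayerCache(num_layers=4, share_every=2)`, layers 0 and 1 read from one slot and layers 2 and 3 from another, so the cache holds two slots instead of four. Other schemes (for example, sharing only keys, or sharing among non-adjacent layers) would need a different mapping.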

### Motivation

Many studies have shown that attention weights in different attention layers are often similar, and that sharing the KV cache across layers causes only a small quality degradation while improving throughput by **2-3x (tokens/sec)**.

### Your contribution

I will try to submit a PR.

Support Shared Cache #35876

Feature request

Motivation

Your contribution

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Support Shared Cache #35876

Description

Feature request

Motivation

Your contribution

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions