Create a `SpeculativeCacheManager`

**Is your feature request related to a problem? Please describe.**
I find it hard to keep track of all the KV cache updates and backtracks between the `draft_subnet` and the `verify_subnet`. We need a consistent way to perform these actions, especially for batches where a different number of draft tokens is accepted for each prompt.

**Describe the solution you'd like**
A `SpeculativeCacheManager` that abstracts away all of the KV cache updates and backtracks.
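
To make the request more concrete, here is a minimal sketch of the bookkeeping such a manager could own. Everything in it is an assumption for illustration: the method names (`extend`, `backtrack`, `attention_mask`), the `"draft"`/`"verify"` keys, and the simplification that both caches receive the same number of speculative positions per step. It tracks only per-sequence valid lengths and does not touch real key/value tensors.

```python
from typing import Dict, List


class SpeculativeCacheManager:
    """Hypothetical sketch: tracks how many cached positions are valid
    per sequence for the draft and verify sub-networks."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        # Valid cache length per sequence, keyed by sub-network name (assumed names).
        self.lengths: Dict[str, List[int]] = {
            "draft": [0] * batch_size,
            "verify": [0] * batch_size,
        }

    def extend(self, name: str, num_tokens: int) -> None:
        """Record that `num_tokens` new positions were cached for every sequence."""
        self.lengths[name] = [n + num_tokens for n in self.lengths[name]]

    def backtrack(self, num_drafted: int, num_accepted: List[int]) -> None:
        """Drop the rejected draft positions from both caches, per sequence."""
        if len(num_accepted) != self.batch_size:
            raise ValueError("need one acceptance count per sequence")
        for i, accepted in enumerate(num_accepted):
            if not 0 <= accepted <= num_drafted:
                raise ValueError("accepted count must be within [0, num_drafted]")
            rejected = num_drafted - accepted
            for name in self.lengths:
                self.lengths[name][i] -= rejected

    def attention_mask(self, name: str) -> List[List[int]]:
        """Right-padded mask: 1 for valid cached positions, 0 for stale ones."""
        width = max(self.lengths[name])
        return [[1] * n + [0] * (width - n) for n in self.lengths[name]]


# Usage: a batch of two prompts, 5 prompt tokens each, then 4 draft tokens.
manager = SpeculativeCacheManager(batch_size=2)
for subnet in ("draft", "verify"):
    manager.extend(subnet, 5)  # prompt
    manager.extend(subnet, 4)  # speculative positions
# The first prompt accepts all 4 draft tokens, the second only 1.
manager.backtrack(num_drafted=4, num_accepted=[4, 1])
```

Keeping a per-sequence length plus a mask, instead of physically truncating, is one way to handle uneven acceptance inside a batch, since a batched tensor cannot have a different length per row. Whether the real caches should be masked or re-packed is left open here.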

**Describe alternatives you've considered**
Adding classmethods to the `CleavedAutoModelForCausalLM` class. This seemed to violate the single responsibility principle and made the class very verbose.

**Additional context**
None at the moment.

